Removing human transcripts with polyA from RNA data

jbarlow · July 22, 2021, 11:21pm

Hello,

I have some metatranscriptome samples with human contamination. I ran kneaddata with the human transcriptome as the decontamination database (--reference-db human_hg38_refMrna). After downstream processing with HUMAnN, a large percentage of the reads were unaligned, so I looked at the first 30 or so reads and found that most of them have a long stretch of polyA at the end of the sequence. The front half of these sequences BLASTs to human. Is there a good way to filter out these sequences with kneaddata?

sagunmaharjann · July 30, 2021, 2:08pm

Hi @jbarlow ,

Thank you for reaching out to bioBakery Lab. Can you please confirm that you are using KneadData’s human_transcriptome reference database, i.e. the one downloaded with kneaddata_database --download human_transcriptome bowtie2 $DIR?

You could also try adding the BLAST results to the database as contaminants and decoys (add them to the .faa file, then rebuild the index) to see if that improves performance.

Regards,
Sagun

jbarlow · August 5, 2021, 5:35am

Hi @sagunmaharjann ,

Thanks for following up on this. I can confirm I was using the human_transcriptome reference database from kneaddata. I eventually realized I wasn’t doing adapter trimming correctly (I needed to change the default to TruSeq), and I also updated to kneaddata 0.10 from pip instead of 0.7.4 from conda. After that I no longer had the issue. I'm not sure exactly which change fixed the problem, but all is good now!

Best,
Jacob
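
For anyone who hits the same symptom and wants to check or strip the polyA tails directly, here is a minimal Python sketch of trimming trailing polyA runs from FASTQ reads before they go into kneaddata. This is not a kneaddata feature and not what fixed the problem in this thread. The function names and both thresholds (MIN_TAIL, MIN_KEEP) are assumptions made up for illustration.

```python
import re

# Hypothetical thresholds chosen for illustration; they are not kneaddata defaults.
MIN_TAIL = 10   # shortest run of trailing A's that is treated as a polyA tail
MIN_KEEP = 30   # reads shorter than this after trimming are dropped

_POLYA_TAIL = re.compile(r"A{%d,}$" % MIN_TAIL)


def trim_polya(seq, qual):
    """Return (seq, qual) with a trailing polyA run removed, or unchanged if none."""
    match = _POLYA_TAIL.search(seq.upper())
    if match is None:
        return seq, qual
    cut = match.start()
    return seq[:cut], qual[:cut]


def filter_fastq(lines):
    """Yield (header, seq, plus, qual) records with polyA tails trimmed.

    `lines` is any iterable of FASTQ lines (4 lines per record).
    Records shorter than MIN_KEEP after trimming are skipped.
    """
    it = iter(lines)
    for header in it:
        seq = next(it).rstrip("\n")
        plus = next(it).rstrip("\n")
        qual = next(it).rstrip("\n")
        seq, qual = trim_polya(seq, qual)
        if len(seq) >= MIN_KEEP:
            yield header.rstrip("\n"), seq, plus, qual
```

To use it, open the FASTQ file, pass the file handle to filter_fastq, and write each yielded record back out as four lines. Dedicated trimmers such as cutadapt or fastp also offer polyA/polyX trimming and are likely a better choice for real datasets.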
