Kneaddata removed rRNA reads accounting

Billy_Law · January 18, 2022, 6:24pm

Hi,
I’m trying to work out exactly how many of the reads that kneaddata removed were rRNA.
There’s a catch, though.
I need the removed reads themselves, not just a count.

I am using the standard reference database that kneaddata provides (SILVA).
I see a few output files, but the logs and the file naming make the accounting hard to follow.

Among the files I have, there is one named <my_data>_<SILVA_plus a big tail>_unmatched_1_contam.fastq
Is this fastq a collection of reads that have had rRNA removed?
Or is this a fastq of rRNA reads?
Or is this a fastq of my reads before rRNA filtration has been applied?
Thanks!

I ask because I see reads from the contamination file inside the final result file: <my_data>_kneaddata_unmatched_1.fastq

It is unclear which of these files I should treat as my cleaned data.
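
For the counting part, the number of reads in any one of these files can be checked independently of the kneaddata log. Below is a minimal sketch, assuming plain uncompressed FASTQ with exactly four lines per record. The function name is illustrative and the file name in the comment is a placeholder, not a real kneaddata output name.

```python
def count_fastq_reads(lines):
    """Count reads in an iterable of FASTQ lines.

    Assumes the standard layout of four lines per record
    (header, sequence, '+', qualities) with no wrapped sequences.
    """
    n_lines = sum(1 for _ in lines)
    if n_lines % 4 != 0:
        raise ValueError(
            f"{n_lines} lines is not a multiple of 4; "
            "file may be truncated or not plain FASTQ"
        )
    return n_lines // 4


# Usage with a file on disk (placeholder path):
#
#     with open("my_contam_file.fastq") as handle:
#         print(count_fastq_reads(handle))
```

Running this on the contamination file and on the final output file gives two numbers that can be set against the totals reported in the log.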

Billy_Law · January 20, 2022, 10:48pm

Hi,
I wanted to ask: Is there any supporting documentation on how kneaddata cleans the data?
With paired-end data, I find many rRNA sequences in the final output file.
(I determined this by comparing the sequences in the contamination file with those in the final output.)
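
A comparison of this kind can be sketched as follows. This is a hedged illustration rather than the exact script used: it assumes four-line FASTQ records, takes the read ID to be the header text up to the first whitespace, and uses made-up function names. It reports shared read IDs separately from shared sequences, because two different reads can carry an identical sequence, so a sequence match alone does not prove that the same read appears in both files.

```python
from itertools import islice


def fastq_records(lines):
    """Yield (read_id, sequence) pairs from an iterable of FASTQ lines."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        header = record[0].strip()
        sequence = record[1].strip()
        # Read ID: header without the leading '@', up to the first whitespace.
        read_id = header[1:].split(None, 1)[0] if len(header) > 1 else ""
        yield read_id, sequence


def compare_fastq(contam_lines, final_lines):
    """Return (shared_ids, shared_sequences) between two FASTQ inputs."""
    contam_ids, contam_seqs = set(), set()
    for read_id, seq in fastq_records(contam_lines):
        contam_ids.add(read_id)
        contam_seqs.add(seq)

    shared_ids, shared_seqs = set(), set()
    for read_id, seq in fastq_records(final_lines):
        if read_id in contam_ids:
            shared_ids.add(read_id)
        if seq in contam_seqs:
            shared_seqs.add(seq)
    return shared_ids, shared_seqs


# Usage with files on disk (placeholder paths):
#
#     with open("contam.fastq") as c, open("final.fastq") as f:
#         ids, seqs = compare_fastq(c, f)
#         print(len(ids), "shared read IDs;", len(seqs), "shared sequences")
```

If sequences are shared but read IDs are not, the overlap may simply reflect identical sequences from different reads rather than the same read ending up in both files.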

I can verify that kneaddata works as advertised (no rRNA sequences found) with single-end data.
What is kneaddata doing with paired-end data?
Should I use only the forward reads when working with paired-end data?

Billy_Law · March 11, 2022, 10:19pm

Does anyone know? I don’t know what is wrong here.
