I’m trying to figure out exactly how many of the reads removed by kneaddata were rRNA.
However, there’s a catch.
I need the actual reads that were removed by kneaddata.
I am using the bog-standard reference database that kneaddata comes with (SILVA).
I see a few files, but the logs and labelling make it a bit confusing to understand the accounting.
Among the output files, there is one named <my_data>_<SILVA_plus a big tail>_unmatched_1_contam.fastq
Is this fastq a collection of reads that have had rRNA removed?
Or is this a fastq of rRNA reads?
Or is this a fastq of my reads before rRNA filtration has been applied?
I ask because I see reads from the contamination file inside the final result file: <my_data>_kneaddata_unmatched_1.fastq
It is unclear which of these files I should treat as my final, rRNA-filtered data.
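
For anyone who wants to reproduce the check, here is a minimal sketch of how the read counts and the overlap between the two files could be computed. It assumes plain, uncompressed FASTQ with exactly four lines per record, and it compares only the first whitespace-delimited token of each header line. The function names are hypothetical helpers of my own, not part of kneaddata.

```python
from itertools import islice


def read_ids(path):
    """Return the set of unique read IDs in a 4-line-per-record FASTQ file."""
    ids = set()
    with open(path) as handle:
        # Take every 4th line starting from the first: these are the headers.
        for header in islice(handle, 0, None, 4):
            header = header.strip()
            if not header:
                continue  # tolerate a trailing blank line
            # Drop the leading '@' and keep only the first token as the ID.
            ids.add(header[1:].split()[0])
    return ids


def compare(contam_path, final_path):
    """Count unique IDs in each file and the IDs present in both."""
    contam = read_ids(contam_path)
    final = read_ids(final_path)
    return {
        "contam": len(contam),
        "final": len(final),
        "shared": len(contam & final),
    }
```

Calling compare() with the path of the _contam.fastq file and the path of the final kneaddata output returns the two counts plus the number of shared IDs. A non-zero "shared" value is what I mean by seeing contaminant reads in the final file. Note that this counts unique IDs, so it would undercount if a file contained duplicate headers.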