Could someone please clarify something from the kneaddata log file? I’m using version 0.12.0 and I’m confused (and alarmed) at the number of reads/sequences removed from my metagenomic files by TRF.
Input 1&2 read count: 70,179,888
Trimmed paired 1&2 read count: 65,368,820 (93% of input reads)
Repeats removed 1 sequence count: 305,115 (0.43% of input reads)
Repeats removed 2 sequence count: 308,169 (0.44% of input reads)
Despite this seemingly huge drop in the number of reads/sequences, the trimmed.fq files are only slightly bigger than the repeats.removed.fq files.
So my questions are: is the count of sequences given from TRF the number of reads it has removed? Is there a big difference in meaning between reads and sequences? Does TRF only remove short reads, hence the file sizes remain similar despite the huge number of reads removed?
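In case it helps anyone check the same thing, here is a minimal sketch of how the record counts could be verified directly from the FASTQ files, rather than inferred from file sizes. It assumes the standard four-lines-per-record FASTQ layout; the helper names `count_fastq_reads` and `percent_of` are my own and not part of kneaddata, and the file paths are whatever you pass on the command line.

```python
import gzip
import sys
from pathlib import Path


def count_fastq_reads(path):
    """Count records in a FASTQ file, assuming 4 lines per record."""
    path = Path(path)
    # Transparently handle gzip-compressed files (assumption: a .gz suffix means gzip).
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def percent_of(part, whole):
    """Express `part` as a percentage of `whole`."""
    return 100.0 * part / whole


if __name__ == "__main__":
    # Usage: python count_reads.py <before.fq> <after.fq>
    before = count_fastq_reads(sys.argv[1])
    after = count_fastq_reads(sys.argv[2])
    print(f"before: {before:,} reads")
    print(f"after:  {after:,} reads ({percent_of(after, before):.2f}% of before)")
```

Running this on a trimmed file and its matching repeats.removed file would show whether the count in the log is the number of reads kept or the number removed.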