Hi! I used KneadData 0.10.0 to perform quality control on our paired-end FASTQ reads, and I used the kneaddata_read_count_table command to pull the information from all the KneadData logs so I could check the read count at each filtering step for each sample.
In the read count table, I see read counts for:
raw pair1, raw pair2
trimmed pair1, trimmed pair2
trimmed orphan1, trimmed orphan2
decontaminated mouse_C57BL_6NJ pair1, decontaminated mouse_C57BL_6NJ pair2
decontaminated mouse_C57BL_6NJ orphan1, decontaminated mouse_C57BL_6NJ orphan2
final pair1, final pair2
final orphan1, final orphan2
What’s the difference between the pair and orphan columns? Should I look at the final pair1 and final pair2 columns for the final cleaned FASTQ files?
I also have a question about the --cat-final-output option.
Should we expect the concatenated FASTQ to contain exactly the same reads as pair1 plus pair2? How are the paired-end reads joined to generate the concatenated final output? For example, if final pair1 = 100001 reads and final pair2 = 100001 reads, should the concatenated FASTQ contain 200002 reads?
In my data, the concatenated FASTQ files have slightly more reads than the sum of pair1 and pair2. Were some additional sequences used to join the reads?
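For anyone who wants to reproduce the comparison, here is a minimal Python sketch of the arithmetic. It is not part of KneadData: the function names are made up for illustration, and it assumes standard four-line FASTQ records (optionally gzipped). Passing the orphan files along with the pair files is only an assumption about what the concatenated output might contain, which is the thing being asked here.

```python
import gzip


def count_fastq_reads(path):
    """Count records in a FASTQ file, assuming exactly 4 lines per record."""
    # Hypothetical helper: opens gzipped files transparently based on extension.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def compare_counts(concatenated, parts):
    """Return (reads in the concatenated file, total reads across the parts)."""
    total = sum(count_fastq_reads(p) for p in parts)
    return count_fastq_reads(concatenated), total
```

Calling compare_counts first with only the two final pair files, and then again with the two final orphan files added to the list, would show whether the extra reads in the concatenated file match the orphan counts.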
Any input would be much appreciated!
Thank you so much!!
Best,
Fangxi