Questions about the read count table pulled from kneaddata logs

Fangxi_Xu · January 18, 2023, 5:13pm

Hi! I used KneadData 0.10.0 to perform quality control on our paired-end fastq reads, and I pulled the information from all the KneadData logs with the kneaddata_read_count_table command to check the read count at each filtering step for the samples.
In the read count table, I see read counts for:
raw pair1, raw pair2
trimmed pair1, trimmed pair2
trimmed orphan1, trimmed orphan2
decontaminated mouse_C57BL_6NJ pair1, decontaminated mouse_C57BL_6NJ pair2
decontaminated mouse_C57BL_6NJ orphan1, decontaminated mouse_C57BL_6NJ orphan2
final pair1, final pair2
final orphan1, final orphan2

What’s the difference between pair and orphan reads? Should I look at the columns final pair1 and final pair2 for the final cleaned fastqs?

I also have a question about the --cat-final-output option.

Should we expect the concatenated fastq to contain exactly the reads in pair1 plus pair2? How are the paired-end reads joined to generate the concatenated final output? For example, if final pair1 = 100001 reads and final pair2 = 100001 reads, should the concatenated fastq contain 200002 reads?
In my data, the concatenated fastqs have slightly more reads than the sum of pair1 and pair2. Were some additional sequences used to join the reads?
Any input would be much appreciated!
Thank you so much!!

Best,
Fangxi

Fangxi_Xu · February 8, 2023, 4:49pm

Hi,
I think I figured this out so I will just reply to my own post.

The total number of reads in the concatenated paired-end final fastq is:
the reads in final pair 1 + the reads in final pair 2 + the reads in unmatched 1 + the reads in unmatched 2
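
To make the arithmetic concrete, here is a minimal Python sketch that sums the four final columns for each sample in a read count table. It assumes that "unmatched 1" and "unmatched 2" correspond to the final orphan1 and final orphan2 columns listed in the original post, and that the table is tab-separated with a "Sample" column. The function name and all numbers are hypothetical, made up for illustration.

```python
import csv
import io


def expected_concatenated_reads(row):
    """Sum the four 'final' columns for one sample row of the read count table."""
    columns = ["final pair1", "final pair2", "final orphan1", "final orphan2"]
    # Go through float first in case the counts are written as e.g. "100001.0".
    return sum(int(float(row[c])) for c in columns)


# Hypothetical table excerpt; sample name and counts are invented.
table_text = (
    "Sample\tfinal pair1\tfinal pair2\tfinal orphan1\tfinal orphan2\n"
    "sampleA\t100001\t100001\t250\t120\n"
)

for row in csv.DictReader(io.StringIO(table_text), delimiter="\t"):
    print(row["Sample"], expected_concatenated_reads(row))
```

With these invented counts, the concatenated file would hold more than the 200002 reads contributed by the pairs alone, which matches the kind of surplus described in the question.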

The final pair 1 and pair 2 files contain the reads that passed Situation 1 as described in the KneadData tutorial:

Both reads in the pair pass.
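
A small Python sketch of how pair and orphan reads might arise, assuming a read ends up as an orphan when it passes filtering but its mate does not. This is an illustration of the bookkeeping only, not KneadData code; the function name and the pass/fail outcomes are hypothetical.

```python
from collections import Counter


def classify_pair(mate1_passes, mate2_passes):
    """Label one read pair by which of its mates survived filtering."""
    if mate1_passes and mate2_passes:
        return "pair"      # both mates kept, counted in pair1 and pair2
    if mate1_passes:
        return "orphan1"   # only the first mate kept
    if mate2_passes:
        return "orphan2"   # only the second mate kept
    return "dropped"       # neither mate kept


# Hypothetical (mate1, mate2) filtering outcomes for five read pairs.
outcomes = [(True, True), (True, False), (False, True), (False, False), (True, True)]
counts = Counter(classify_pair(m1, m2) for m1, m2 in outcomes)

# Each surviving pair contributes one read to pair1 and one to pair2.
final_pair1 = final_pair2 = counts["pair"]
concatenated = final_pair1 + final_pair2 + counts["orphan1"] + counts["orphan2"]
print(dict(counts), concatenated)
```

Under this assumption, pair1 and pair2 always have equal counts, while orphan1 and orphan2 can differ.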

We should concatenate the final outputs for further analysis, especially for HUMAnN, which takes a single input file.
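
For anyone who wants to check the numbers on their own files, here is a rough Python sketch that concatenates several fastq files the way the shell `cat` command would and then counts the reads in the result. It assumes uncompressed fastq files with exactly four lines per read. The file names and the tiny test records are hypothetical stand-ins, not actual KneadData output.

```python
import shutil
import tempfile
from pathlib import Path


def count_fastq_reads(path):
    """Count reads, assuming four lines per record (no wrapped sequences)."""
    with open(path) as handle:
        return sum(1 for _ in handle) // 4


def concatenate_fastqs(inputs, output):
    """Append each input file to the output file, in the order given."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


# Tiny made-up files standing in for the four final outputs.
workdir = Path(tempfile.mkdtemp())
record = "@read{}\nACGT\n+\nIIII\n"
sizes = {"paired_1": 3, "paired_2": 3, "unmatched_1": 1, "unmatched_2": 2}
inputs = []
for name, n in sizes.items():
    path = workdir / f"{name}.fastq"
    path.write_text("".join(record.format(i) for i in range(n)))
    inputs.append(path)

merged = workdir / "merged.fastq"
concatenate_fastqs(inputs, merged)
merged_reads = count_fastq_reads(merged)
print(merged_reads, sum(sizes.values()))
```

The two printed numbers should agree: the merged file holds the sum of the reads in the paired and unmatched files.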

Thank you!

Best,
Fangxi
