Higher number of reads after trimmed + contaminated step cf. raw reads?

rsango · May 26, 2020, 8:21pm

Hi, I just wanted to ask why, when I run kneaddata on my paired-end samples, there are more reads in decontaminated human.index pair1 and decontaminated human.index pair2 than in raw pair1 and raw pair2.

In addition, how come decontaminated human.index pair1 and decontaminated human.index pair2 do not have equal numbers of reads, i.e. why is there a discrepancy between them after running the Bowtie2 step to remove contaminants?

Many thanks!

Sample	raw pair1	raw pair2	trimmed pair1	trimmed pair2	trimmed orphan1	trimmed orphan2	decontaminated human.index pair1	decontaminated human.index pair2	decontaminated human.index orphan1	decontaminated human.index orphan2	final pair1	final pair2	final orphan1	final orphan2
ERRXXXX_1_kneaddata	15181542	15181542	12435116	12435116	1937860	309404	23688722	2711437	716247	8	23688722	2711437	716247	8
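One way to sanity-check the row above is to total the reads at each stage rather than comparing individual columns. The sketch below only re-adds the numbers from the table; the dictionary layout and variable names are illustrative and not part of kneaddata.

```python
# Counts copied from the ERRXXXX_1_kneaddata row of the table above.
# The grouping into stages is an illustrative layout, not kneaddata output.
counts = {
    "raw": {"pair1": 15181542, "pair2": 15181542},
    "trimmed": {
        "pair1": 12435116, "pair2": 12435116,
        "orphan1": 1937860, "orphan2": 309404,
    },
    "decontaminated": {
        "pair1": 23688722, "pair2": 2711437,
        "orphan1": 716247, "orphan2": 8,
    },
}

# Sum every column belonging to a stage to get the total reads at that stage.
totals = {stage: sum(cols.values()) for stage, cols in counts.items()}

for stage, total in totals.items():
    print(f"{stage:>15}: {total:,}")
# Prints:
#             raw: 30,363,084
#         trimmed: 27,117,496
#  decontaminated: 27,116,414
```

The totals only go down from stage to stage, so no reads are being created. The oddity is limited to how the decontaminated reads are split between the pair1, pair2 and orphan columns.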

lauren.j.mciver · July 10, 2020, 8:26pm

Hi, Thank you for the detailed post, and sorry for the confusion with the read counts. I think kneaddata is having trouble tracking the read pairs because of the format of the sequence identifiers. The total number of raw reads and the total after decontamination look as expected, but I agree with you that the numbers for the pairs are unexpected.

Would you check the first few lines of your input files and review the format of the sequence identifier? If it does not include the pair number, you would need to add it so that kneaddata can track the pairs. If it does include the pair number, then for now changing the format to one of the two that kneaddata expects will resolve the issue you are seeing. We will also make a note to look at making kneaddata a bit more flexible with sequence identifiers in the future.

The two expected Illumina formats:

@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG
@EAS139:136:FC706VJ:2:2104:15343:197393 2:Y:18:ATCACG

or (this format is flexible, only requiring the IDs to end in 1 and 2)

@HWUSI-EAS100R:6:73:941:1973#0/1
@HWUSI-EAS100R:6:73:941:1973#0/2

Thank you,
Lauren
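As a rough illustration of the check suggested above, the following Python sketch tests whether a FASTQ header matches either of the two formats and, if not, appends /1 or /2 to the read ID. It is not part of kneaddata: the function names, the regular expressions, and the ERR-style header in the usage note are all assumptions made for the example, and it assumes plain four-line FASTQ records.

```python
import re

# Assumed pattern for the first format: "<id> 1:Y:18:ATCACG" (mate number after a space).
_SPACE_STYLE = re.compile(r"^@\S+ [12]:[YN]:\d+:\S*$")
# Assumed pattern for the second format: the ID simply ends in /1 or /2.
_SLASH_STYLE = re.compile(r"^@\S+/[12]$")


def has_pair_tag(header):
    """Return True if the header already looks like one of the two formats."""
    return bool(_SPACE_STYLE.match(header) or _SLASH_STYLE.match(header))


def add_pair_tag(header, mate):
    """Append /1 or /2 to the read ID unless a pair tag is already present.

    Anything after the first whitespace (the description) is dropped.
    """
    if has_pair_tag(header):
        return header
    read_id = header.split()[0]
    return f"{read_id}/{mate}"


def rewrite_fastq(lines, mate):
    """Yield FASTQ lines with every header (each 4th line) tagged for this mate."""
    for i, line in enumerate(lines):
        if i % 4 == 0:
            yield add_pair_tag(line.rstrip("\n"), mate) + "\n"
        else:
            yield line
```

For example, with a hypothetical header, `add_pair_tag("@ERR000001.1 some description", 2)` gives `@ERR000001.1/2`, while the two example headers shown above are returned unchanged. Running `rewrite_fastq` over the pair1 file with `mate=1` and the pair2 file with `mate=2` would produce inputs in the second format.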
