The bioBakery help forum

Strange output for kneaddata

Dear,
I got some strange results after running the KneadData workflow. This sample (HFD8_1) is paired-end metagenomic sequencing data from a mouse. However, I got 0 reads in the final FASTQ files. The results for the other samples are normal; only this sample is affected. This is very confusing to me. How should I interpret the following output?
Does it mean that all sequences in HFD8 are contaminant sequences?

Total contaminate sequences in file ( HFD8_1_kneaddata_mouse_C57BL_6NJ_bowtie2_paired_contam_1.fastq ) : 0.0
Total contaminate sequences in file ( HFD8_1_kneaddata_mouse_C57BL_6NJ_bowtie2_paired_contam_2.fastq ) : 0.0
Total contaminate sequences in file ( HFD8_1_kneaddata_mouse_C57BL_6NJ_bowtie2_unmatched_1_contam.fastq ) : 63815882.0
Total contaminate sequences in file ( HFD8_1_kneaddata_mouse_C57BL_6NJ_bowtie2_unmatched_2_contam.fastq ) : 0.0
Total reads after removing those found in reference database ( HFD8_1_kneaddata_mouse_C57BL_6NJ_bowtie2_paired_clean_1.fastq ): 0.0
Total reads after removing those found in reference database ( HFD8_1_kneaddata_mouse_C57BL_6NJ_bowtie2_paired_clean_2.fastq ): 0.0
Total reads after merging results from multiple databases ( HFD8_1_kneaddata_paired_1.fastq ): 0.0
Total reads after merging results from multiple databases ( HFD8_1_kneaddata_paired_2.fastq ): 0.0
Total reads after removing those found in reference database ( HFD8_1_kneaddata_mouse_C57BL_6NJ_bowtie2_unmatched_1_clean.fastq ): 55206472.0
Total reads after merging results from multiple databases ( HFD8_1_kneaddata_unmatched_1.fastq ): 55206472.0
Total reads after removing those found in reference database ( HFD8_1_kneaddata_mouse_C57BL_6NJ_bowtie2_unmatched_2_clean.fastq ): 0.0
Total reads after merging results from multiple databases ( HFD8_1_kneaddata_unmatched_2.fastq ): 0.0

Thank you,
Wen

Hello - Thank you for the detailed post. It looks like kneaddata is unable to track the read pairs: all of the final reads are reported as orphans in a single file. Please double-check that your sequence identifiers include a “/1” and “/2” to indicate the read pair. If they do not, adding them should resolve the issue that you are seeing.
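
If the identifiers turn out to be missing the pair suffix, something along these lines could add it. This is only a rough sketch, not part of kneaddata: the function names (add_mate_suffix, tag_fastq) are made up for illustration, it assumes strict four-line FASTQ records, and it assumes the suffix belongs on the read ID itself (the part of the header before the first space), with any trailing description kept as-is.

```python
import gzip


def add_mate_suffix(header, mate):
    """Return a FASTQ header whose read ID ends in /1 or /2 (hypothetical helper)."""
    suffix = "/" + str(mate)
    # Separate the read ID from any trailing description (e.g. "1:N:0:ATCACG").
    parts = header.rstrip("\n").split(None, 1)
    read_id = parts[0]
    # Only append the suffix if the read ID does not already carry it.
    if not read_id.endswith(suffix):
        read_id += suffix
    return " ".join([read_id] + parts[1:])


def tag_fastq(in_path, out_path, mate):
    """Copy a FASTQ file, tagging every header line with the mate suffix."""
    # Read gzip-compressed input transparently; output is written uncompressed.
    opener = gzip.open if in_path.endswith(".gz") else open
    with opener(in_path, "rt") as src, open(out_path, "w") as dst:
        for i, line in enumerate(src):
            # Assumption: strict 4-line records, so every 4th line is a header.
            if i % 4 == 0:
                line = add_mate_suffix(line, mate) + "\n"
            dst.write(line)
```

For example, tag_fastq("forward.fastq", "forward.tagged.fastq", 1) and tag_fastq("reverse.fastq", "reverse.tagged.fastq", 2) would write tagged copies of the two input files (the file names here are placeholders). It is worth looking at the first few header lines of each file before and after, to confirm the identifiers look as expected.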

Thank you,
Lauren