Strange output for kneaddata

Dear,
I got some strange results after running the KneadData workflow. This sample (HFD8_1) is paired-end metagenomic sequencing data from a mouse, but I got 0 reads in the final paired FASTQ files. The other samples look normal; only this one is affected. I find this very confusing. How should I interpret the following output?
Does it mean that all sequences in HFD8 are contaminant sequences?

Total contaminate sequences in file ( HFD8_1_kneaddata_mouse_C57BL_6NJ_bowtie2_paired_contam_1.fastq ) : 0.0
Total contaminate sequences in file ( HFD8_1_kneaddata_mouse_C57BL_6NJ_bowtie2_paired_contam_2.fastq ) : 0.0
Total contaminate sequences in file ( HFD8_1_kneaddata_mouse_C57BL_6NJ_bowtie2_unmatched_1_contam.fastq ) : 63815882.0
Total contaminate sequences in file ( HFD8_1_kneaddata_mouse_C57BL_6NJ_bowtie2_unmatched_2_contam.fastq ) : 0.0
Total reads after removing those found in reference database ( HFD8_1_kneaddata_mouse_C57BL_6NJ_bowtie2_paired_clean_1.fastq ): 0.0
Total reads after removing those found in reference database ( HFD8_1_kneaddata_mouse_C57BL_6NJ_bowtie2_paired_clean_2.fastq ): 0.0
Total reads after merging results from multiple databases ( HFD8_1_kneaddata_paired_1.fastq ): 0.0
Total reads after merging results from multiple databases ( HFD8_1_kneaddata_paired_2.fastq ): 0.0
Total reads after removing those found in reference database ( HFD8_1_kneaddata_mouse_C57BL_6NJ_bowtie2_unmatched_1_clean.fastq ): 55206472.0
Total reads after merging results from multiple databases ( HFD8_1_kneaddata_unmatched_1.fastq ): 55206472.0
Total reads after removing those found in reference database ( HFD8_1_kneaddata_mouse_C57BL_6NJ_bowtie2_unmatched_2_clean.fastq ): 0.0
Total reads after merging results from multiple databases ( HFD8_1_kneaddata_unmatched_2.fastq ): 0.0

Thank you,
Wen

Hello - Thank you for the detailed post. It looks like kneaddata is unable to track the read pairs: all of the final reads appear to be written as orphans to a single file. Could you double-check that your sequence identifiers include a “/1” and “/2” to indicate the read pair? If they do not, adding these suffixes should resolve the issue you are seeing.

Thank you,
Lauren
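
For anyone wanting a quick way to run the check described above, here is a minimal sketch. The function name `header_style` and the file names in the comments are hypothetical, and it only inspects the first header in the stream, so treat it as a rough diagnostic rather than a validation of the whole file.

```shell
# Report how the first FASTQ header marks its mate (hypothetical helper).
#   slash   -> identifier ends in /1 or /2, e.g. @read/1
#   casava  -> Illumina style with a blank, e.g. @read 1:N:0:BARCODE
#   unknown -> neither pattern was recognised
header_style() {
    head -n 1 | awk '{
        if ($1 ~ /\/[12]$/)                print "slash"
        else if (NF >= 2 && $2 ~ /^[12]:/) print "casava"
        else                               print "unknown"
    }'
}

# Example usage (file names are placeholders):
#   header_style < sample.R1.fastq
#   zcat sample.R1.fastq.gz | header_style
```

If this prints "casava" for your files, the identifiers do not carry the “/1” and “/2” suffixes mentioned above.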

Hello @lauren.j.mciver, I am having the same issue.

My paired-sequence identifiers are
@A00930:290:HC5FGDSX3:3:1101:21685:1000 1:N:0:TGCGGCGT+TACCGAGG and
@A00930:290:HC5FGDSX3:3:1101:21685:1000 2:N:0:TGCGGCGT+TACCGAGG.

There is a blank between “1000” and “1”.
I think I need to remove this blank, but I worry that the identifier would then be read as “10001” instead of “1000” and “1”.

How do I solve this?
Would changing “1” to “/1”, or appending “/1” to the end of the identifier after removing the blank, be the solution?

My kneaddata version is v0.12.0

I am referencing this query.

Thank you,
Kirby

I resolved this issue.

I used this command:
sed 's/ 1:N:0:TGCGGCGT+TACCGAGG/\/1/g' < sample.R1.fastq > new.R1.fastq

The problem was that Illumina’s read pair indicator was separated from the read ID by a blank within the sequence identifier line.

Thank you!! :smile:
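
A note for later readers: the sed command above is tied to one barcode and to the R1 file, so the R2 file needs the same treatment with “/2”. Below is a sketch of a more general version that works for either mate and any barcode. The function name `fix_fastq_headers` and the file names are hypothetical, and it assumes standard 4-line FASTQ records (no wrapped sequence lines), so please test it on a small subset first.

```shell
# Rewrite Illumina-style headers ("@id 1:N:0:BARCODE") to "@id/1" or "@id/2".
# Only the first line of each 4-line record is changed, so quality lines
# that happen to start with "@" are left alone.
fix_fastq_headers() {
    awk 'NR % 4 == 1 {
        # $1 is the read ID; $2 begins with the mate number (1 or 2)
        if (NF >= 2 && $2 ~ /^[12]:/) print $1 "/" substr($2, 1, 1)
        else                          print
        next
    }
    { print }'
}

# Example usage (file names are placeholders):
#   fix_fastq_headers < sample.R1.fastq > new.R1.fastq
#   fix_fastq_headers < sample.R2.fastq > new.R2.fastq
```

Because the mate number is taken from the header itself, the same function can be applied to both files, and headers that do not match the pattern pass through unchanged.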