Strange output for kneaddata

wen · October 29, 2020, 3:52am

Dear,
I got some strange results after running the KneadData workflow. This sample (HFD8_1) is paired-end metagenomic sequencing data from a mouse. However, I got 0 reads in the final paired FASTQ files. The results for the other samples are normal; only this sample is affected. This is confusing to me. How should I interpret the following output?
Does it mean that all sequences in HFD8 are contaminant sequences?

Total contaminate sequences in file ( HFD8_1_kneaddata_mouse_C57BL_6NJ_bowtie2_paired_contam_1.fastq ) : 0.0
Total contaminate sequences in file ( HFD8_1_kneaddata_mouse_C57BL_6NJ_bowtie2_paired_contam_2.fastq ) : 0.0
Total contaminate sequences in file ( HFD8_1_kneaddata_mouse_C57BL_6NJ_bowtie2_unmatched_1_contam.fastq ) : 63815882.0
Total contaminate sequences in file ( HFD8_1_kneaddata_mouse_C57BL_6NJ_bowtie2_unmatched_2_contam.fastq ) : 0.0
Total reads after removing those found in reference database ( HFD8_1_kneaddata_mouse_C57BL_6NJ_bowtie2_paired_clean_1.fastq ): 0.0
Total reads after removing those found in reference database ( HFD8_1_kneaddata_mouse_C57BL_6NJ_bowtie2_paired_clean_2.fastq ): 0.0
Total reads after merging results from multiple databases ( HFD8_1_kneaddata_paired_1.fastq ): 0.0
Total reads after merging results from multiple databases ( HFD8_1_kneaddata_paired_2.fastq ): 0.0
Total reads after removing those found in reference database ( HFD8_1_kneaddata_mouse_C57BL_6NJ_bowtie2_unmatched_1_clean.fastq ): 55206472.0
Total reads after merging results from multiple databases ( HFD8_1_kneaddata_unmatched_1.fastq ): 55206472.0
Total reads after removing those found in reference database ( HFD8_1_kneaddata_mouse_C57BL_6NJ_bowtie2_unmatched_2_clean.fastq ): 0.0
Total reads after merging results from multiple databases ( HFD8_1_kneaddata_unmatched_2.fastq ): 0.0

Thank you,
Wen

lauren.j.mciver · October 29, 2020, 9:44pm

Hello - Thank you for the detailed post. It looks like kneaddata is unable to track the read pairs: all of the final reads are written as orphans to a single file. Could you double-check that your sequence identifiers include a “/1” and “/2” to indicate the read pair? If they do not, adding these should resolve the issue you are seeing.
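
One quick way to run that check is to look at the first header line of each input file. The following is only a minimal sketch, assuming uncompressed FASTQ files; `has_pair_suffix` is a hypothetical helper name for illustration, not a kneaddata command.

```shell
# Hypothetical helper (not part of kneaddata): succeeds if the first
# header line of the FASTQ file given as $1 ends in /1 or /2.
has_pair_suffix() {
  head -n 1 "$1" | grep -Eq '/[12]$'
}
```

For example, `has_pair_suffix sample.R1.fastq && echo "suffix found"` prints the message only when the first identifier already carries the read pair suffix (the file name here is a placeholder).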

Thank you,
Lauren

Kirby · October 16, 2022, 4:34pm

Hello @lauren.j.mciver, I am having the same issue.

My paired sequence identifiers are
@A00930:290:HC5FGDSX3:3:1101:21685:1000 1:N:0:TGCGGCGT+TACCGAGG and
@A00930:290:HC5FGDSX3:3:1101:21685:1000 2:N:0:TGCGGCGT+TACCGAGG.

There is a blank between “1000” and “1”.
I think I need to remove this blank, but I worry that the identifier would then be read as “10001” instead of “1000” and “1”.

How do I solve this?
Would changing “1” to “/1”, or adding “/1” to the end of the identifier after removing the blank, be the solution?

My kneaddata version is v0.12.0

I am referencing this query.

Thank you,
Kirby

Kirby · October 17, 2022, 9:20am

I resolved this issue.

I used the following command:
sed 's/ 1:N:0:TGCGGCGT+TACCGAGG/\/1/g' < sample.R1.fastq > new.R1.fastq

The problem was that Illumina’s read pair indicator was separated from the rest of the sequence identifier by a blank.
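
The sed command above is tied to one specific index string and to read 1, so the R2 file needs the analogous substitution with “2” and “/2”. A more general version could look like the sketch below. It assumes standard four-line FASTQ records and Illumina-style headers of the form `@ID 1:N:0:INDEX`; `add_pair_suffix` is a hypothetical function name for illustration, not a kneaddata option.

```shell
# Hypothetical helper: read FASTQ on stdin, write FASTQ on stdout.
# On each header line (every 4th line, starting at line 1), replace
# " 1:..." or " 2:..." with "/1" or "/2". Other lines pass through.
add_pair_suffix() {
  awk 'NR % 4 == 1 && match($0, / [12]:/) {
         # keep the ID up to the blank, append "/" plus the read number
         $0 = substr($0, 1, RSTART - 1) "/" substr($0, RSTART + 1, 1)
       }
       { print }'
}
```

It would be run once per file, for example `add_pair_suffix < sample.R1.fastq > new.R1.fastq`, and the same way for the R2 file. Because the blank is replaced by “/”, the identifier ends in “1000/1” rather than “10001”.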

Thank you!!
