Bowtie2 removed all reads from my fecal metagenomic data

Hello everyone,

I have run into a serious problem with contaminant read removal using bowtie2 in the kneaddata pipeline (v0.12.0).

I want to perform quality control on my giant panda (Ailuropoda melanoleuca) fecal metagenomic data. I downloaded the reference genome (GCF_002007445.2) from NCBI and built the bowtie2 index.

#I downloaded the ref genome using aspera and unzipped it. It is located in ~/Desktop/kneaddata_tutorial/

#Then I built the bowtie2 index
bowtie2-build -f GCF_002007445.2_ASM200744v3_genomic.fna Ailuropoda_melanoleuca_db

#Finally, I ran the kneaddata pipeline
kneaddata --input1 ①-11.9-4.R1.fq.gz --input2 ①-11.9-4.R2.fq.gz \
  --reference-db ~/Desktop/kneaddata_tutorial/Ailuropoda_melanoleuca_db \
  --output KneaddataOutputPairedEnd_2 \
  --threads 24 --sequencer-source none --quality-scores phred33

#Check the read counts at each stage
kneaddata_read_count_table --input KneaddataOutputPairedEnd_2/ --output KneaddataOutputPairedEnd_2.tsv

KneaddataOutputPairedEnd_2.tsv (460 Bytes)

The read count for the decontaminated files is zero! I don’t understand this result. Does it mean that there aren’t any microbial reads in my data?

I found a similar post here. My problem seems to be related to the sequence IDs.

I checked the sequence IDs in my files, and they do contain a space.

zcat Sample_R1.fq.gz | head -4

@LH00169:221:22352TLT4:7:1101:51633:1042 1:N:0:ATGGTAAC+GTCTACCG
ANGGCTGAATAACCCGGATCTCGACGCTGCGGTTGGTGAAGATCTGGCACAGCAGCTACGTGACGAACTGGAACTGGTGAAAGGCGCGTCTAACGAGTTCGACAAAGAATTGTTCCTTGCGGGCGAAATCACTCCGGTATTCTTCGGTAC
+
I#II9IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII

zcat Sample_R2.fq.gz | head -4

@LH00169:221:22352TLT4:7:1101:51633:1042 2:N:0:ATGGTAAC+GTCTACCG
ANATACGAAGCAGGTAAATTTGTCTTCGCTCGCTTCTACGGTACGGGTATCAGTCTGACGCGGCATCGGCGCAGGTGCCCACTCCACCAGGCCATCCAGCATATGATCGACGCCGAAGTTACCAAGCGCAGTACCGAAGAATACCGGAGT
+
I#IIIIIIIII-IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII-IIIII
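
To see how many reads carry this kind of header, a small check like the one below can help. It is only a sketch: the function name count_spaced_headers is made up for illustration, and it assumes standard, unwrapped 4-line FASTQ records.

```shell
# Hypothetical helper: count FASTQ header lines (line 1 of every 4-line
# record) that contain a space. Assumes unwrapped 4-line FASTQ records.
count_spaced_headers() {
  awk 'NR % 4 == 1 && / / { n++ } END { print n + 0 }'
}

# Demo on a single inline record. On real data one would pipe in the file:
#   zcat Sample_R1.fq.gz | count_spaced_headers
printf '%s\n' \
  '@LH00169:221:22352TLT4:7:1101:51633:1042 1:N:0:ATGGTAAC+GTCTACCG' \
  'ACGT' '+' 'IIII' | count_spaced_headers
```

If the count equals the total number of reads, every header in the file uses the space-delimited format.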

It seems that there is an unofficial solution, but I couldn’t understand the details.

Now I understand what he meant.

Here is my solution.
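
The general idea can be sketched as follows. This is a minimal illustration, not a definitive fix: it assumes the problem comes from the space-delimited part of the header, and that the pipeline can pair mates once the identifiers end in /1 and /2. The helper name fix_header and the output file names in the comments are hypothetical.

```shell
# Hypothetical helper: on each FASTQ header line, replace everything from the
# first space onward with "/<mate>", so that
#   @ID 1:N:0:INDEX   becomes   @ID/1
# $1 is the mate number (1 or 2). Reads FASTQ on stdin, writes to stdout.
# Assumes unwrapped 4-line FASTQ records.
fix_header() {
  awk -v mate="$1" 'NR % 4 == 1 { sub(/ .*/, "/" mate) } { print }'
}

# Demo on the R1 header shown above (sequence and quality shortened).
printf '%s\n' \
  '@LH00169:221:22352TLT4:7:1101:51633:1042 1:N:0:ATGGTAAC+GTCTACCG' \
  'ACGT' '+' 'IIII' | fix_header 1

# On real data (output names are placeholders):
#   zcat Sample_R1.fq.gz | fix_header 1 | gzip > Sample_R1.fixed.fq.gz
#   zcat Sample_R2.fq.gz | fix_header 2 | gzip > Sample_R2.fixed.fq.gz
```

Only header lines are touched, so sequences and quality strings pass through unchanged. The rewritten files would then be given to kneaddata in place of the originals.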