Lost the majority of reads after KneadData

Hi :slight_smile:
When I use KneadData (v0.12.0) to trim sequence files and remove host (mouse) reads, very few reads are left at the end of the pipeline. I am using the default parameters.

Command:
kneaddata --input1 sample1.fq --input2 sample2.fq -db GRCm39_mice_db --output /output/ -t 20 --run-fastqc-start --run-fastqc-end --trimmomatic /home/software/trimmomatic/Trimmomatic-0.39 --sequencer-source none

An excerpt from the log file for one sample:

11/30/2022 05:52:16 PM - kneaddata.utilities - INFO: Running bowtie2 ... 
11/30/2022 05:52:16 PM - kneaddata.utilities - INFO: Execute command: kneaddata_bowtie2_discordant_pairs --bowtie2 /home/lzhang/miniconda3/envs/mice_project/bin/bowtie2 --threads 20 -x /sbidata/projects/lzhang/2022_mice/Analysis/kneaddata_database/GRCm39/GRCm39_mice_db --mode strict --bowtie2-options "--very-sensitive-local --phred33" -1 /sbidata/projects/lzhang/2022_mice/Data/rawData_part2/processed/D14_4F8_1.new_kneaddata.repeats.removed.1.fastq -2 /sbidata/projects/lzhang/2022_mice/Data/rawData_part2/processed/D14_4F8_1.new_kneaddata.repeats.removed.2.fastq --un-pair /sbidata/projects/lzhang/2022_mice/Data/rawData_part2/processed/D14_4F8_1.new_kneaddata_GRCm39_mice_db_bowtie2_paired_clean_%.fastq --al-pair /sbidata/projects/lzhang/2022_mice/Data/rawData_part2/processed/D14_4F8_1.new_kneaddata_GRCm39_mice_db_bowtie2_paired_contam_%.fastq -U /sbidata/projects/lzhang/2022_mice/Data/rawData_part2/processed/D14_4F8_1.new_kneaddata.repeats.removed.unmatched.1.fastq,/sbidata/projects/lzhang/2022_mice/Data/rawData_part2/processed/D14_4F8_1.new_kneaddata.repeats.removed.unmatched.2.fastq --un-single /sbidata/projects/lzhang/2022_mice/Data/rawData_part2/processed/D14_4F8_1.new_kneaddata_GRCm39_mice_db_bowtie2_unmatched_%_clean.fastq --al-single /sbidata/projects/lzhang/2022_mice/Data/rawData_part2/processed/D14_4F8_1.new_kneaddata_GRCm39_mice_db_bowtie2_unmatched_%_contam.fastq -S /dev/null
11/30/2022 06:01:17 PM - kneaddata.utilities - DEBUG: b'6279552 reads; of these:\n  6279552 (100.00%) were unpaired; of these:\n    6541 (0.10%) aligned 0 times\n    930147 (14.81%) aligned exactly 1 time\n    5342864 (85.08%) aligned >1 times\n99.90% overall alignment rate\npair1_aligned : 2691082\npair2_aligned : 2691082\npair1_unaligned : 2140\npair2_unaligned : 2140\norphan1_aligned : 434424\norphan2_aligned : 457777\norphan1_unaligned : 406\norphan2_unaligned : 501\n'
11/30/2022 06:01:17 PM - kneaddata.utilities - DEBUG: Checking output file from bowtie2 : /sbidata/projects/lzhang/2022_mice/Data/rawData_part2/processed/D14_4F8_1.new_kneaddata_GRCm39_mice_db_bowtie2_paired_clean_1.fastq
11/30/2022 06:01:17 PM - kneaddata.utilities - DEBUG: Checking output file from bowtie2 : /sbidata/projects/lzhang/2022_mice/Data/rawData_part2/processed/D14_4F8_1.new_kneaddata_GRCm39_mice_db_bowtie2_paired_clean_2.fastq
11/30/2022 06:01:18 PM - kneaddata.run - INFO: Total contaminate sequences in file ( /sbidata/projects/lzhang/2022_mice/Data/rawData_part2/processed/D14_4F8_1.new_kneaddata_GRCm39_mice_db_bowtie2_paired_contam_1.fastq ) : 2691082.0
11/30/2022 06:01:20 PM - kneaddata.run - INFO: Total contaminate sequences in file ( /sbidata/projects/lzhang/2022_mice/Data/rawData_part2/processed/D14_4F8_1.new_kneaddata_GRCm39_mice_db_bowtie2_paired_contam_2.fastq ) : 2691082.0
11/30/2022 06:01:20 PM - kneaddata.run - INFO: Total contaminate sequences in file ( /sbidata/projects/lzhang/2022_mice/Data/rawData_part2/processed/D14_4F8_1.new_kneaddata_GRCm39_mice_db_bowtie2_unmatched_1_contam.fastq ) : 434424.0
11/30/2022 06:01:20 PM - kneaddata.run - INFO: Total contaminate sequences in file ( /sbidata/projects/lzhang/2022_mice/Data/rawData_part2/processed/D14_4F8_1.new_kneaddata_GRCm39_mice_db_bowtie2_unmatched_2_contam.fastq ) : 457777.0
11/30/2022 06:01:20 PM - kneaddata.utilities - INFO: READ COUNT: decontaminated GRCm39_mice_db pair1 : Total reads after removing those found in reference database ( /sbidata/projects/lzhang/2022_mice/Data/rawData_part2/processed/D14_4F8_1.new_kneaddata_GRCm39_mice_db_bowtie2_paired_clean_1.fastq ): 2140.0
11/30/2022 06:01:20 PM - kneaddata.utilities - INFO: READ COUNT: decontaminated GRCm39_mice_db pair2 : Total reads after removing those found in reference database ( /sbidata/projects/lzhang/2022_mice/Data/rawData_part2/processed/D14_4F8_1.new_kneaddata_GRCm39_mice_db_bowtie2_paired_clean_2.fastq ): 2140.0
11/30/2022 06:01:20 PM - kneaddata.utilities - INFO: READ COUNT: final pair1 : Total reads after merging results from multiple databases ( /sbidata/projects/lzhang/2022_mice/Data/rawData_part2/processed/D14_4F8_1.new_kneaddata_paired_1.fastq ): 2140.0
11/30/2022 06:01:20 PM - kneaddata.utilities - INFO: READ COUNT: final pair2 : Total reads after merging results from multiple databases ( /sbidata/projects/lzhang/2022_mice/Data/rawData_part2/processed/D14_4F8_1.new_kneaddata_paired_2.fastq ): 2140.0
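To sanity-check the numbers in that DEBUG line, here is a minimal sketch that tallies the aligned and unaligned counts. The `summary` string is copied by hand from the log above, and `parse_counts` is a hypothetical helper written for this post, not part of KneadData.

```python
import re

# Per-category counts copied from the bowtie2 DEBUG line in the log above.
summary = """pair1_aligned : 2691082
pair2_aligned : 2691082
pair1_unaligned : 2140
pair2_unaligned : 2140
orphan1_aligned : 434424
orphan2_aligned : 457777
orphan1_unaligned : 406
orphan2_unaligned : 501
"""


def parse_counts(text):
    """Return {name: count} for lines shaped like 'name : 123' (hypothetical helper)."""
    pattern = re.compile(r"^(\w+) : (\d+)$", flags=re.MULTILINE)
    return {m.group(1): int(m.group(2)) for m in pattern.finditer(text)}


counts = parse_counts(summary)

# Reads flagged as host (aligned) versus reads kept (unaligned).
aligned = sum(v for k, v in counts.items() if k.endswith("_aligned"))
unaligned = sum(v for k, v in counts.items() if k.endswith("_unaligned"))
total = aligned + unaligned

print(f"total reads into bowtie2: {total}")
print(f"flagged as host:          {aligned} ({aligned / total:.2%})")
print(f"kept as clean:            {unaligned} ({unaligned / total:.2%})")
```

The total matches the 6279552 reads that bowtie2 reports, so the counts in the log are at least internally consistent.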

Overview of the read counts at each step of the pipeline:

Sample: my_sample

| Step | pair1 | pair2 | orphan1 | orphan2 |
|---|---|---|---|---|
| raw | 3872223 | 3872223 | n/a | n/a |
| trimmed | 3489545 | 3489545 | 164393 | 157032 |
| decontaminated GRCm39_mice_db | 2140 | 2140 | 406 | 501 |
| final | 2140 | 2140 | 406 | 501 |
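To see where the reads are lost, here is a rough sketch that turns the my_sample counts above into the fraction of raw reads retained at each step. The numbers are typed in by hand from the overview; the dictionary layout and variable names are only for illustration.

```python
# Read counts for my_sample, copied from the overview above.
counts = {
    "raw": {"pair1": 3872223, "pair2": 3872223},
    "trimmed": {"pair1": 3489545, "pair2": 3489545,
                "orphan1": 164393, "orphan2": 157032},
    "decontaminated": {"pair1": 2140, "pair2": 2140,
                       "orphan1": 406, "orphan2": 501},
    "final": {"pair1": 2140, "pair2": 2140,
              "orphan1": 406, "orphan2": 501},
}

# Total reads per step (pairs plus orphans).
totals = {step: sum(by_type.values()) for step, by_type in counts.items()}

# Fraction of the raw reads still present after each step.
retained = {step: n / totals["raw"] for step, n in totals.items()}

for step, n in totals.items():
    print(f"{step:<15} {n:>9} reads  {retained[step]:8.3%} of raw")
```

For this sample, trimming keeps roughly 94% of the raw reads, and almost all of the loss happens at the decontamination step against GRCm39_mice_db.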

Some samples give reasonable results, but others have very few reads left.
I am not sure whether these samples are heavily contaminated with host reads or whether something went wrong in the Bowtie2 step.

Thanks a lot!

Hey,

I have a similar problem. Did you manage to figure out how to solve it?