The bioBakery help forum

Fewer reads survive after kneaddata

Hi,
When I use Kneaddata (v0.10.0) to trim sequence files and separate the reads from host sequences, a large number of reads are lost by the end of the pipeline. Is this normal?

Command and read count log:
kneaddata -i R1.fastq -i R2.fastq -v -o out -db REFERENCE_DB --output-prefix R -t 30 --remove-intermediate-output --trimmomatic TRIMMOMATIC_PATH --trimmomatic-options 'ILLUMINACLIP:TRIMMOMATIC_PATH/adapters/TruSeq3-PE.fa:2:30:10 SLIDINGWINDOW:4:20 MINLEN:50' --bowtie2-options '--very-sensitive --dovetail'

10/22/2021 03:43:23 PM - kneaddata.utilities - INFO: Reformatting file sequence identifiers ...
10/22/2021 03:43:26 PM - kneaddata.utilities - INFO: Reformatting file sequence identifiers ...
10/22/2021 03:43:30 PM - kneaddata.utilities - INFO: Reordering read identifiers ...
10/22/2021 03:43:41 PM - kneaddata.utilities - INFO: READ COUNT: raw pair1 : Initial number of reads ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/reordered__34mhg4r_reformatted_identifiersltwktyld_DRR033605_1 ): 418953.0
10/22/2021 03:43:41 PM - kneaddata.utilities - INFO: READ COUNT: raw pair2 : Initial number of reads ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/reordered_1avvz_3m_reformatted_identifierskn49z8zq_DRR033605_2 ): 418953.0
10/22/2021 03:43:41 PM - kneaddata.utilities - DEBUG: Checking input file to Trimmomatic : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/reordered__34mhg4r_reformatted_identifiersltwktyld_DRR033605_1
10/22/2021 03:43:41 PM - kneaddata.utilities - DEBUG: Checking input file to Trimmomatic : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/reordered_1avvz_3m_reformatted_identifierskn49z8zq_DRR033605_2
10/22/2021 03:43:41 PM - kneaddata.utilities - INFO: Running Trimmomatic ... 
10/22/2021 03:43:41 PM - kneaddata.utilities - INFO: Execute command: java -Xmx500m -jar /home/gene/lujie/software/Trimmomatic-0.39/trimmomatic-0.39.jar PE -threads 30 -phred33 /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/reordered__34mhg4r_reformatted_identifiersltwktyld_DRR033605_1 /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/reordered_1avvz_3m_reformatted_identifierskn49z8zq_DRR033605_2 /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.1.fastq /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.1.fastq /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.2.fastq /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.2.fastq ILLUMINACLIP:/home/gene/lujie/software/Trimmomatic-0.39/adapters/TruSeq3-PE.fa:2:30:10 SLIDINGWINDOW:4:20 MINLEN:50
10/22/2021 03:43:47 PM - kneaddata.utilities - DEBUG: b"TrimmomaticPE: Started with arguments:\n -threads 30 -phred33 /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/reordered__34mhg4r_reformatted_identifiersltwktyld_DRR033605_1 /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/reordered_1avvz_3m_reformatted_identifierskn49z8zq_DRR033605_2 /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.1.fastq /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.1.fastq /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.2.fastq /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.2.fastq ILLUMINACLIP:/home/gene/lujie/software/Trimmomatic-0.39/adapters/TruSeq3-PE.fa:2:30:10 SLIDINGWINDOW:4:20 MINLEN:50\nUsing PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'\nILLUMINACLIP: Using 1 prefix pairs, 0 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences\nInput Read Pairs: 418953 Both Surviving: 321550 (76.75%) Forward Only Surviving: 66852 (15.96%) Reverse Only Surviving: 14291 (3.41%) Dropped: 16260 (3.88%)\nTrimmomaticPE: Completed successfully\n"
10/22/2021 03:43:47 PM - kneaddata.utilities - DEBUG: Checking output file from Trimmomatic : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.1.fastq
10/22/2021 03:43:47 PM - kneaddata.utilities - DEBUG: Checking output file from Trimmomatic : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.1.fastq
10/22/2021 03:43:47 PM - kneaddata.utilities - DEBUG: Checking output file from Trimmomatic : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.2.fastq
10/22/2021 03:43:47 PM - kneaddata.utilities - DEBUG: Checking output file from Trimmomatic : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.2.fastq
10/22/2021 03:43:47 PM - kneaddata.utilities - INFO: READ COUNT: trimmed pair1 : Total reads after trimming ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.1.fastq ): 321550.0
10/22/2021 03:43:47 PM - kneaddata.utilities - INFO: READ COUNT: trimmed pair2 : Total reads after trimming ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.2.fastq ): 321550.0
10/22/2021 03:43:47 PM - kneaddata.utilities - INFO: READ COUNT: trimmed orphan1 : Total reads after trimming ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.1.fastq ): 66852.0
10/22/2021 03:43:47 PM - kneaddata.utilities - INFO: READ COUNT: trimmed orphan2 : Total reads after trimming ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.2.fastq ): 14291.0
10/22/2021 03:44:03 PM - kneaddata.utilities - DEBUG: Checking input file to trf : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.1.fasta
10/22/2021 03:44:03 PM - kneaddata.utilities - INFO: Running trf ... 
10/22/2021 03:44:03 PM - kneaddata.utilities - INFO: Execute command: kneaddata_trf_parallel --input /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.1.fasta --output /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.1.fasta.trf.parameters.2.7.7.80.10.50.500.dat --trf-path /home/gene/lujie/miniconda2/envs/kneaddata/bin/trf --trf-options '2 7 7 80 10 50 500 -h -ngs' --nproc 30
10/22/2021 03:44:07 PM - kneaddata.utilities - DEBUG: 0
10/22/2021 03:44:07 PM - kneaddata.utilities - DEBUG: Checking output file from trf : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.1.fasta.trf.parameters.2.7.7.80.10.50.500.dat
10/22/2021 03:44:07 PM - kneaddata.utilities - DEBUG: Checking input file to trf : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.2.fasta
10/22/2021 03:44:07 PM - kneaddata.utilities - INFO: Running trf ... 
10/22/2021 03:44:07 PM - kneaddata.utilities - INFO: Execute command: kneaddata_trf_parallel --input /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.2.fasta --output /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.2.fasta.trf.parameters.2.7.7.80.10.50.500.dat --trf-path /home/gene/lujie/miniconda2/envs/kneaddata/bin/trf --trf-options '2 7 7 80 10 50 500 -h -ngs' --nproc 30
10/22/2021 03:44:11 PM - kneaddata.utilities - DEBUG: 0
10/22/2021 03:44:11 PM - kneaddata.utilities - DEBUG: Checking output file from trf : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.2.fasta.trf.parameters.2.7.7.80.10.50.500.dat
10/22/2021 03:44:12 PM - kneaddata.run - INFO: Total number of sequences with repeats removed from file ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.1.fastq ): 190
10/22/2021 03:44:15 PM - kneaddata.run - INFO: Total number of sequences with repeats removed from file ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.2.fastq ): 161
10/22/2021 03:44:18 PM - kneaddata.utilities - DEBUG: Checking input file to trf : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.1.fasta
10/22/2021 03:44:18 PM - kneaddata.utilities - INFO: Running trf ... 
10/22/2021 03:44:18 PM - kneaddata.utilities - INFO: Execute command: kneaddata_trf_parallel --input /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.1.fasta --output /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.1.fasta.trf.parameters.2.7.7.80.10.50.500.dat --trf-path /home/gene/lujie/miniconda2/envs/kneaddata/bin/trf --trf-options '2 7 7 80 10 50 500 -h -ngs' --nproc 30
10/22/2021 03:44:19 PM - kneaddata.utilities - DEBUG: 0
10/22/2021 03:44:19 PM - kneaddata.utilities - DEBUG: Checking output file from trf : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.1.fasta.trf.parameters.2.7.7.80.10.50.500.dat
10/22/2021 03:44:19 PM - kneaddata.run - INFO: Total number of sequences with repeats removed from file ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.1.fastq ): 34
10/22/2021 03:44:20 PM - kneaddata.utilities - DEBUG: Checking input file to trf : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.2.fasta
10/22/2021 03:44:20 PM - kneaddata.utilities - INFO: Running trf ... 
10/22/2021 03:44:20 PM - kneaddata.utilities - INFO: Execute command: kneaddata_trf_parallel --input /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.2.fasta --output /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.2.fasta.trf.parameters.2.7.7.80.10.50.500.dat --trf-path /home/gene/lujie/miniconda2/envs/kneaddata/bin/trf --trf-options '2 7 7 80 10 50 500 -h -ngs' --nproc 30
10/22/2021 03:44:21 PM - kneaddata.utilities - DEBUG: 0
10/22/2021 03:44:21 PM - kneaddata.utilities - DEBUG: Checking output file from trf : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.2.fasta.trf.parameters.2.7.7.80.10.50.500.dat
10/22/2021 03:44:21 PM - kneaddata.run - INFO: Total number of sequences with repeats removed from file ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.2.fastq ): 5
10/22/2021 03:44:21 PM - kneaddata.run - INFO: Decontaminating ...
10/22/2021 03:44:21 PM - kneaddata.utilities - DEBUG: Checking input file to bowtie2 : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.repeats.removed.1.fastq
10/22/2021 03:44:21 PM - kneaddata.utilities - DEBUG: Checking input file to bowtie2 : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.repeats.removed.2.fastq
10/22/2021 03:44:21 PM - kneaddata.utilities - DEBUG: Checking input file to bowtie2 : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.repeats.removed.unmatched.1.fastq
10/22/2021 03:44:21 PM - kneaddata.utilities - DEBUG: Checking input file to bowtie2 : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.repeats.removed.unmatched.2.fastq
10/22/2021 03:44:21 PM - kneaddata.utilities - INFO: Running bowtie2 ... 
10/22/2021 03:44:21 PM - kneaddata.utilities - INFO: Execute command: kneaddata_bowtie2_discordant_pairs --bowtie2 /home/gene/lujie/miniconda2/envs/kneaddata/bin/bowtie2 --threads 30 -x /home/gene/lujie/software/metagenome_database/human_genome_database/hg37/hg37 --mode strict --bowtie2-options "--very-sensitive --dovetail --phred33" -1 /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.repeats.removed.1.fastq -2 /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.repeats.removed.2.fastq --un-pair /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_paired_clean_%.fastq --al-pair /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_paired_contam_%.fastq -U /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.repeats.removed.unmatched.1.fastq,/home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.repeats.removed.unmatched.2.fastq --un-single /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_unmatched_%_clean.fastq --al-single /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_unmatched_%_contam.fastq -S /dev/null
10/22/2021 03:45:15 PM - kneaddata.utilities - DEBUG: b'723853 reads; of these:\n  723853 (100.00%) were unpaired; of these:\n    723782 (99.99%) aligned 0 times\n    38 (0.01%) aligned exactly 1 time\n    33 (0.00%) aligned >1 times\n0.01% overall alignment rate\npair1_aligned : 14\npair2_aligned : 14\npair1_unaligned : 50695\npair2_unaligned : 50695\norphan1_aligned : 24\norphan2_aligned : 21\norphan1_unaligned : 337445\norphan2_unaligned : 284945\n'
10/22/2021 03:45:15 PM - kneaddata.utilities - DEBUG: Checking output file from bowtie2 : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_paired_clean_1.fastq
10/22/2021 03:45:15 PM - kneaddata.utilities - DEBUG: Checking output file from bowtie2 : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_paired_clean_2.fastq
10/22/2021 03:45:16 PM - kneaddata.run - INFO: Total contaminate sequences in file ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_paired_contam_1.fastq ) : 14.0
10/22/2021 03:45:16 PM - kneaddata.run - INFO: Total contaminate sequences in file ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_paired_contam_2.fastq ) : 14.0
10/22/2021 03:45:16 PM - kneaddata.run - INFO: Total contaminate sequences in file ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_unmatched_1_contam.fastq ) : 24.0
10/22/2021 03:45:16 PM - kneaddata.run - INFO: Total contaminate sequences in file ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_unmatched_2_contam.fastq ) : 21.0
10/22/2021 03:45:16 PM - kneaddata.utilities - INFO: READ COUNT: decontaminated hg37 pair1 : Total reads after removing those found in reference database ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_paired_clean_1.fastq ): 50695.0
10/22/2021 03:45:16 PM - kneaddata.utilities - INFO: READ COUNT: decontaminated hg37 pair2 : Total reads after removing those found in reference database ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_paired_clean_2.fastq ): 50695.0
10/22/2021 03:45:16 PM - kneaddata.utilities - INFO: READ COUNT: final pair1 : Total reads after merging results from multiple databases ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_paired_1.fastq ): 50695.0
10/22/2021 03:45:16 PM - kneaddata.utilities - INFO: READ COUNT: final pair2 : Total reads after merging results from multiple databases ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_paired_2.fastq ): 50695.0
10/22/2021 03:45:16 PM - kneaddata.utilities - WARNING: Unable to remove file: /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_paired_clean_1.fastq
10/22/2021 03:45:16 PM - kneaddata.utilities - WARNING: Unable to remove file: /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_paired_clean_2.fastq
10/22/2021 03:45:17 PM - kneaddata.utilities - INFO: READ COUNT: decontaminated hg37 orphan1 : Total reads after removing those found in reference database ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_unmatched_1_clean.fastq ): 337445.0
10/22/2021 03:45:18 PM - kneaddata.utilities - INFO: READ COUNT: final orphan1 : Total reads after merging results from multiple databases ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_unmatched_1.fastq ): 337445.0
10/22/2021 03:45:18 PM - kneaddata.utilities - WARNING: Unable to remove file: /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_unmatched_1_clean.fastq
10/22/2021 03:45:18 PM - kneaddata.utilities - INFO: READ COUNT: decontaminated hg37 orphan2 : Total reads after removing those found in reference database ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_unmatched_2_clean.fastq ): 284945.0
10/22/2021 03:45:19 PM - kneaddata.utilities - INFO: READ COUNT: final orphan2 : Total reads after merging results from multiple databases ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_unmatched_2.fastq ): 284945.0
10/22/2021 03:45:19 PM - kneaddata.utilities - WARNING: Unable to remove file: /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_unmatched_2_clean.fastq

The log shows that most reads survived trimming and that very few reads were identified as contaminants from the human genome database, yet only ~16% of the trimmed read pairs (50,695 of 321,550) survived as paired reads.
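
As a rough check, the counts can be tallied in a few lines of Python. The numbers are copied from the log above; the variable names and the pct helper are only illustrative and are not part of kneaddata.

```python
# Read counts copied from the kneaddata log above.
raw_pairs = 418953
trimmed_pairs = 321550
trimmed_orphans = 66852 + 14291         # forward-only + reverse-only survivors
final_pairs = 50695
final_orphans = 337445 + 284945         # orphan1 + orphan2 after bowtie2
contaminant_reads = 14 + 14 + 24 + 21   # reads reported as aligned to hg37


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


pairs_kept_after_trimming = pct(trimmed_pairs, raw_pairs)
pairs_kept_after_bowtie2 = pct(final_pairs, trimmed_pairs)
pairs_kept_overall = pct(final_pairs, raw_pairs)

# Pairs that went into decontamination but did not come out as pairs,
# and the growth of the orphan files over the same step.
pairs_lost = trimmed_pairs - final_pairs
orphans_gained = final_orphans - trimmed_orphans

print("pairs kept after trimming (%):", pairs_kept_after_trimming)
print("pairs kept after bowtie2 (%):", pairs_kept_after_bowtie2)
print("pairs kept overall (%):", pairs_kept_overall)
print("pairs lost:", pairs_lost, "orphans gained:", orphans_gained)
print("contaminant reads:", contaminant_reads)
```

The orphan files grow by roughly twice the number of lost pairs, while only 73 reads are flagged as contaminants. This suggests the pairs are being split into orphans rather than removed as host reads.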

I would appreciate your help.

Sincerely,
Catslu

Hi @wusan1234,
I see that 99% of reads are not being aligned in the bowtie2 step of kneaddata. I suspect that the format of the sequence identifiers is causing the alignment issue. Could you share a sample of the sequence identifiers from both paired-end .fastq files? Also, can you try updating kneaddata to v0.11.0, please?
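
In case it helps, here is a minimal Python sketch for pulling the first few identifiers from each file and checking whether they match. It assumes uncompressed four-line FASTQ records and the common conventions of a trailing /1 or /2 or a space-separated comment after the read name. The function names (first_identifiers, base_name, compare_pairs) are hypothetical and not part of kneaddata, and this is not how kneaddata itself compares identifiers.

```python
import itertools
import sys


def first_identifiers(fastq_path, n=5):
    """Return the first n header lines (every 4th line) of an uncompressed FASTQ file."""
    with open(fastq_path) as handle:
        headers = itertools.islice(handle, 0, 4 * n, 4)
        return [line.rstrip("\n") for line in headers]


def base_name(header):
    """Drop the leading '@', any comment after whitespace, and a trailing /1 or /2."""
    fields = header.lstrip("@").split()
    name = fields[0] if fields else ""
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def compare_pairs(r1_path, r2_path, n=5):
    """Return (header1, header2, names_match) for the first n records of each file."""
    pairs = zip(first_identifiers(r1_path, n), first_identifiers(r2_path, n))
    return [(h1, h2, base_name(h1) == base_name(h2)) for h1, h2 in pairs]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Example invocation: python check_ids.py R1.fastq R2.fastq
    for h1, h2, ok in compare_pairs(sys.argv[1], sys.argv[2]):
        print(h1, h2, "OK" if ok else "MISMATCH", sep="\t")
```

Posting the printed header lines from both files would be enough to see how the pair members are labelled.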

Regards,
Sagun