Hi,
When I use KneadData (v0.10.0) to trim sequence files and separate the reads from the host, a large share of the reads is lost by the end of the whole pipeline. Is this normal?
Command and read count log:
kneaddata -i R1.fastq -i R2.fastq -v -o out -db REFERENCE_DB --output-prefix R -t 30 --remove-intermediate-output --trimmomatic TRIMMOMATIC_PATH --trimmomatic-options 'ILLUMINACLIP:TRIMMOMATIC_PATH/adapters/TruSeq3-PE.fa:2:30:10 SLIDINGWINDOW:4:20 MINLEN:50' --bowtie2-options '--very-sensitive --dovetail'
10/22/2021 03:43:23 PM - kneaddata.utilities - INFO: Reformatting file sequence identifiers ...
10/22/2021 03:43:26 PM - kneaddata.utilities - INFO: Reformatting file sequence identifiers ...
10/22/2021 03:43:30 PM - kneaddata.utilities - INFO: Reordering read identifiers ...
10/22/2021 03:43:41 PM - kneaddata.utilities - INFO: READ COUNT: raw pair1 : Initial number of reads ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/reordered__34mhg4r_reformatted_identifiersltwktyld_DRR033605_1 ): 418953.0
10/22/2021 03:43:41 PM - kneaddata.utilities - INFO: READ COUNT: raw pair2 : Initial number of reads ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/reordered_1avvz_3m_reformatted_identifierskn49z8zq_DRR033605_2 ): 418953.0
10/22/2021 03:43:41 PM - kneaddata.utilities - DEBUG: Checking input file to Trimmomatic : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/reordered__34mhg4r_reformatted_identifiersltwktyld_DRR033605_1
10/22/2021 03:43:41 PM - kneaddata.utilities - DEBUG: Checking input file to Trimmomatic : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/reordered_1avvz_3m_reformatted_identifierskn49z8zq_DRR033605_2
10/22/2021 03:43:41 PM - kneaddata.utilities - INFO: Running Trimmomatic ...
10/22/2021 03:43:41 PM - kneaddata.utilities - INFO: Execute command: java -Xmx500m -jar /home/gene/lujie/software/Trimmomatic-0.39/trimmomatic-0.39.jar PE -threads 30 -phred33 /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/reordered__34mhg4r_reformatted_identifiersltwktyld_DRR033605_1 /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/reordered_1avvz_3m_reformatted_identifierskn49z8zq_DRR033605_2 /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.1.fastq /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.1.fastq /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.2.fastq /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.2.fastq ILLUMINACLIP:/home/gene/lujie/software/Trimmomatic-0.39/adapters/TruSeq3-PE.fa:2:30:10 SLIDINGWINDOW:4:20 MINLEN:50
10/22/2021 03:43:47 PM - kneaddata.utilities - DEBUG: b"TrimmomaticPE: Started with arguments:\n -threads 30 -phred33 /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/reordered__34mhg4r_reformatted_identifiersltwktyld_DRR033605_1 /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/reordered_1avvz_3m_reformatted_identifierskn49z8zq_DRR033605_2 /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.1.fastq /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.1.fastq /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.2.fastq /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.2.fastq ILLUMINACLIP:/home/gene/lujie/software/Trimmomatic-0.39/adapters/TruSeq3-PE.fa:2:30:10 SLIDINGWINDOW:4:20 MINLEN:50\nUsing PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'\nILLUMINACLIP: Using 1 prefix pairs, 0 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences\nInput Read Pairs: 418953 Both Surviving: 321550 (76.75%) Forward Only Surviving: 66852 (15.96%) Reverse Only Surviving: 14291 (3.41%) Dropped: 16260 (3.88%)\nTrimmomaticPE: Completed successfully\n"
10/22/2021 03:43:47 PM - kneaddata.utilities - DEBUG: Checking output file from Trimmomatic : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.1.fastq
10/22/2021 03:43:47 PM - kneaddata.utilities - DEBUG: Checking output file from Trimmomatic : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.1.fastq
10/22/2021 03:43:47 PM - kneaddata.utilities - DEBUG: Checking output file from Trimmomatic : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.2.fastq
10/22/2021 03:43:47 PM - kneaddata.utilities - DEBUG: Checking output file from Trimmomatic : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.2.fastq
10/22/2021 03:43:47 PM - kneaddata.utilities - INFO: READ COUNT: trimmed pair1 : Total reads after trimming ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.1.fastq ): 321550.0
10/22/2021 03:43:47 PM - kneaddata.utilities - INFO: READ COUNT: trimmed pair2 : Total reads after trimming ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.2.fastq ): 321550.0
10/22/2021 03:43:47 PM - kneaddata.utilities - INFO: READ COUNT: trimmed orphan1 : Total reads after trimming ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.1.fastq ): 66852.0
10/22/2021 03:43:47 PM - kneaddata.utilities - INFO: READ COUNT: trimmed orphan2 : Total reads after trimming ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.2.fastq ): 14291.0
10/22/2021 03:44:03 PM - kneaddata.utilities - DEBUG: Checking input file to trf : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.1.fasta
10/22/2021 03:44:03 PM - kneaddata.utilities - INFO: Running trf ...
10/22/2021 03:44:03 PM - kneaddata.utilities - INFO: Execute command: kneaddata_trf_parallel --input /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.1.fasta --output /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.1.fasta.trf.parameters.2.7.7.80.10.50.500.dat --trf-path /home/gene/lujie/miniconda2/envs/kneaddata/bin/trf --trf-options '2 7 7 80 10 50 500 -h -ngs' --nproc 30
10/22/2021 03:44:07 PM - kneaddata.utilities - DEBUG: 0
10/22/2021 03:44:07 PM - kneaddata.utilities - DEBUG: Checking output file from trf : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.1.fasta.trf.parameters.2.7.7.80.10.50.500.dat
10/22/2021 03:44:07 PM - kneaddata.utilities - DEBUG: Checking input file to trf : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.2.fasta
10/22/2021 03:44:07 PM - kneaddata.utilities - INFO: Running trf ...
10/22/2021 03:44:07 PM - kneaddata.utilities - INFO: Execute command: kneaddata_trf_parallel --input /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.2.fasta --output /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.2.fasta.trf.parameters.2.7.7.80.10.50.500.dat --trf-path /home/gene/lujie/miniconda2/envs/kneaddata/bin/trf --trf-options '2 7 7 80 10 50 500 -h -ngs' --nproc 30
10/22/2021 03:44:11 PM - kneaddata.utilities - DEBUG: 0
10/22/2021 03:44:11 PM - kneaddata.utilities - DEBUG: Checking output file from trf : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.2.fasta.trf.parameters.2.7.7.80.10.50.500.dat
10/22/2021 03:44:12 PM - kneaddata.run - INFO: Total number of sequences with repeats removed from file ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.1.fastq ): 190
10/22/2021 03:44:15 PM - kneaddata.run - INFO: Total number of sequences with repeats removed from file ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.2.fastq ): 161
10/22/2021 03:44:18 PM - kneaddata.utilities - DEBUG: Checking input file to trf : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.1.fasta
10/22/2021 03:44:18 PM - kneaddata.utilities - INFO: Running trf ...
10/22/2021 03:44:18 PM - kneaddata.utilities - INFO: Execute command: kneaddata_trf_parallel --input /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.1.fasta --output /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.1.fasta.trf.parameters.2.7.7.80.10.50.500.dat --trf-path /home/gene/lujie/miniconda2/envs/kneaddata/bin/trf --trf-options '2 7 7 80 10 50 500 -h -ngs' --nproc 30
10/22/2021 03:44:19 PM - kneaddata.utilities - DEBUG: 0
10/22/2021 03:44:19 PM - kneaddata.utilities - DEBUG: Checking output file from trf : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.1.fasta.trf.parameters.2.7.7.80.10.50.500.dat
10/22/2021 03:44:19 PM - kneaddata.run - INFO: Total number of sequences with repeats removed from file ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.1.fastq ): 34
10/22/2021 03:44:20 PM - kneaddata.utilities - DEBUG: Checking input file to trf : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.2.fasta
10/22/2021 03:44:20 PM - kneaddata.utilities - INFO: Running trf ...
10/22/2021 03:44:20 PM - kneaddata.utilities - INFO: Execute command: kneaddata_trf_parallel --input /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.2.fasta --output /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.2.fasta.trf.parameters.2.7.7.80.10.50.500.dat --trf-path /home/gene/lujie/miniconda2/envs/kneaddata/bin/trf --trf-options '2 7 7 80 10 50 500 -h -ngs' --nproc 30
10/22/2021 03:44:21 PM - kneaddata.utilities - DEBUG: 0
10/22/2021 03:44:21 PM - kneaddata.utilities - DEBUG: Checking output file from trf : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.2.fasta.trf.parameters.2.7.7.80.10.50.500.dat
10/22/2021 03:44:21 PM - kneaddata.run - INFO: Total number of sequences with repeats removed from file ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.2.fastq ): 5
10/22/2021 03:44:21 PM - kneaddata.run - INFO: Decontaminating ...
10/22/2021 03:44:21 PM - kneaddata.utilities - DEBUG: Checking input file to bowtie2 : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.repeats.removed.1.fastq
10/22/2021 03:44:21 PM - kneaddata.utilities - DEBUG: Checking input file to bowtie2 : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.repeats.removed.2.fastq
10/22/2021 03:44:21 PM - kneaddata.utilities - DEBUG: Checking input file to bowtie2 : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.repeats.removed.unmatched.1.fastq
10/22/2021 03:44:21 PM - kneaddata.utilities - DEBUG: Checking input file to bowtie2 : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.repeats.removed.unmatched.2.fastq
10/22/2021 03:44:21 PM - kneaddata.utilities - INFO: Running bowtie2 ...
10/22/2021 03:44:21 PM - kneaddata.utilities - INFO: Execute command: kneaddata_bowtie2_discordant_pairs --bowtie2 /home/gene/lujie/miniconda2/envs/kneaddata/bin/bowtie2 --threads 30 -x /home/gene/lujie/software/metagenome_database/human_genome_database/hg37/hg37 --mode strict --bowtie2-options "--very-sensitive --dovetail --phred33" -1 /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.repeats.removed.1.fastq -2 /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.repeats.removed.2.fastq --un-pair /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_paired_clean_%.fastq --al-pair /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_paired_contam_%.fastq -U /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.repeats.removed.unmatched.1.fastq,/home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.repeats.removed.unmatched.2.fastq --un-single /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_unmatched_%_clean.fastq --al-single /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_unmatched_%_contam.fastq -S /dev/null
10/22/2021 03:45:15 PM - kneaddata.utilities - DEBUG: b'723853 reads; of these:\n 723853 (100.00%) were unpaired; of these:\n 723782 (99.99%) aligned 0 times\n 38 (0.01%) aligned exactly 1 time\n 33 (0.00%) aligned >1 times\n0.01% overall alignment rate\npair1_aligned : 14\npair2_aligned : 14\npair1_unaligned : 50695\npair2_unaligned : 50695\norphan1_aligned : 24\norphan2_aligned : 21\norphan1_unaligned : 337445\norphan2_unaligned : 284945\n'
10/22/2021 03:45:15 PM - kneaddata.utilities - DEBUG: Checking output file from bowtie2 : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_paired_clean_1.fastq
10/22/2021 03:45:15 PM - kneaddata.utilities - DEBUG: Checking output file from bowtie2 : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_paired_clean_2.fastq
10/22/2021 03:45:16 PM - kneaddata.run - INFO: Total contaminate sequences in file ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_paired_contam_1.fastq ) : 14.0
10/22/2021 03:45:16 PM - kneaddata.run - INFO: Total contaminate sequences in file ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_paired_contam_2.fastq ) : 14.0
10/22/2021 03:45:16 PM - kneaddata.run - INFO: Total contaminate sequences in file ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_unmatched_1_contam.fastq ) : 24.0
10/22/2021 03:45:16 PM - kneaddata.run - INFO: Total contaminate sequences in file ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_unmatched_2_contam.fastq ) : 21.0
10/22/2021 03:45:16 PM - kneaddata.utilities - INFO: READ COUNT: decontaminated hg37 pair1 : Total reads after removing those found in reference database ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_paired_clean_1.fastq ): 50695.0
10/22/2021 03:45:16 PM - kneaddata.utilities - INFO: READ COUNT: decontaminated hg37 pair2 : Total reads after removing those found in reference database ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_paired_clean_2.fastq ): 50695.0
10/22/2021 03:45:16 PM - kneaddata.utilities - INFO: READ COUNT: final pair1 : Total reads after merging results from multiple databases ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_paired_1.fastq ): 50695.0
10/22/2021 03:45:16 PM - kneaddata.utilities - INFO: READ COUNT: final pair2 : Total reads after merging results from multiple databases ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_paired_2.fastq ): 50695.0
10/22/2021 03:45:16 PM - kneaddata.utilities - WARNING: Unable to remove file: /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_paired_clean_1.fastq
10/22/2021 03:45:16 PM - kneaddata.utilities - WARNING: Unable to remove file: /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_paired_clean_2.fastq
10/22/2021 03:45:17 PM - kneaddata.utilities - INFO: READ COUNT: decontaminated hg37 orphan1 : Total reads after removing those found in reference database ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_unmatched_1_clean.fastq ): 337445.0
10/22/2021 03:45:18 PM - kneaddata.utilities - INFO: READ COUNT: final orphan1 : Total reads after merging results from multiple databases ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_unmatched_1.fastq ): 337445.0
10/22/2021 03:45:18 PM - kneaddata.utilities - WARNING: Unable to remove file: /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_unmatched_1_clean.fastq
10/22/2021 03:45:18 PM - kneaddata.utilities - INFO: READ COUNT: decontaminated hg37 orphan2 : Total reads after removing those found in reference database ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_unmatched_2_clean.fastq ): 284945.0
10/22/2021 03:45:19 PM - kneaddata.utilities - INFO: READ COUNT: final orphan2 : Total reads after merging results from multiple databases ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_unmatched_2.fastq ): 284945.0
10/22/2021 03:45:19 PM - kneaddata.utilities - WARNING: Unable to remove file: /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_unmatched_2_clean.fastq
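The READ COUNT lines in the log above can be collected into a small table to make the steps easier to compare. This is a minimal sketch that assumes the log line format shown above; the function name `parse_read_counts` and the shortened paths in `sample` are made up for illustration and are not part of KneadData.

```python
import re

# Matches lines such as:
# "... INFO: READ COUNT: trimmed pair1 : Total reads after trimming ( /path/file.fastq ): 321550.0"
READ_COUNT = re.compile(r"READ COUNT: (?P<label>.+?) : .*\): (?P<count>[\d.]+)\s*$")


def parse_read_counts(log_lines):
    """Return {label: count} for every READ COUNT line found in the given log lines."""
    counts = {}
    for line in log_lines:
        match = READ_COUNT.search(line)
        if match:
            # The log prints counts as floats (e.g. 321550.0); store them as integers.
            counts[match.group("label")] = int(float(match.group("count")))
    return counts


# Two lines taken from the log above, with the file paths shortened for the example.
sample = [
    "10/22/2021 03:43:47 PM - kneaddata.utilities - INFO: READ COUNT: trimmed pair1 : "
    "Total reads after trimming ( out/DRR033605.trimmed.1.fastq ): 321550.0",
    "10/22/2021 03:45:16 PM - kneaddata.utilities - INFO: READ COUNT: final pair1 : "
    "Total reads after merging results from multiple databases ( out/DRR033605_paired_1.fastq ): 50695.0",
]
counts = parse_read_counts(sample)
for label, count in counts.items():
    print(f"{label}\t{count}")
```

To run it on a full log, pass the open log file (an iterable of lines) instead of `sample`.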
The log shows that most reads survived trimming (76.75% of pairs with both mates surviving) and that very few reads were identified as contaminants from the human genome database, yet only 50,695 of the 321,550 trimmed pairs (~16%) survived as paired reads.
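The counts in the log can be tallied to check whether reads actually disappear between trimming and the final output. The sketch below copies the numbers from the log above. It assumes that the "sequences with repeats removed" counts are subtracted from the corresponding trimmed file and that the bowtie2 "aligned" counts are the reads removed as contaminants. The helper `balance` is hypothetical.

```python
# Counts copied from the log above (pair counts are per mate file).
trimmed = {"pair1": 321550, "orphan1": 66852, "pair2": 321550, "orphan2": 14291}
trf_removed = {"pair1": 190, "orphan1": 34, "pair2": 161, "orphan2": 5}
host_aligned = {"pair1": 14, "orphan1": 24, "pair2": 14, "orphan2": 21}
final = {"pair1": 50695, "orphan1": 337445, "pair2": 50695, "orphan2": 284945}


def balance(mate):
    """Compare reads entering bowtie2 with reads leaving it for one mate ('1' or '2')."""
    keys = (f"pair{mate}", f"orphan{mate}")
    # Assumption: reads entering bowtie2 = trimmed reads minus reads removed by trf.
    entering = sum(trimmed[k] - trf_removed[k] for k in keys)
    # Reads leaving bowtie2 = clean reads plus reads flagged as host contaminants.
    leaving = sum(final[k] + host_aligned[k] for k in keys)
    return entering, leaving


for mate in ("1", "2"):
    entering, leaving = balance(mate)
    print(f"mate {mate}: entering bowtie2 = {entering}, leaving bowtie2 = {leaving}")

# Share of trimmed pairs that are still reported as pairs at the end.
paired_fraction = final["pair1"] / trimmed["pair1"]
print(f"pairs kept as pairs: {paired_fraction:.1%}")
```

With these numbers the totals agree for each mate (388178 for mate 1 and 335675 for mate 2). Under the stated assumptions, that suggests reads were moved from the paired output to the orphan output rather than removed. It does not by itself explain why the pairs were separated.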
I would appreciate your help.
Sincerely,
Catslu