Fewer reads survive after KneadData

Hi,
When I use KneadData (v0.10.0) to trim sequence files and separate host reads, a large share of the reads is lost by the end of the pipeline. Is this normal?

Command and read count log:
kneaddata -i R1.fastq -i R2.fastq -v -o out -db REFERENCE_DB --output-prefix R -t 30 --remove-intermediate-output --trimmomatic TRIMMOMATIC_PATH --trimmomatic-options 'ILLUMINACLIP:TRIMMOMATIC_PATH/adapters/TruSeq3-PE.fa:2:30:10 SLIDINGWINDOW:4:20 MINLEN:50' --bowtie2-options '--very-sensitive --dovetail'

10/22/2021 03:43:23 PM - kneaddata.utilities - INFO: Reformatting file sequence identifiers ...
10/22/2021 03:43:26 PM - kneaddata.utilities - INFO: Reformatting file sequence identifiers ...
10/22/2021 03:43:30 PM - kneaddata.utilities - INFO: Reordering read identifiers ...
10/22/2021 03:43:41 PM - kneaddata.utilities - INFO: READ COUNT: raw pair1 : Initial number of reads ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/reordered__34mhg4r_reformatted_identifiersltwktyld_DRR033605_1 ): 418953.0
10/22/2021 03:43:41 PM - kneaddata.utilities - INFO: READ COUNT: raw pair2 : Initial number of reads ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/reordered_1avvz_3m_reformatted_identifierskn49z8zq_DRR033605_2 ): 418953.0
10/22/2021 03:43:41 PM - kneaddata.utilities - DEBUG: Checking input file to Trimmomatic : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/reordered__34mhg4r_reformatted_identifiersltwktyld_DRR033605_1
10/22/2021 03:43:41 PM - kneaddata.utilities - DEBUG: Checking input file to Trimmomatic : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/reordered_1avvz_3m_reformatted_identifierskn49z8zq_DRR033605_2
10/22/2021 03:43:41 PM - kneaddata.utilities - INFO: Running Trimmomatic ... 
10/22/2021 03:43:41 PM - kneaddata.utilities - INFO: Execute command: java -Xmx500m -jar /home/gene/lujie/software/Trimmomatic-0.39/trimmomatic-0.39.jar PE -threads 30 -phred33 /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/reordered__34mhg4r_reformatted_identifiersltwktyld_DRR033605_1 /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/reordered_1avvz_3m_reformatted_identifierskn49z8zq_DRR033605_2 /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.1.fastq /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.1.fastq /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.2.fastq /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.2.fastq ILLUMINACLIP:/home/gene/lujie/software/Trimmomatic-0.39/adapters/TruSeq3-PE.fa:2:30:10 SLIDINGWINDOW:4:20 MINLEN:50
10/22/2021 03:43:47 PM - kneaddata.utilities - DEBUG: b"TrimmomaticPE: Started with arguments:\n -threads 30 -phred33 /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/reordered__34mhg4r_reformatted_identifiersltwktyld_DRR033605_1 /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/reordered_1avvz_3m_reformatted_identifierskn49z8zq_DRR033605_2 /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.1.fastq /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.1.fastq /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.2.fastq /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.2.fastq ILLUMINACLIP:/home/gene/lujie/software/Trimmomatic-0.39/adapters/TruSeq3-PE.fa:2:30:10 SLIDINGWINDOW:4:20 MINLEN:50\nUsing PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'\nILLUMINACLIP: Using 1 prefix pairs, 0 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences\nInput Read Pairs: 418953 Both Surviving: 321550 (76.75%) Forward Only Surviving: 66852 (15.96%) Reverse Only Surviving: 14291 (3.41%) Dropped: 16260 (3.88%)\nTrimmomaticPE: Completed successfully\n"
10/22/2021 03:43:47 PM - kneaddata.utilities - DEBUG: Checking output file from Trimmomatic : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.1.fastq
10/22/2021 03:43:47 PM - kneaddata.utilities - DEBUG: Checking output file from Trimmomatic : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.1.fastq
10/22/2021 03:43:47 PM - kneaddata.utilities - DEBUG: Checking output file from Trimmomatic : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.2.fastq
10/22/2021 03:43:47 PM - kneaddata.utilities - DEBUG: Checking output file from Trimmomatic : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.2.fastq
10/22/2021 03:43:47 PM - kneaddata.utilities - INFO: READ COUNT: trimmed pair1 : Total reads after trimming ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.1.fastq ): 321550.0
10/22/2021 03:43:47 PM - kneaddata.utilities - INFO: READ COUNT: trimmed pair2 : Total reads after trimming ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.2.fastq ): 321550.0
10/22/2021 03:43:47 PM - kneaddata.utilities - INFO: READ COUNT: trimmed orphan1 : Total reads after trimming ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.1.fastq ): 66852.0
10/22/2021 03:43:47 PM - kneaddata.utilities - INFO: READ COUNT: trimmed orphan2 : Total reads after trimming ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.2.fastq ): 14291.0
10/22/2021 03:44:03 PM - kneaddata.utilities - DEBUG: Checking input file to trf : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.1.fasta
10/22/2021 03:44:03 PM - kneaddata.utilities - INFO: Running trf ... 
10/22/2021 03:44:03 PM - kneaddata.utilities - INFO: Execute command: kneaddata_trf_parallel --input /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.1.fasta --output /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.1.fasta.trf.parameters.2.7.7.80.10.50.500.dat --trf-path /home/gene/lujie/miniconda2/envs/kneaddata/bin/trf --trf-options '2 7 7 80 10 50 500 -h -ngs' --nproc 30
10/22/2021 03:44:07 PM - kneaddata.utilities - DEBUG: 0
10/22/2021 03:44:07 PM - kneaddata.utilities - DEBUG: Checking output file from trf : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.1.fasta.trf.parameters.2.7.7.80.10.50.500.dat
10/22/2021 03:44:07 PM - kneaddata.utilities - DEBUG: Checking input file to trf : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.2.fasta
10/22/2021 03:44:07 PM - kneaddata.utilities - INFO: Running trf ... 
10/22/2021 03:44:07 PM - kneaddata.utilities - INFO: Execute command: kneaddata_trf_parallel --input /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.2.fasta --output /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.2.fasta.trf.parameters.2.7.7.80.10.50.500.dat --trf-path /home/gene/lujie/miniconda2/envs/kneaddata/bin/trf --trf-options '2 7 7 80 10 50 500 -h -ngs' --nproc 30
10/22/2021 03:44:11 PM - kneaddata.utilities - DEBUG: 0
10/22/2021 03:44:11 PM - kneaddata.utilities - DEBUG: Checking output file from trf : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.2.fasta.trf.parameters.2.7.7.80.10.50.500.dat
10/22/2021 03:44:12 PM - kneaddata.run - INFO: Total number of sequences with repeats removed from file ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.1.fastq ): 190
10/22/2021 03:44:15 PM - kneaddata.run - INFO: Total number of sequences with repeats removed from file ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.2.fastq ): 161
10/22/2021 03:44:18 PM - kneaddata.utilities - DEBUG: Checking input file to trf : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.1.fasta
10/22/2021 03:44:18 PM - kneaddata.utilities - INFO: Running trf ... 
10/22/2021 03:44:18 PM - kneaddata.utilities - INFO: Execute command: kneaddata_trf_parallel --input /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.1.fasta --output /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.1.fasta.trf.parameters.2.7.7.80.10.50.500.dat --trf-path /home/gene/lujie/miniconda2/envs/kneaddata/bin/trf --trf-options '2 7 7 80 10 50 500 -h -ngs' --nproc 30
10/22/2021 03:44:19 PM - kneaddata.utilities - DEBUG: 0
10/22/2021 03:44:19 PM - kneaddata.utilities - DEBUG: Checking output file from trf : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.1.fasta.trf.parameters.2.7.7.80.10.50.500.dat
10/22/2021 03:44:19 PM - kneaddata.run - INFO: Total number of sequences with repeats removed from file ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.1.fastq ): 34
10/22/2021 03:44:20 PM - kneaddata.utilities - DEBUG: Checking input file to trf : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.2.fasta
10/22/2021 03:44:20 PM - kneaddata.utilities - INFO: Running trf ... 
10/22/2021 03:44:20 PM - kneaddata.utilities - INFO: Execute command: kneaddata_trf_parallel --input /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.2.fasta --output /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.2.fasta.trf.parameters.2.7.7.80.10.50.500.dat --trf-path /home/gene/lujie/miniconda2/envs/kneaddata/bin/trf --trf-options '2 7 7 80 10 50 500 -h -ngs' --nproc 30
10/22/2021 03:44:21 PM - kneaddata.utilities - DEBUG: 0
10/22/2021 03:44:21 PM - kneaddata.utilities - DEBUG: Checking output file from trf : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.2.fasta.trf.parameters.2.7.7.80.10.50.500.dat
10/22/2021 03:44:21 PM - kneaddata.run - INFO: Total number of sequences with repeats removed from file ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.trimmed.single.2.fastq ): 5
10/22/2021 03:44:21 PM - kneaddata.run - INFO: Decontaminating ...
10/22/2021 03:44:21 PM - kneaddata.utilities - DEBUG: Checking input file to bowtie2 : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.repeats.removed.1.fastq
10/22/2021 03:44:21 PM - kneaddata.utilities - DEBUG: Checking input file to bowtie2 : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.repeats.removed.2.fastq
10/22/2021 03:44:21 PM - kneaddata.utilities - DEBUG: Checking input file to bowtie2 : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.repeats.removed.unmatched.1.fastq
10/22/2021 03:44:21 PM - kneaddata.utilities - DEBUG: Checking input file to bowtie2 : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.repeats.removed.unmatched.2.fastq
10/22/2021 03:44:21 PM - kneaddata.utilities - INFO: Running bowtie2 ... 
10/22/2021 03:44:21 PM - kneaddata.utilities - INFO: Execute command: kneaddata_bowtie2_discordant_pairs --bowtie2 /home/gene/lujie/miniconda2/envs/kneaddata/bin/bowtie2 --threads 30 -x /home/gene/lujie/software/metagenome_database/human_genome_database/hg37/hg37 --mode strict --bowtie2-options "--very-sensitive --dovetail --phred33" -1 /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.repeats.removed.1.fastq -2 /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.repeats.removed.2.fastq --un-pair /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_paired_clean_%.fastq --al-pair /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_paired_contam_%.fastq -U /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.repeats.removed.unmatched.1.fastq,/home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605.repeats.removed.unmatched.2.fastq --un-single /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_unmatched_%_clean.fastq --al-single /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_unmatched_%_contam.fastq -S /dev/null
10/22/2021 03:45:15 PM - kneaddata.utilities - DEBUG: b'723853 reads; of these:\n  723853 (100.00%) were unpaired; of these:\n    723782 (99.99%) aligned 0 times\n    38 (0.01%) aligned exactly 1 time\n    33 (0.00%) aligned >1 times\n0.01% overall alignment rate\npair1_aligned : 14\npair2_aligned : 14\npair1_unaligned : 50695\npair2_unaligned : 50695\norphan1_aligned : 24\norphan2_aligned : 21\norphan1_unaligned : 337445\norphan2_unaligned : 284945\n'
10/22/2021 03:45:15 PM - kneaddata.utilities - DEBUG: Checking output file from bowtie2 : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_paired_clean_1.fastq
10/22/2021 03:45:15 PM - kneaddata.utilities - DEBUG: Checking output file from bowtie2 : /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_paired_clean_2.fastq
10/22/2021 03:45:16 PM - kneaddata.run - INFO: Total contaminate sequences in file ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_paired_contam_1.fastq ) : 14.0
10/22/2021 03:45:16 PM - kneaddata.run - INFO: Total contaminate sequences in file ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_paired_contam_2.fastq ) : 14.0
10/22/2021 03:45:16 PM - kneaddata.run - INFO: Total contaminate sequences in file ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_unmatched_1_contam.fastq ) : 24.0
10/22/2021 03:45:16 PM - kneaddata.run - INFO: Total contaminate sequences in file ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_unmatched_2_contam.fastq ) : 21.0
10/22/2021 03:45:16 PM - kneaddata.utilities - INFO: READ COUNT: decontaminated hg37 pair1 : Total reads after removing those found in reference database ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_paired_clean_1.fastq ): 50695.0
10/22/2021 03:45:16 PM - kneaddata.utilities - INFO: READ COUNT: decontaminated hg37 pair2 : Total reads after removing those found in reference database ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_paired_clean_2.fastq ): 50695.0
10/22/2021 03:45:16 PM - kneaddata.utilities - INFO: READ COUNT: final pair1 : Total reads after merging results from multiple databases ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_paired_1.fastq ): 50695.0
10/22/2021 03:45:16 PM - kneaddata.utilities - INFO: READ COUNT: final pair2 : Total reads after merging results from multiple databases ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_paired_2.fastq ): 50695.0
10/22/2021 03:45:16 PM - kneaddata.utilities - WARNING: Unable to remove file: /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_paired_clean_1.fastq
10/22/2021 03:45:16 PM - kneaddata.utilities - WARNING: Unable to remove file: /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_paired_clean_2.fastq
10/22/2021 03:45:17 PM - kneaddata.utilities - INFO: READ COUNT: decontaminated hg37 orphan1 : Total reads after removing those found in reference database ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_unmatched_1_clean.fastq ): 337445.0
10/22/2021 03:45:18 PM - kneaddata.utilities - INFO: READ COUNT: final orphan1 : Total reads after merging results from multiple databases ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_unmatched_1.fastq ): 337445.0
10/22/2021 03:45:18 PM - kneaddata.utilities - WARNING: Unable to remove file: /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_unmatched_1_clean.fastq
10/22/2021 03:45:18 PM - kneaddata.utilities - INFO: READ COUNT: decontaminated hg37 orphan2 : Total reads after removing those found in reference database ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_unmatched_2_clean.fastq ): 284945.0
10/22/2021 03:45:19 PM - kneaddata.utilities - INFO: READ COUNT: final orphan2 : Total reads after merging results from multiple databases ( /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_unmatched_2.fastq ): 284945.0
10/22/2021 03:45:19 PM - kneaddata.utilities - WARNING: Unable to remove file: /home/gene/lujie/oralproject/sra/DRP003573/pre-trimed-reads/raw/copy/out/DRR033605_hg37_bowtie2_unmatched_2_clean.fastq

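The log shows that most reads survived trimming and that very few reads were identified as contaminants from the human genome database, yet only ~16% of the trimmed read pairs (50,695 of 321,550) survived as pairs. The orphan files, on the other hand, grew from 66,852 and 14,291 reads after trimming to 337,445 and 284,945 reads in the final output.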
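The read accounting in the log can be checked with a short calculation. The sketch below only reuses the counts printed in the log above; the variable names are made up for illustration. It suggests that almost every pair that disappeared from the paired files shows up again as two orphan reads, which would be consistent with pairs being split rather than reads being discarded. The log alone does not show why that happens.

```python
# Read counts copied from the KneadData log above (sample DRR033605).
trimmed = {"pairs": 321550, "orphan1": 66852, "orphan2": 14291}
trf_removed = {"pair1": 190, "pair2": 161, "orphan1": 34, "orphan2": 5}
bowtie2_input_reads = 723853  # "723853 reads; of these:" in the bowtie2 output
contaminants = {"pair1": 14, "pair2": 14, "orphan1": 24, "orphan2": 21}
final = {"pairs": 50695, "orphan1": 337445, "orphan2": 284945}


def total_reads(counts):
    """Count individual reads: each pair contributes two reads."""
    return 2 * counts["pairs"] + counts["orphan1"] + counts["orphan2"]


# Reads handed to bowtie2 = reads after trimming minus reads dropped by TRF.
reads_in = total_reads(trimmed) - sum(trf_removed.values())

# Reads leaving bowtie2 = clean reads plus reads flagged as contaminants.
reads_out = total_reads(final) + sum(contaminants.values())

# Share of trimmed pairs that are still paired in the final output.
pair_survival = final["pairs"] / trimmed["pairs"]

# Pairs that vanished from the paired files versus reads gained by the orphan files.
pairs_lost = trimmed["pairs"] - final["pairs"]
orphan_gain = (final["orphan1"] + final["orphan2"]) - (
    trimmed["orphan1"] + trimmed["orphan2"]
)

print("reads into bowtie2:", reads_in, "(log reports", bowtie2_input_reads, ")")
print("reads out of bowtie2:", reads_out)
print(f"pairs surviving as pairs: {pair_survival:.1%}")
print("reads from lost pairs:", 2 * pairs_lost, "orphan gain:", orphan_gain)
```

The small gap between the reads from lost pairs and the orphan gain equals the reads removed by TRF plus the reads flagged as contaminants, so essentially no reads are unaccounted for.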

Need your help

Sincerely,
Catslu

Hi @wusan1234,
I see that 99% of reads are not being aligned in the bowtie2 step of KneadData. I suspect that the sequence identifier format is causing the issue. Could you share a sample of the sequence identifiers from both paired-end .fastq files? Also, could you please try updating KneadData to v0.11.0?

Regards,
Sagun

Hi, I have the same problem. Have you resolved it?

I solved it by using v0.6.1.

Hi,

I’m also seeing more reads removed with the 0.10.0 version of KneadData for metatranscriptomics. I saw the suggestion to try a later version of KneadData, so I downloaded 0.12, which has slightly different command line arguments. I ended up with zero reads in the final result for version 0.12. I’m not sure what happened there.

Any ideas what has changed between the versions 0.7.2 and 0.10.0? Is it something like a bowtie2 option or is it the ribosomal RNA database?

Here’s an example of the results that I see for the same files (R1, R2 pair) run through different versions of KneadData (using the numbers as printed in the terminal or KneadData log):

$ kneaddata --version
kneaddata v0.7.2
# with SILVA_128 bowtie database downloaded in 2016

Initial: 16540207.0
After trimming: 16341525.0
After SILVA_128_LSUParc_SSUParc_ribosomal_RNA: 16322369.0
After hg38_refMrna: 16281496.0
Total reads after merging results: 16262675.0

$ kneaddata --version
kneaddata v0.10.0
# with SILVA_128 bowtie database downloaded in 2020,
#   has different size than 2016 version

Initial: 16540207.0
After trimming: 16341525.0
After SILVA_128_LSUParc_SSUParc_ribosomal_RNA : 2159266.0
After hg38_refMrna: 16281496.0
Total reads after merging results: 2156981.0

$ kneaddata --version
kneaddata v0.12.0
# with bypass-trf option used
# with SILVA_128 bowtie database downloaded in 2020, 
#   has different size than 2016 version

Initial: 16540207.0
After trimming: 16341525.0
After SILVA_128_LSUParc_SSUParc_ribosomal_RNA: 2038670.0
After hg38_refMrna: 16256630.0
Total reads after merging results: 0  -- this was a little shocking, I guess something else changed that I missed

I found that I had a shell environment problem: the run labeled v0.10.0 above was not actually run with version 0.10.0, but with 0.7.2 and the 2020/2021 version of the ribosomal database (the timestamps on my bowtie2 database files are a mix of Jul 1 2020 and Jul 2 2020).
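To make the three runs easier to compare, the counts quoted above can be turned into retention fractions per database. The sketch below is illustrative only and rests on one assumption that the thread does not confirm: that the final count contains exactly the reads that passed both databases. Under that assumption the final count has to lie between a lower bound (every read is removed by at most one database) and an upper bound (the smaller of the two per-database counts).

```python
# Read counts copied from the three KneadData runs quoted above.
# The second run was labeled v0.10.0 but, per the poster, was really v0.7.2.
runs = {
    "v0.7.2 + 2016 SILVA": dict(
        trimmed=16341525, silva=16322369, hg38=16281496, final=16262675
    ),
    "labeled v0.10.0 + 2020 SILVA": dict(
        trimmed=16341525, silva=2159266, hg38=16281496, final=2156981
    ),
    "v0.12.0 + 2020 SILVA": dict(
        trimmed=16341525, silva=2038670, hg38=16256630, final=0
    ),
}


def summarize(run):
    """Return retention fractions and the plausible range for the final count."""
    kept_silva = run["silva"] / run["trimmed"]
    kept_hg38 = run["hg38"] / run["trimmed"]
    # ASSUMPTION: final = reads that survived BOTH databases.
    # Lower bound: the two databases remove disjoint sets of reads.
    lower = max(0, run["silva"] + run["hg38"] - run["trimmed"])
    # Upper bound: one database removes a subset of what the other removes.
    upper = min(run["silva"], run["hg38"])
    consistent = lower <= run["final"] <= upper
    return kept_silva, kept_hg38, lower, upper, consistent


summaries = {name: summarize(run) for name, run in runs.items()}

for name, (kept_silva, kept_hg38, lower, upper, ok) in summaries.items():
    print(
        f"{name}: SILVA keeps {kept_silva:.1%}, hg38 keeps {kept_hg38:.1%}, "
        f"final expected in [{lower}, {upper}], consistent={ok}"
    )
```

With the 2016 database the rRNA step keeps more than 99% of the trimmed reads, while with the 2020 database it keeps roughly 12 to 13%. If the assumption holds, the v0.12.0 run should have kept at least 1,953,775 reads, so its final count of 0 cannot be explained by the two databases' removals alone.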

I know the bowtie2 database for the ribosomal RNA changed, because the files have different sizes. For example:

$ wc -c kneaddata-databases/SILVA_128_LSUParc_SSUParc_ribosomal_RNA.1.bt2l kneaddata-databases-2021/SILVA_128_LSUParc_SSUParc_ribosomal_RNA.1.bt2l
2502317603 kneaddata-databases/SILVA_128_LSUParc_SSUParc_ribosomal_RNA.1.bt2l
2904567011 kneaddata-databases-2021/SILVA_128_LSUParc_SSUParc_ribosomal_RNA.1.bt2l

$ md5sum kneaddata-databases/SILVA_128_LSUParc_SSUParc_ribosomal_RNA.1.bt2l kneaddata-databases-2021/SILVA_128_LSUParc_SSUParc_ribosomal_RNA.1.bt2l
2216c1a74e790ee556ec972dc0fcf42d  kneaddata-databases/SILVA_128_LSUParc_SSUParc_ribosomal_RNA.1.bt2l
59712c0bba6f6059b3f92cb724085400  kneaddata-databases-2021/SILVA_128_LSUParc_SSUParc_ribosomal_RNA.1.bt2l
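The same size and checksum comparison can be extended to every index file in the two directories. The sketch below is one possible way to do it; the function names `md5_of` and `compare_index_dirs` are hypothetical, and the directory names in the usage comment are simply the ones from the commands above.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_index_dirs(old_dir, new_dir, pattern="*.bt2l"):
    """Compare same-named index files in two directories.

    Returns one tuple per file found in old_dir:
    (file name, old size in bytes, new size in bytes or None, identical?).
    """
    old_dir, new_dir = Path(old_dir), Path(new_dir)
    rows = []
    for old_file in sorted(old_dir.glob(pattern)):
        new_file = new_dir / old_file.name
        old_size = old_file.stat().st_size
        if not new_file.exists():
            rows.append((old_file.name, old_size, None, False))
            continue
        identical = md5_of(old_file) == md5_of(new_file)
        rows.append((old_file.name, old_size, new_file.stat().st_size, identical))
    return rows


# Example call (directory names taken from the commands above):
# for row in compare_index_dirs("kneaddata-databases", "kneaddata-databases-2021"):
#     print(row)
```

This only shows whether the files differ, not what changed in the underlying reference sequences.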

What is different about these databases for KneadData?

Also, if I want to use a newer version of SILVA, say SILVA 138, are any special options needed? Can I just run "bowtie2-build"?
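As a rough sketch of what building a custom index could involve: SILVA FASTA exports are generally distributed in the RNA alphabet (U instead of T), so a U-to-T conversion is commonly applied before indexing for DNA reads. Whether the official KneadData SILVA_128 database was prepared this way is not stated in this thread, so treat that step as an assumption. The file names below are hypothetical, and the helper only assembles a `bowtie2-build` command line (reference FASTA followed by the index prefix, with the `--threads` option) without running it.

```python
def rna_fasta_to_dna(lines):
    """Yield FASTA lines with U/u replaced by T/t in sequence lines only.

    Header lines (starting with '>') are passed through unchanged so that
    taxon names containing the letter U are not altered.
    """
    for line in lines:
        if line.startswith(">"):
            yield line
        else:
            yield line.replace("U", "T").replace("u", "t")


def bowtie2_build_command(fasta_path, index_prefix, threads=4):
    """Assemble the argument list: bowtie2-build [options] <fasta> <prefix>."""
    return ["bowtie2-build", "--threads", str(threads), fasta_path, index_prefix]


# Hypothetical file names, for illustration only.
rna_fasta = "SILVA_138_rRNA.fasta"
dna_fasta = "SILVA_138_rRNA.dna.fasta"
index_prefix = "SILVA_138_rRNA"

# Converting a real file would look like this (not executed here):
# with open(rna_fasta) as src, open(dna_fasta, "w") as dst:
#     dst.writelines(rna_fasta_to_dna(src))

command = bowtie2_build_command(dna_fasta, index_prefix, threads=8)
print(" ".join(command))
```

The resulting index directory would then be passed to KneadData through `-db`, as in the command at the top of the thread. Any further preprocessing used for the official database would need to be confirmed by the KneadData developers.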

Thanks