The bioBakery help forum

Problem with paired end demo on new install

I have just installed kneaddata on a new cluster for the first time, and I am seeing some strange behaviour. I installed it with pip install kneaddata. This is kneaddata v0.7.10, running on CentOS 6.7 with Python 2.7.13.

When I run it on any paired data (including the demo data), it runs without error but all of the reads end up in the “unmatched_1.fastq” file. This is what I see after running the paired end command in the tutorial:

bash$ wc seq1_kneaddata*
0 0 0 seq1_kneaddata_demo_db_bowtie2_paired_contam_1.fastq
0 0 0 seq1_kneaddata_demo_db_bowtie2_paired_contam_2.fastq
0 0 0 seq1_kneaddata_demo_db_bowtie2_unmatched_1_contam.fastq
0 0 0 seq1_kneaddata_demo_db_bowtie2_unmatched_2_contam.fastq
138 1354 18029 seq1_kneaddata.log
0 0 0 seq1_kneaddata_paired_1.fastq
0 0 0 seq1_kneaddata_paired_2.fastq
141364 141364 13333610 seq1_kneaddata.repeats.removed.1.fastq
141364 141364 13302786 seq1_kneaddata.repeats.removed.2.fastq
21540 21540 1899203 seq1_kneaddata.repeats.removed.unmatched.1.fastq
3384 3384 289698 seq1_kneaddata.repeats.removed.unmatched.2.fastq
141364 141364 13333610 seq1_kneaddata.trimmed.1.fastq
141364 141364 13302786 seq1_kneaddata.trimmed.2.fastq
21540 21540 1899203 seq1_kneaddata.trimmed.single.1.fastq
3388 3388 290085 seq1_kneaddata.trimmed.single.2.fastq
141364 141364 13333610 seq1_kneaddata_unmatched_1.fastq
0 0 0 seq1_kneaddata_unmatched_2.fastq

and looking in the log I see

09/11/2020 04:53:28 PM - kneaddata.utilities - DEBUG: 35341 reads; of these:
35341 (100.00%) were unpaired; of these:
35341 (100.00%) aligned 0 times
0 (0.00%) aligned exactly 1 time

I have tried playing around with the sequence identifier lines in the fastq, but it doesn’t seem to have any impact. Am I missing some dependency, or is there some known issue with parsing the fastq files on some systems?
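
As a sanity check on the numbers above, here is a minimal sketch that turns the line counts reported by wc into read counts. It assumes the standard four-lines-per-record FASTQ layout, and the function name reads_from_wc_lines is made up for illustration; it is not part of kneaddata.

```python
def reads_from_wc_lines(line_count):
    """Convert a FASTQ line count (first column of wc) into a read count."""
    # Assumption: every record is exactly four lines (header, sequence, '+', qualities).
    if line_count % 4 != 0:
        raise ValueError("line count %d is not a multiple of 4" % line_count)
    return line_count // 4


# Line counts copied from the wc output in this post.
wc_lines = {
    "seq1_kneaddata.trimmed.1.fastq": 141364,
    "seq1_kneaddata.trimmed.single.1.fastq": 21540,
    "seq1_kneaddata.trimmed.single.2.fastq": 3388,
    "seq1_kneaddata_paired_1.fastq": 0,
    "seq1_kneaddata_unmatched_1.fastq": 141364,
}

read_counts = dict((name, reads_from_wc_lines(n)) for name, n in wc_lines.items())
for name in sorted(read_counts):
    print("%s\t%d" % (name, read_counts[name]))
```

The 141364 lines in unmatched_1.fastq correspond to 35341 reads, which is the same number that the log reports as unpaired, so the whole set of trimmed pair-1 reads appears to have been routed to the unmatched file.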

Hello, thank you for the detailed post. I tried out the tutorial files with v0.7.10 and I see reads in the paired output files as expected. The tool and its dependencies should operate the same on different operating systems; we run and test on a variety of platforms, including CentOS (which you are using). Is it possible the read identifiers in the files are in an unexpected format? Just to double check, can you re-download the files and try running again? If you see the same issue, would you post your log file? Then I can dig in a bit more to try to figure out what might be up.

Thank you,
Lauren
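
One way to check the read identifiers is to compare the two files record by record. The sketch below is only an illustration: read_id and first_mismatch are hypothetical helper names, not kneaddata functions. It assumes four-line FASTQ records and handles two common Illumina header styles ("@name/1" and "@name 1:N:0:..."); other identifier formats would need their own normalisation.

```python
def read_id(header):
    """Return the read name from a FASTQ header line, without the mate suffix."""
    parts = header.strip()[1:].split()  # drop the leading '@', cut at first whitespace
    name = parts[0] if parts else ""
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def first_mismatch(path1, path2):
    """Return (record_number, id1, id2) for the first pair whose names differ, else None.

    Only records present in both files are compared; a difference in file
    length is not reported here.
    """
    with open(path1) as f1, open(path2) as f2:
        for i, (h1, h2) in enumerate(zip(f1, f2)):
            if i % 4 != 0:  # only the first line of each record is a header
                continue
            id1, id2 = read_id(h1), read_id(h2)
            if id1 != id2:
                return (i // 4 + 1, id1, id2)
    return None


# Both header styles reduce to the same name.
print(read_id("@read7/1"))
print(read_id("@read7 2:N:0:ACGT"))
```

Calling first_mismatch("input/seq1.fastq", "input/seq2.fastq") on the demo files returns None if every pair of headers agrees, which would suggest looking elsewhere for the problem.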

I uninstalled and reinstalled kneaddata, and re-downloaded the data, but I still see the same issue. The commands I ran and the log file are below.

bash$ pip install kneaddata
DEPRECATION: Python 2.7 reached the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 is no longer maintained. pip 21.0 will drop support for Python 2.7 in January 2021. More details about Python 2 support in pip can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support
Processing /gfs/home/ljostins/.cache/pip/wheels/92/95/06/8892c06b81ccd85a710524ad1edf5f77b24a5be0faeb89d5b6/kneaddata-0.7.10-cp27-none-any.whl
Installing collected packages: kneaddata
Successfully installed kneaddata-0.7.10
WARNING: You are using pip version 20.2.1; however, version 20.2.3 is available.
You should consider upgrading via the ‘/gfs/devel/ljostins/python-venv-2.7.13/bin/python -m pip install --upgrade pip’ command.
bash$ wget https://github.com/biobakery/kneaddata/files/4703820/input.zip
--2020-09-18 10:03:46-- https://github.com/biobakery/kneaddata/files/4703820/input.zip
Resolving github.com… 140.82.121.4
Connecting to github.com|140.82.121.4|:443… connected.
HTTP request sent, awaiting response… 302 Found
Location: https://github-production-repository-file-5c1aeb.s3.amazonaws.com/253871273/4703820?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20200918%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20200918T090347Z&X-Amz-Expires=300&X-Amz-Signature=d029decd647f7ed7e1830c38a3852aac07654bb62ddff9ee60831f7849dc3ee2&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=253871273&response-content-disposition=attachment%3Bfilename%3Dinput.zip&response-content-type=application%2Fzip [following]
--2020-09-18 10:03:47-- https://github-production-repository-file-5c1aeb.s3.amazonaws.com/253871273/4703820?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20200918%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20200918T090347Z&X-Amz-Expires=300&X-Amz-Signature=d029decd647f7ed7e1830c38a3852aac07654bb62ddff9ee60831f7849dc3ee2&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=253871273&response-content-disposition=attachment%3Bfilename%3Dinput.zip&response-content-type=application%2Fzip
Resolving github-production-repository-file-5c1aeb.s3.amazonaws.com… 52.217.88.100
Connecting to github-production-repository-file-5c1aeb.s3.amazonaws.com|52.217.88.100|:443… connected.
HTTP request sent, awaiting response… 200 OK
Length: 9856071 (9.4M) [application/zip]
Saving to: `input.zip’

100%[==========================================================================================================================================>] 9,856,071 3.74M/s in 2.5s

2020-09-18 10:03:50 (3.74 MB/s) - `input.zip’ saved [9856071/9856071]

bash$ unzip input.zip
Archive: input.zip
creating: input/
inflating: input/.DS_Store
creating: __MACOSX/
creating: __MACOSX/input/
inflating: __MACOSX/input/._.DS_Store
inflating: input/seq2.fastq
inflating: __MACOSX/input/._seq2.fastq
inflating: input/seq1.fastq
inflating: __MACOSX/input/._seq1.fastq
inflating: input/demo_db.3.bt2
inflating: __MACOSX/input/._demo_db.3.bt2
inflating: input/demo_db.2.bt2
inflating: __MACOSX/input/._demo_db.2.bt2
inflating: input/singleEnd.fastq
inflating: __MACOSX/input/._singleEnd.fastq
inflating: input/demo_db.1.bt2
inflating: __MACOSX/input/._demo_db.1.bt2
inflating: input/SE_extra.fastq
inflating: __MACOSX/input/._SE_extra.fastq
inflating: input/demo_db.4.bt2
inflating: __MACOSX/input/._demo_db.4.bt2
inflating: input/demo_db.rev.1.bt2
inflating: __MACOSX/input/._demo_db.rev.1.bt2
inflating: input/demo_db.rev.2.bt2
inflating: __MACOSX/input/._demo_db.rev.2.bt2
inflating: __MACOSX/._input

bash$ kneaddata --input input/seq1.fastq --input input/seq2.fastq --reference-db input/demo_db --output kneaddataOutputPairedEnd --trf ../../software/trf/
Initial number of reads ( /gfs/archive/jostins/microbiome/kneadata/demo/input/seq1.fastq ): 42473
Initial number of reads ( /gfs/archive/jostins/microbiome/kneadata/demo/input/seq2.fastq ): 42473
Running Trimmomatic …
Total reads after trimming ( /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.trimmed.1.fastq ): 35341
Total reads after trimming ( /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.trimmed.2.fastq ): 35341
Total reads after trimming ( /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.trimmed.single.1.fastq ): 5385
Total reads after trimming ( /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.trimmed.single.2.fastq ): 847
Running trf …
Running trf …
Running trf …
Running trf …
Decontaminating …
Running bowtie2 …
Total reads after removing those found in reference database ( /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_demo_db_bowtie2_paired_clean_1.fastq ): 0
Total reads after removing those found in reference database ( /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_demo_db_bowtie2_paired_clean_2.fastq ): 0
Total reads after merging results from multiple databases ( /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_paired_1.fastq ): 0
Total reads after merging results from multiple databases ( /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_paired_2.fastq ): 0
Total reads after removing those found in reference database ( /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_demo_db_bowtie2_unmatched_1_clean.fastq ): 35341
Total reads after merging results from multiple databases ( /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_unmatched_1.fastq ): 35341
Total reads after removing those found in reference database ( /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_demo_db_bowtie2_unmatched_2_clean.fastq ): 0
Total reads after merging results from multiple databases ( /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_unmatched_2.fastq ): 0

Final output files created:
/gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_paired_1.fastq
/gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_paired_2.fastq
/gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_unmatched_1.fastq
/gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_unmatched_2.fastq
bash$ wc kneaddataOutputPairedEnd/*
0 0 0 kneaddataOutputPairedEnd/seq1_kneaddata_demo_db_bowtie2_paired_contam_1.fastq
0 0 0 kneaddataOutputPairedEnd/seq1_kneaddata_demo_db_bowtie2_paired_contam_2.fastq
0 0 0 kneaddataOutputPairedEnd/seq1_kneaddata_demo_db_bowtie2_unmatched_1_contam.fastq
0 0 0 kneaddataOutputPairedEnd/seq1_kneaddata_demo_db_bowtie2_unmatched_2_contam.fastq
138 1354 17765 kneaddataOutputPairedEnd/seq1_kneaddata.log
0 0 0 kneaddataOutputPairedEnd/seq1_kneaddata_paired_1.fastq
0 0 0 kneaddataOutputPairedEnd/seq1_kneaddata_paired_2.fastq
141364 141364 13333610 kneaddataOutputPairedEnd/seq1_kneaddata.repeats.removed.1.fastq
141364 141364 13302786 kneaddataOutputPairedEnd/seq1_kneaddata.repeats.removed.2.fastq
21540 21540 1899203 kneaddataOutputPairedEnd/seq1_kneaddata.repeats.removed.unmatched.1.fastq
3384 3384 289698 kneaddataOutputPairedEnd/seq1_kneaddata.repeats.removed.unmatched.2.fastq
141364 141364 13333610 kneaddataOutputPairedEnd/seq1_kneaddata.trimmed.1.fastq
141364 141364 13302786 kneaddataOutputPairedEnd/seq1_kneaddata.trimmed.2.fastq
21540 21540 1899203 kneaddataOutputPairedEnd/seq1_kneaddata.trimmed.single.1.fastq
3388 3388 290085 kneaddataOutputPairedEnd/seq1_kneaddata.trimmed.single.2.fastq
141364 141364 13333610 kneaddataOutputPairedEnd/seq1_kneaddata_unmatched_1.fastq
0 0 0 kneaddataOutputPairedEnd/seq1_kneaddata_unmatched_2.fastq
756810 758026 71002356 total

Here is the logfile:

bash$ cat kneaddataOutputPairedEnd/seq1_kneaddata.log

09/18/2020 10:08:37 AM - kneaddata.knead_data - INFO: Running kneaddata v0.7.10

09/18/2020 10:08:37 AM - kneaddata.knead_data - INFO: Output files will be written to: /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd

09/18/2020 10:08:37 AM - kneaddata.knead_data - DEBUG: Running with the following arguments:

verbose = False

bypass_trf = False

bmtagger_path = None

minscore = 50

bowtie2_path = /gfs/apps/bio/bowtie2-2.3.0/bowtie2

maxperiod = 500

no_discordant = False

serial = False

fastqc_start = False

bmtagger = False

cat_final_output = False

log_level = DEBUG

log = /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.log

sequencer_source = NexteraPE

max_memory = 500m

remove_intermediate_output = False

fastqc_path = None

output_dir = /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd

trf_path = /gfs/archive/jostins/microbiome/software/trf/trf

remove_temp_output = True

reference_db = /gfs/archive/jostins/microbiome/kneadata/demo/input/demo_db

input = /gfs/archive/jostins/microbiome/kneadata/demo/input/seq1.fastq /gfs/archive/jostins/microbiome/kneadata/demo/input/seq2.fastq

pi = 10

reorder = False

pm = 80

trimmomatic_path = /gfs/apps/bio/trimmomatic-0.35/trimmomatic-0.35.jar

store_temp_output = False

mismatch = 7

threads = 1

delta = 7

bowtie2_options = --very-sensitive --phred33

bypass_trim = False

processes = 1

trimmomatic_quality_scores = -phred33

fastqc_end = False

trimmomatic_options = None

output_prefix = seq1_kneaddata

match = 2

09/18/2020 10:08:37 AM - kneaddata.utilities - INFO: READ COUNT: raw pair1 : Initial number of reads ( /gfs/archive/jostins/microbiome/kneadata/demo/input/seq1.fastq ): 42473

09/18/2020 10:08:37 AM - kneaddata.utilities - INFO: READ COUNT: raw pair2 : Initial number of reads ( /gfs/archive/jostins/microbiome/kneadata/demo/input/seq2.fastq ): 42473

09/18/2020 10:08:37 AM - kneaddata.utilities - DEBUG: Checking input file to Trimmomatic : /gfs/archive/jostins/microbiome/kneadata/demo/input/seq1.fastq

09/18/2020 10:08:37 AM - kneaddata.utilities - DEBUG: Checking input file to Trimmomatic : /gfs/archive/jostins/microbiome/kneadata/demo/input/seq2.fastq

09/18/2020 10:08:37 AM - kneaddata.utilities - INFO: Running Trimmomatic …

09/18/2020 10:08:37 AM - kneaddata.utilities - INFO: Execute command: java -Xmx500m -jar /gfs/apps/bio/trimmomatic-0.35/trimmomatic-0.35.jar PE -threads 1 -phred33 /gfs/archive/jostins/microbiome/kneadata/demo/input/seq1.fastq /gfs/archive/jostins/microbiome/kneadata/demo/input/seq2.fastq /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.trimmed.1.fastq /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.trimmed.single.1.fastq /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.trimmed.2.fastq /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.trimmed.single.2.fastq ILLUMINACLIP:/gfs/devel/ljostins/python-venv-2.7.13/lib/python2.7/site-packages/kneaddata/adapters/NexteraPE-PE.fa:2:30:10:8:TRUE SLIDINGWINDOW:4:20 MINLEN:87

09/18/2020 10:08:38 AM - kneaddata.utilities - DEBUG: TrimmomaticPE: Started with arguments:

-threads 1 -phred33 /gfs/archive/jostins/microbiome/kneadata/demo/input/seq1.fastq /gfs/archive/jostins/microbiome/kneadata/demo/input/seq2.fastq /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.trimmed.1.fastq /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.trimmed.single.1.fastq /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.trimmed.2.fastq /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.trimmed.single.2.fastq ILLUMINACLIP:/gfs/devel/ljostins/python-venv-2.7.13/lib/python2.7/site-packages/kneaddata/adapters/NexteraPE-PE.fa:2:30:10:8:TRUE SLIDINGWINDOW:4:20 MINLEN:87

Using PrefixPair: ‘AGATGTGTATAAGAGACAG’ and ‘AGATGTGTATAAGAGACAG’

Using Long Clipping Sequence: ‘GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG’

Using Long Clipping Sequence: ‘TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG’

Using Long Clipping Sequence: ‘CTGTCTCTTATACACATCTCCGAGCCCACGAGAC’

Using Long Clipping Sequence: ‘CTGTCTCTTATACACATCTGACGCTGCCGACGA’

ILLUMINACLIP: Using 1 prefix pairs, 4 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences

Input Read Pairs: 42473 Both Surviving: 35341 (83.21%) Forward Only Surviving: 5385 (12.68%) Reverse Only Surviving: 847 (1.99%) Dropped: 900 (2.12%)

TrimmomaticPE: Completed successfully

09/18/2020 10:08:38 AM - kneaddata.utilities - DEBUG: Checking output file from Trimmomatic : /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.trimmed.1.fastq

09/18/2020 10:08:38 AM - kneaddata.utilities - DEBUG: Checking output file from Trimmomatic : /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.trimmed.single.1.fastq

09/18/2020 10:08:38 AM - kneaddata.utilities - DEBUG: Checking output file from Trimmomatic : /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.trimmed.2.fastq

09/18/2020 10:08:38 AM - kneaddata.utilities - DEBUG: Checking output file from Trimmomatic : /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.trimmed.single.2.fastq

09/18/2020 10:08:39 AM - kneaddata.utilities - INFO: READ COUNT: trimmed pair1 : Total reads after trimming ( /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.trimmed.1.fastq ): 35341

09/18/2020 10:08:39 AM - kneaddata.utilities - INFO: READ COUNT: trimmed pair2 : Total reads after trimming ( /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.trimmed.2.fastq ): 35341

09/18/2020 10:08:39 AM - kneaddata.utilities - INFO: READ COUNT: trimmed orphan1 : Total reads after trimming ( /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.trimmed.single.1.fastq ): 5385

09/18/2020 10:08:39 AM - kneaddata.utilities - INFO: READ COUNT: trimmed orphan2 : Total reads after trimming ( /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.trimmed.single.2.fastq ): 847

09/18/2020 10:08:40 AM - kneaddata.utilities - DEBUG: Checking input file to trf : /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.trimmed.1.fasta

09/18/2020 10:08:40 AM - kneaddata.utilities - INFO: Running trf …

09/18/2020 10:08:40 AM - kneaddata.utilities - INFO: Execute command: /gfs/archive/jostins/microbiome/software/trf/trf /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.trimmed.1.fasta 2 7 7 80 10 50 500 -h -ngs

09/18/2020 10:08:43 AM - kneaddata.utilities - DEBUG: 0

09/18/2020 10:08:43 AM - kneaddata.utilities - DEBUG: Checking output file from trf : /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.trimmed.1.fasta.trf.parameters.2.7.7.80.10.50.500.dat

09/18/2020 10:08:43 AM - kneaddata.utilities - DEBUG: Checking input file to trf : /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.trimmed.2.fasta

09/18/2020 10:08:43 AM - kneaddata.utilities - INFO: Running trf …

09/18/2020 10:08:43 AM - kneaddata.utilities - INFO: Execute command: /gfs/archive/jostins/microbiome/software/trf/trf /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.trimmed.2.fasta 2 7 7 80 10 50 500 -h -ngs

09/18/2020 10:08:46 AM - kneaddata.utilities - DEBUG: 0

09/18/2020 10:08:46 AM - kneaddata.utilities - DEBUG: Checking output file from trf : /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.trimmed.2.fasta.trf.parameters.2.7.7.80.10.50.500.dat

09/18/2020 10:08:47 AM - kneaddata.run - INFO: Total number of sequences with repeats removed from file ( /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.trimmed.1.fastq ): 0

09/18/2020 10:08:47 AM - kneaddata.run - INFO: Total number of sequences with repeats removed from file ( /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.trimmed.2.fastq ): 0

09/18/2020 10:08:47 AM - kneaddata.utilities - DEBUG: Checking input file to trf : /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.trimmed.single.1.fasta

09/18/2020 10:08:47 AM - kneaddata.utilities - INFO: Running trf …

09/18/2020 10:08:47 AM - kneaddata.utilities - INFO: Execute command: /gfs/archive/jostins/microbiome/software/trf/trf /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.trimmed.single.1.fasta 2 7 7 80 10 50 500 -h -ngs

09/18/2020 10:08:47 AM - kneaddata.utilities - DEBUG: 0

09/18/2020 10:08:47 AM - kneaddata.utilities - DEBUG: Checking output file from trf : /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.trimmed.single.1.fasta.trf.parameters.2.7.7.80.10.50.500.dat

09/18/2020 10:08:47 AM - kneaddata.run - INFO: Total number of sequences with repeats removed from file ( /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.trimmed.single.1.fastq ): 0

09/18/2020 10:08:47 AM - kneaddata.utilities - DEBUG: Checking input file to trf : /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.trimmed.single.2.fasta

09/18/2020 10:08:47 AM - kneaddata.utilities - INFO: Running trf …

09/18/2020 10:08:47 AM - kneaddata.utilities - INFO: Execute command: /gfs/archive/jostins/microbiome/software/trf/trf /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.trimmed.single.2.fasta 2 7 7 80 10 50 500 -h -ngs

09/18/2020 10:08:48 AM - kneaddata.utilities - DEBUG: 0

09/18/2020 10:08:48 AM - kneaddata.utilities - DEBUG: Checking output file from trf : /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.trimmed.single.2.fasta.trf.parameters.2.7.7.80.10.50.500.dat

09/18/2020 10:08:48 AM - kneaddata.run - INFO: Total number of sequences with repeats removed from file ( /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.trimmed.single.2.fastq ): 1

09/18/2020 10:08:48 AM - kneaddata.run - INFO: Decontaminating …

09/18/2020 10:08:48 AM - kneaddata.utilities - DEBUG: Checking input file to bowtie2 : /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.repeats.removed.1.fastq

09/18/2020 10:08:48 AM - kneaddata.utilities - DEBUG: Checking input file to bowtie2 : /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.repeats.removed.2.fastq

09/18/2020 10:08:48 AM - kneaddata.utilities - DEBUG: Checking input file to bowtie2 : /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.repeats.removed.unmatched.1.fastq

09/18/2020 10:08:48 AM - kneaddata.utilities - DEBUG: Checking input file to bowtie2 : /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.repeats.removed.unmatched.2.fastq

09/18/2020 10:08:48 AM - kneaddata.utilities - INFO: Running bowtie2 …

09/18/2020 10:08:48 AM - kneaddata.utilities - INFO: Execute command: kneaddata_bowtie2_discordant_pairs --bowtie2 /gfs/apps/bio/bowtie2-2.3.0/bowtie2 --threads 1 -x /gfs/archive/jostins/microbiome/kneadata/demo/input/demo_db --bowtie2-options "--very-sensitive --phred33" -1 /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.repeats.removed.1.fastq -2 /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.repeats.removed.2.fastq --un-pair /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_demo_db_bowtie2_paired_clean_%.fastq --al-pair /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_demo_db_bowtie2_paired_contam_%.fastq -U /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.repeats.removed.unmatched.1.fastq,/gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.repeats.removed.unmatched.2.fastq --un-single /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_demo_db_bowtie2_unmatched_%_clean.fastq --al-single /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_demo_db_bowtie2_unmatched_%_contam.fastq -S /dev/null

09/18/2020 10:08:49 AM - kneaddata.utilities - DEBUG: 35341 reads; of these:

35341 (100.00%) were unpaired; of these:

35341 (100.00%) aligned 0 times

0 (0.00%) aligned exactly 1 time

0 (0.00%) aligned >1 times

0.00% overall alignment rate

pair1_aligned : 0

pair2_aligned : 0

orphan1_unaligned : 35341

orphan2_unaligned : 0

orphan2_aligned : 0

pair2_unaligned : 0

pair1_unaligned : 0

orphan1_aligned : 0

09/18/2020 10:08:49 AM - kneaddata.utilities - DEBUG: Checking output file from bowtie2 : /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_demo_db_bowtie2_paired_clean_1.fastq

09/18/2020 10:08:49 AM - kneaddata.utilities - DEBUG: Checking output file from bowtie2 : /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_demo_db_bowtie2_paired_clean_2.fastq

09/18/2020 10:08:49 AM - kneaddata.run - INFO: Total contaminate sequences in file ( /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_demo_db_bowtie2_paired_contam_1.fastq ) : 0

09/18/2020 10:08:49 AM - kneaddata.run - INFO: Total contaminate sequences in file ( /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_demo_db_bowtie2_paired_contam_2.fastq ) : 0

09/18/2020 10:08:49 AM - kneaddata.run - INFO: Total contaminate sequences in file ( /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_demo_db_bowtie2_unmatched_1_contam.fastq ) : 0

09/18/2020 10:08:49 AM - kneaddata.run - INFO: Total contaminate sequences in file ( /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_demo_db_bowtie2_unmatched_2_contam.fastq ) : 0

09/18/2020 10:08:49 AM - kneaddata.utilities - INFO: READ COUNT: decontaminated demo_db pair1 : Total reads after removing those found in reference database ( /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_demo_db_bowtie2_paired_clean_1.fastq ): 0

09/18/2020 10:08:49 AM - kneaddata.utilities - INFO: READ COUNT: decontaminated demo_db pair2 : Total reads after removing those found in reference database ( /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_demo_db_bowtie2_paired_clean_2.fastq ): 0

09/18/2020 10:08:49 AM - kneaddata.utilities - INFO: READ COUNT: final pair1 : Total reads after merging results from multiple databases ( /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_paired_1.fastq ): 0

09/18/2020 10:08:49 AM - kneaddata.utilities - INFO: READ COUNT: final pair2 : Total reads after merging results from multiple databases ( /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_paired_2.fastq ): 0

09/18/2020 10:08:49 AM - kneaddata.utilities - WARNING: Unable to remove file: /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_demo_db_bowtie2_paired_clean_1.fastq

09/18/2020 10:08:49 AM - kneaddata.utilities - WARNING: Unable to remove file: /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_demo_db_bowtie2_paired_clean_2.fastq

09/18/2020 10:08:49 AM - kneaddata.utilities - INFO: READ COUNT: decontaminated demo_db orphan1 : Total reads after removing those found in reference database ( /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_demo_db_bowtie2_unmatched_1_clean.fastq ): 35341

09/18/2020 10:08:49 AM - kneaddata.utilities - INFO: READ COUNT: final orphan1 : Total reads after merging results from multiple databases ( /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_unmatched_1.fastq ): 35341

09/18/2020 10:08:49 AM - kneaddata.utilities - WARNING: Unable to remove file: /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_demo_db_bowtie2_unmatched_1_clean.fastq

09/18/2020 10:08:49 AM - kneaddata.utilities - INFO: READ COUNT: decontaminated demo_db orphan2 : Total reads after removing those found in reference database ( /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_demo_db_bowtie2_unmatched_2_clean.fastq ): 0

09/18/2020 10:08:49 AM - kneaddata.utilities - INFO: READ COUNT: final orphan2 : Total reads after merging results from multiple databases ( /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_unmatched_2.fastq ): 0

09/18/2020 10:08:49 AM - kneaddata.utilities - WARNING: Unable to remove file: /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_demo_db_bowtie2_unmatched_2_clean.fastq

09/18/2020 10:08:49 AM - kneaddata.knead_data - INFO:

Final output files created:

/gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_paired_1.fastq

/gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_paired_2.fastq

/gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_unmatched_1.fastq

/gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_unmatched_2.fastq
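
The telling part of the log above is the bowtie2 summary, where all 35341 reads are reported as unpaired even though two mate files were passed with -1 and -2. A rough sketch of how that symptom could be detected automatically is shown below. The function name unpaired_fraction is hypothetical, and the regular expressions are written only against the summary text quoted in this log, so they may not cover every bowtie2 summary layout.

```python
import re


def unpaired_fraction(summary):
    """Return the fraction of reads that the summary reports as unpaired, or None."""
    total = re.search(r"(\d+) reads; of these:", summary)
    unpaired = re.search(r"(\d+) \([\d.]+%\) were unpaired; of these:", summary)
    if total is None or unpaired is None or int(total.group(1)) == 0:
        return None
    return int(unpaired.group(1)) / float(total.group(1))


# Summary lines copied from the log above.
summary = """35341 reads; of these:
35341 (100.00%) were unpaired; of these:
35341 (100.00%) aligned 0 times
0 (0.00%) aligned exactly 1 time
0 (0.00%) aligned >1 times
0.00% overall alignment rate"""

fraction = unpaired_fraction(summary)
if fraction == 1.0:
    print("all reads were treated as unpaired")
```

A value of 1.0 for a paired-end run suggests that the pairing was lost at the bowtie2 step, since Trimmomatic reported 35341 surviving pairs just before it.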

Oh, I think the issue here might be with bowtie2. If I update from 2.3.0 to 2.4.1, it seems to work:

bash$ kneaddata --input input/seq1.fastq --input input/seq2.fastq --reference-db input/demo_db --output kneaddataOutputPairedEnd --trf ../../software/trf/ --bowtie2 /gfs/archive/jostins/microbiome/software/bowtie2-2.4.1-linux-x86_64/

Initial number of reads ( /gfs/archive/jostins/microbiome/kneadata/demo/input/seq1.fastq ): 42473

Initial number of reads ( /gfs/archive/jostins/microbiome/kneadata/demo/input/seq2.fastq ): 42473

Running Trimmomatic …

Total reads after trimming ( /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.trimmed.1.fastq ): 35341

Total reads after trimming ( /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.trimmed.2.fastq ): 35341

Total reads after trimming ( /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.trimmed.single.1.fastq ): 5385

Total reads after trimming ( /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata.trimmed.single.2.fastq ): 847

Running trf …

Running trf …

Running trf …

Running trf …

Decontaminating …

Running bowtie2 …

Total reads after removing those found in reference database ( /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_demo_db_bowtie2_paired_clean_1.fastq ): 35341

Total reads after removing those found in reference database ( /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_demo_db_bowtie2_paired_clean_2.fastq ): 35341

Total reads after merging results from multiple databases ( /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_paired_1.fastq ): 35341

Total reads after merging results from multiple databases ( /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_paired_2.fastq ): 35341

Total reads after removing those found in reference database ( /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_demo_db_bowtie2_unmatched_1_clean.fastq ): 5385

Total reads after merging results from multiple databases ( /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_unmatched_1.fastq ): 5385

Total reads after removing those found in reference database ( /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_demo_db_bowtie2_unmatched_2_clean.fastq ): 846

Total reads after merging results from multiple databases ( /gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_unmatched_2.fastq ): 846

Final output files created:

/gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_paired_1.fastq

/gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_paired_2.fastq

/gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_unmatched_1.fastq

/gfs/archive/jostins/microbiome/kneadata/demo/kneaddataOutputPairedEnd/seq1_kneaddata_unmatched_2.fastq
bash$ wc kneaddataOutputPairedEnd/*
0 0 0 kneaddataOutputPairedEnd/seq1_kneaddata_demo_db_bowtie2_paired_contam_1.fastq
0 0 0 kneaddataOutputPairedEnd/seq1_kneaddata_demo_db_bowtie2_paired_contam_2.fastq
0 0 0 kneaddataOutputPairedEnd/seq1_kneaddata_demo_db_bowtie2_unmatched_1_contam.fastq
0 0 0 kneaddataOutputPairedEnd/seq1_kneaddata_demo_db_bowtie2_unmatched_2_contam.fastq
138 1354 17872 kneaddataOutputPairedEnd/seq1_kneaddata.log
141364 141364 13333610 kneaddataOutputPairedEnd/seq1_kneaddata_paired_1.fastq
141364 141364 13302786 kneaddataOutputPairedEnd/seq1_kneaddata_paired_2.fastq
141364 141364 13333610 kneaddataOutputPairedEnd/seq1_kneaddata.repeats.removed.1.fastq
141364 141364 13302786 kneaddataOutputPairedEnd/seq1_kneaddata.repeats.removed.2.fastq
21540 21540 1899203 kneaddataOutputPairedEnd/seq1_kneaddata.repeats.removed.unmatched.1.fastq
3384 3384 289698 kneaddataOutputPairedEnd/seq1_kneaddata.repeats.removed.unmatched.2.fastq
141364 141364 13333610 kneaddataOutputPairedEnd/seq1_kneaddata.trimmed.1.fastq
141364 141364 13302786 kneaddataOutputPairedEnd/seq1_kneaddata.trimmed.2.fastq
21540 21540 1899203 kneaddataOutputPairedEnd/seq1_kneaddata.trimmed.single.1.fastq
3388 3388 290085 kneaddataOutputPairedEnd/seq1_kneaddata.trimmed.single.2.fastq
21540 21540 1899203 kneaddataOutputPairedEnd/seq1_kneaddata_unmatched_1.fastq
3384 3384 289698 kneaddataOutputPairedEnd/seq1_kneaddata_unmatched_2.fastq
923098 924314 86494150 total
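
Since the only thing that changed between the failing and the working run was the bowtie2 version (2.3.0 versus 2.4.1), a small pre-flight check might be useful. The sketch below is an assumption-laden illustration: parse_bowtie2_version and at_least are made-up helper names, the sample strings imitate what the first line of bowtie2 --version is assumed to look like rather than being captured output, and 2.4.1 is simply the version that worked here, not a documented minimum.

```python
import re


def parse_bowtie2_version(text):
    """Extract (major, minor, patch) from text containing 'version X.Y.Z', or None."""
    match = re.search(r"version (\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def at_least(version, minimum):
    """True if a parsed version tuple is greater than or equal to the minimum."""
    return version is not None and version >= minimum


# Illustrative strings only (assumed format, hypothetical paths).
old = parse_bowtie2_version("/path/to/bowtie2-align-s version 2.3.0")
new = parse_bowtie2_version("/path/to/bowtie2-align-s version 2.4.1")

# Assumption: 2.4.1 is used as the threshold only because it worked in this thread.
WORKING_VERSION = (2, 4, 1)
print(at_least(old, WORKING_VERSION))
print(at_least(new, WORKING_VERSION))
```

In practice the text passed to parse_bowtie2_version would come from running the bowtie2 binary that kneaddata is configured to use with --version.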

Hi, it is great that you have it figured out! Thanks for the follow-up post. We will make a note of the bowtie2 version on our end.

Thank you,
Lauren