Paired-end data results in unpaired output

Hi,

After running Kneaddata with Bowtie2 on paired-end data, the final output seems to be unpaired (the first read file has over 9x as many reads as the second). Is there a way to force paired-end handling in the analysis and throw out any reads that are unpaired?

The command I ran was the following:
kneaddata --input sample_1.fastq --input sample_2.fastq --output /path/to/mydir --bypass-trim --run-trf -db /kneaddataGenome/SILVA_128_LSUParc_SSUParc_ribosomal_RNA

Thanks!

Hi, Thanks for the post. By default, Kneaddata should track read pairs if a pair of input files is provided. You should see paired output files (with the same number of reads) and orphan files. I think in your case kneaddata is possibly having an issue tracking the pairs because the sequence identifiers are in an unexpected format. Can you check whether there are spaces in the sequence IDs, or whether they are missing the read number?
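One way to run that check is sketched below. This is an illustration only: the two regular expressions are assumptions about common header styles (read number after a space, or after a trailing slash), not patterns taken from the kneaddata source, and `describe_header` and `first_headers` are hypothetical helper names.

```python
import re

# Assumed header styles (not copied from kneaddata):
#   "@<id> 1:N:0:<index>"  read number after a space
#   "@<id>/1"              read number after a trailing slash
SPACE_STYLE = re.compile(r"^@\S+ ([12]):")
SLASH_STYLE = re.compile(r"^@\S+/([12])$")


def describe_header(header):
    """Return (has_space, read_number) for one FASTQ header line.

    read_number is "1", "2", or None when no read number is found.
    """
    header = header.rstrip("\n")
    has_space = " " in header
    match = SPACE_STYLE.match(header) or SLASH_STYLE.match(header)
    read_number = match.group(1) if match else None
    return has_space, read_number


def first_headers(path, limit=5):
    """Yield up to `limit` header lines from an uncompressed FASTQ file.

    Assumes exactly four lines per record, so headers sit on lines 0, 4, 8, ...
    """
    with open(path) as handle:
        for index, line in enumerate(handle):
            if index // 4 >= limit:
                break
            if index % 4 == 0:
                yield line.rstrip("\n")
```

Calling `describe_header` on the first few headers of each input file shows quickly whether the IDs contain spaces and whether a read number is present at all.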

On our end we will work on updating kneaddata to catch the case where sequence ids of an unexpected format are provided and throw an informative error message. Sorry for the confusion.

Thank you,
Lauren

Hi Lauren,

Thank you so much for your reply. I changed the sequence identifiers and sure enough, that solved the issue. Thanks again!

Best,
Kat

@lauren.j.mciver I have a follow-up question to this thread. Kneaddata v0.7.4 is not identifying the paired ends when run in biobakery_workflows v3.0.0-alpha.7. My file names look like “ABCD_S70_R1.fastq.gz”, so I added the command line option --pair-identifier "_R1", since they do not follow the “.R1” format in the user guide on GitHub.

My sequence identifiers in the fastq look like this: @EAS139:136:FC706VJ:2:2104:15343:197393 1:N:18:ATCACG. Would the space between the first bit and the 1:N... cause the workflow to miss the pair?

The other oddity is that when I run kneaddata on the same data files outside of the workflow and without the --pair-identifier flag, it runs successfully and correctly merges R1/R2 into a single fastq.

Do you have any guidance? I’m not sure how to troubleshoot given that kneaddata independently runs correctly, but fails to merge the pairs when run as part of the workflow.

Hi @ewissel, Thank you for the detailed post. I just checked the Kneaddata code for the latest version and it should catch your sequence identifier format with the space plus “:1”. The --pair-identifier flag is just used for the bioBakery workflows so the tool can pick up the paired files to pass on to Kneaddata. If you don’t use that flag and the identifier does not match the default then the workflow will process the reads as single end. Have you tried updating to the latest version of Kneaddata v0.10.0?
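To make the role of the pair identifier concrete, here is a minimal sketch of how file names could be grouped into pairs. It is not the biobakery_workflows implementation: `find_pairs` is a hypothetical helper, and deriving the mate identifier by changing the first "1" to "2" is an assumption made for illustration.

```python
def find_pairs(file_names, pair_identifier="_R1"):
    """Group file names into (read1, read2) pairs plus unpaired leftovers.

    Hypothetical logic: the mate identifier is the pair identifier with its
    first "1" changed to "2" (for example "_R1" -> "_R2").
    """
    mate_identifier = pair_identifier.replace("1", "2", 1)
    names = set(file_names)
    pairs, paired = [], set()
    for name in sorted(names):
        if pair_identifier not in name:
            continue
        mate = name.replace(pair_identifier, mate_identifier, 1)
        if mate != name and mate in names:
            pairs.append((name, mate))
            paired.update((name, mate))
    # Anything that did not find a mate would be handled as single end.
    singles = sorted(names - paired)
    return pairs, singles
```

With files named like `ABCD_S70_R1.fastq.gz` and `ABCD_S70_R2.fastq.gz`, the identifier `"_R1"` finds the pair in this sketch, while `".R1"` matches nothing and every file falls through as single end.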

Thank you,
Lauren

Thanks, Lauren! Maybe the kneaddata version is the issue. I downloaded the workflow via conda, and the current version of kneaddata on conda is 0.7.4. Is it possible to update the conda install to v0.10.0?

Hi @ewissel , I pushed the latest version of kneaddata to conda. Please try it out and let me know if it resolves the issue you are seeing.

Thank you,
Lauren

Thanks, Lauren!

When I run conda update -c biobakery kneaddata, I still get v0.7.4. I know that bioconda and biobakery channels have kneaddata, but I thought that directing to the biobakery channel would ensure conda looks there to update kneaddata.

Do you have any advice on this?

Hi Emily, I think conda will pick the best version based on your current environment which is not always the latest version. Can you try adding the specific version into your command kneaddata=0.10.0 and see if that will get the latest one?

Thanks!
Lauren

Hey Lauren,

I was successfully able to update kneaddata with conda install -c biobakery kneaddata=0.10.0 (looks like conda upgrade doesn’t like version info). However, this did not resolve kneaddata not matching paired end files properly.

My files are named “sampleID_SX_R1_001.fastq.gz”, and for the workflow I have tried the following pair identifier arguments:

  • “_R1”
  • “_R1_001”
  • “_R1_”
  • “R1_”
  • “R1_001”

Is my identifier argument wrong? Should I be doing something differently for anadama2/the workflow?

Also, I am using biobakery_workflows v3.0.0-alpha.7, which is the latest on conda. Should I be using a different version?

Hi Emily, Thank you for the follow up. I am glad you were able to update to the latest kneaddata version. I think any of those pair identifiers should work with the workflow. If you could send me (feel free to send it directly to me) your log file I can dig in a bit more detail to see what might be going on.

Thank you,
Lauren

Hi Lauren, I would like to follow up on this problem.

First I used kneaddata 0.7.2, and it gave me unpaired reads:

singularity --debug exec \
  /software/kneaddata_0.7.2.sif kneaddata \
  --remove-intermediate-output --threads 4 --bypass-trim \
  --input ../raw_data/SRR527911_1.fastq.gz --input ../raw_data/SRR527911_2.fastq.gz \
  --output ../temp/01_kneaddata --reference-db ../databases/kneaddata/human_genome \
  --bowtie2-options "--very-sensitive --dovetail"

The paired_1 (737M) and paired_2 (82M) output files are not the same size:

-rw-r--r-- 1 ckzhu sample_lib    0 Jul 17 09:07 SRR527911_1_kneaddata_unmatched_2.fastq
-rw-r--r-- 1 ckzhu sample_lib    0 Jul 17 09:07 SRR527911_1_kneaddata_Homo_sapiens_bowtie2_unmatched_1_contam.fastq
-rw-r--r-- 1 ckzhu sample_lib  13K Jul 17 09:07 SRR527911_1_kneaddata_unmatched_1.fastq
-rw-r--r-- 1 ckzhu sample_lib  82M Jul 17 09:07 SRR527911_1_kneaddata_paired_2.fastq
-rw-r--r-- 1 ckzhu sample_lib 737M Jul 17 09:07 SRR527911_1_kneaddata_paired_1.fastq
-rw-r--r-- 1 ckzhu sample_lib 1.4K Jul 17 09:07 SRR527911_1_kneaddata_Homo_sapiens_bowtie2_unmatched_2_contam.fastq
-rw-r--r-- 1 ckzhu sample_lib  448 Jul 17 09:07 SRR527911_1_kneaddata_Homo_sapiens_bowtie2_paired_contam_2.fastq
-rw-r--r-- 1 ckzhu sample_lib 4.0K Jul 17 09:07 SRR527911_1_kneaddata_Homo_sapiens_bowtie2_paired_contam_1.fastq
-rw-r--r-- 1 ckzhu sample_lib  11K Jul 17 09:07 SRR527911_1_kneaddata.log
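File sizes only hint at a mismatch; counting records confirms it. The sketch below assumes well-formed FASTQ with exactly four lines per record, and the helper names `count_reads` and `pairs_are_balanced` are made up for this example.

```python
import gzip


def count_reads(path):
    """Count FASTQ records, assuming exactly four lines per record."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        lines = sum(1 for _ in handle)
    if lines % 4:
        raise ValueError(f"{path}: {lines} lines is not a multiple of 4")
    return lines // 4


def pairs_are_balanced(path_1, path_2):
    """Return True when both paired files hold the same number of reads."""
    return count_reads(path_1) == count_reads(path_2)
```

For correctly tracked pairs, `pairs_are_balanced` on the `paired_1` and `paired_2` outputs should return True.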

Then I used kneaddata 0.10.0 from the Docker image and got a new error:

singularity --debug exec \
  /software/kneaddata_0.10.0.sif kneaddata \
  --remove-intermediate-output --threads 4 --bypass-trim \
  --input ../raw_data/SRR527911_1.fastq.gz --input ../raw_data/SRR527911_2.fastq.gz \
  --output ../temp/01_kneaddata --reference-db ../databases/kneaddata/human_genome \
  --bowtie2-options "--very-sensitive --dovetail"
Decompressing gzipped file ...

Decompressing gzipped file ...

Reformatting file sequence identifiers ...

Reformatting file sequence identifiers ...

Initial number of reads ( /public/home/sample_lib/ckzhu/software/Snakemake_singularity/test/pipeline/temp/01_kneaddata/reformatted_identifiers8t6864hl_decompressed_zex5i232_SRR527911_1 ): 1921490.0
Initial number of reads ( /public/home/sample_lib/ckzhu/software/Snakemake_singularity/test/pipeline/temp/01_kneaddata/reformatted_identifiersrphkmryn_decompressed_gf1ann04_SRR527911_2 ): 1921490.0
Bypass trimming
Total reads after trimming ( /public/home/sample_lib/ckzhu/software/Snakemake_singularity/test/pipeline/temp/01_kneaddata/reformatted_identifiers8t6864hl_decompressed_zex5i232_SRR527911_1 ): 1921490.0
Total reads after trimming ( /public/home/sample_lib/ckzhu/software/Snakemake_singularity/test/pipeline/temp/01_kneaddata/reformatted_identifiersrphkmryn_decompressed_gf1ann04_SRR527911_2 ): 1921490.0
ERROR: Unable to write file: /public/home/sample_lib/ckzhu/software/Snakemake_singularity/test/pipeline/temp/01_kneaddata/reformatted_identifiers8t6864hl_decompressed_zex5i232_SRR527911_1
DEBUG   [U=5169,P=28402]   Master()                      Child exited with exit status 1

Why can’t I write temporary files in that directory? Do I need to run singularity as root?
If I use conda, I also can’t write:

conda create -n kneaddata kneaddata=0.10.0

source activate kneaddata
kneaddata \
  --remove-intermediate-output --threads 4 --bypass-trim \
  --input ../raw_data/SRR527911_1.fastq.gz --input ../raw_data/SRR527911_2.fastq.gz \
  --output ../temp/01_kneaddata --reference-db ../databases/kneaddata/human_genome \
  --bowtie2-options "--very-sensitive --dovetail"



Decompressing gzipped file ...

Decompressing gzipped file ...

Reformatting file sequence identifiers ...

Reformatting file sequence identifiers ...

Initial number of reads ( /public/home/sample_lib/ckzhu/software/Snakemake_singularity/test/pipeline/temp/01_kneaddata/reformatted_identifiersjsysc7lb_decompressed_sff5u2hr_SRR527911_1 ): 1921490.0
Initial number of reads ( /public/home/sample_lib/ckzhu/software/Snakemake_singularity/test/pipeline/temp/01_kneaddata/reformatted_identifiersf3j5gvzj_decompressed_hab_axlq_SRR527911_2 ): 1921490.0
Bypass trimming
Total reads after trimming ( /public/home/sample_lib/ckzhu/software/Snakemake_singularity/test/pipeline/temp/01_kneaddata/reformatted_identifiersjsysc7lb_decompressed_sff5u2hr_SRR527911_1 ): 1921490.0
Total reads after trimming ( /public/home/sample_lib/ckzhu/software/Snakemake_singularity/test/pipeline/temp/01_kneaddata/reformatted_identifiersf3j5gvzj_decompressed_hab_axlq_SRR527911_2 ): 1921490.0
ERROR: Unable to write file: /public/home/sample_lib/ckzhu/software/Snakemake_singularity/test/pipeline/temp/01_kneaddata/reformatted_identifiersjsysc7lb_decompressed_sff5u2hr_SRR527911_1

I answered this elsewhere on the forum, but tl;dr: the input file names apparently must contain something like .R1. or .R2.

Hello, Thank you for the detailed post and sorry for the slow response. I don’t think you need to run as root with singularity. It looks like it writes a couple of files before it fails. Is it possible you are running out of disk space? Kneaddata can use a fair amount of disk space (up to 4x the original input size) if it needs to decompress input files and reformat the sequence identifiers. I think in the first case, with the older kneaddata version, it is likely having an issue tracking the pairs, which should be fixed in the newer version.
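A rough pre-flight check for that disk space estimate could look like the sketch below. The factor of 4 is the upper bound mentioned above, used here as a rule of thumb rather than a guarantee, and `enough_space` is a hypothetical helper name.

```python
import os
import shutil


def enough_space(input_paths, output_dir, factor=4):
    """Return (needed_bytes, free_bytes, ok) for a planned run.

    `factor` is a rule-of-thumb multiplier on the total input size; the
    real requirement depends on compression ratio and the options used.
    """
    needed = factor * sum(os.path.getsize(path) for path in input_paths)
    free = shutil.disk_usage(output_dir).free
    return needed, free, free >= needed
```

Note that `output_dir` must already exist for `shutil.disk_usage` to work, and that a quota on a shared filesystem may be lower than the free space reported here.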

Thank you,
Lauren

Hi Lauren,
I am having trouble generating paired-end output.
The command I ran was:
kneaddata --input1 10co_S53.R1.fastq.gz --input2 10co_S53.R2.fastq.gz --reference-db rat --output kneaddata_out --trimmomatic-options="SLIDINGWINDOW:4:20 MINLEN:95"

And the outputs are

I am not sure if this is a sequence identifiers issue or something else.

The kneaddata version is v0.12.0.

I can also send you my log file if that helps with answering my question.

Thank you!!!

This is the log file.
02/22/2023 11:23:51 PM - kneaddata.knead_data - INFO: Running kneaddata v0.12.0
02/22/2023 11:23:51 PM - kneaddata.knead_data - INFO: Output files will be written to: /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out
02/22/2023 11:23:51 PM - kneaddata.knead_data - DEBUG: Running with the following arguments:
input2 = 10co_S53.R2.fastq.gz
input1 = 10co_S53.R1.fastq.gz
verbose = False
bypass_trf = False
bmtagger_path = None
minscore = 50
bowtie2_path = /panfs/roc/msisoft/bowtie2/2.4.4.gnu7.2.0/bin/bowtie2
maxperiod = 500
discordant = True
serial = True
fastqc_start = False
store_temp_output = False
cat_final_output = False
log_level = DEBUG
log = /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.log
sequencer_source = NexteraPE
max_memory = 500m
remove_intermediate_output = False
fastqc_path = None
output_dir = /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out
trf_path = /panfs/roc/msisoft/trf/407b_64/trf
remove_temp_output = True
reference_db = /panfs/jay/groups/29/gallaher/jiang329/practice/rat
input = /panfs/jay/groups/29/gallaher/jiang329/practice/10co_S53.R1.fastq.gz /panfs/jay/groups/29/gallaher/jiang329/practice/10co_S53.R2.fastq.gz
decontaminate_pairs = strict
reorder = False
pm = 80
trimmomatic_path = /panfs/roc/msisoft/trimmomatic/0.33/trimmomatic.jar
run_trf = False
mismatch = 7
threads = 1
delta = 7
bowtie2_options = --very-sensitive-local --phred33
bypass_trim = False
processes = 1
pi = 10
trimmomatic_quality_scores = -phred33
fastqc_end = False
scratch_dir =
trimmomatic_options = SLIDINGWINDOW:4:20 MINLEN:95
output_prefix = 10co_S53.R1_kneaddata
match = 2
bmtagger = False
run_trim_repetitive = False
unpaired = None

02/22/2023 11:23:51 PM - kneaddata.utilities - INFO: Decompressing gzipped file …
02/22/2023 11:24:05 PM - kneaddata.utilities - INFO: Decompressed file created: /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/decompressed_yq9HK7_10co_S53.R1.fastq
02/22/2023 11:24:05 PM - kneaddata.utilities - INFO: Decompressing gzipped file …
02/22/2023 11:24:18 PM - kneaddata.utilities - INFO: Decompressed file created: /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/decompressed_Y9Uj6X_10co_S53.R2.fastq
02/22/2023 11:24:18 PM - kneaddata.utilities - INFO: Reformatting file sequence identifiers …
02/22/2023 11:24:33 PM - kneaddata.utilities - INFO: Reformatting file sequence identifiers …
02/22/2023 11:24:51 PM - kneaddata.utilities - INFO: READ COUNT: raw pair1 : Initial number of reads ( /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/reformatted_identifiersqzSwGC_decompressed_yq9HK7_10co_S53.R1 ): 7072656
02/22/2023 11:24:54 PM - kneaddata.utilities - INFO: READ COUNT: raw pair2 : Initial number of reads ( /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/reformatted_identifierswXuDF5_decompressed_Y9Uj6X_10co_S53.R2 ): 7072656
02/22/2023 11:24:54 PM - kneaddata.utilities - DEBUG: Checking input file to Trimmomatic : /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/reformatted_identifiersqzSwGC_decompressed_yq9HK7_10co_S53.R1
02/22/2023 11:24:54 PM - kneaddata.utilities - DEBUG: Checking input file to Trimmomatic : /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/reformatted_identifierswXuDF5_decompressed_Y9Uj6X_10co_S53.R2
02/22/2023 11:24:54 PM - kneaddata.utilities - INFO: Running Trimmomatic …
02/22/2023 11:24:54 PM - kneaddata.utilities - INFO: Execute command: java -Xmx500m -jar /panfs/roc/msisoft/trimmomatic/0.33/trimmomatic.jar PE -threads 1 -phred33 /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/reformatted_identifiersqzSwGC_decompressed_yq9HK7_10co_S53.R1 /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/reformatted_identifierswXuDF5_decompressed_Y9Uj6X_10co_S53.R2 /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.1.fastq /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.single.1.fastq /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.2.fastq /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.single.2.fastq SLIDINGWINDOW:4:20 MINLEN:95
02/22/2023 11:26:09 PM - kneaddata.utilities - DEBUG: TrimmomaticPE: Started with arguments: -threads 1 -phred33 /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/reformatted_identifiersqzSwGC_decompressed_yq9HK7_10co_S53.R1 /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/reformatted_identifierswXuDF5_decompressed_Y9Uj6X_10co_S53.R2 /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.1.fastq /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.single.1.fastq /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.2.fastq /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.single.2.fastq SLIDINGWINDOW:4:20 MINLEN:95
Input Read Pairs: 7072656 Both Surviving: 6269087 (88.64%) Forward Only Surviving: 359281 (5.08%) Reverse Only Surviving: 268073 (3.79%) Dropped: 176215 (2.49%)
TrimmomaticPE: Completed successfully

02/22/2023 11:26:09 PM - kneaddata.utilities - DEBUG: Checking output file from Trimmomatic : /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.1.fastq
02/22/2023 11:26:09 PM - kneaddata.utilities - DEBUG: Checking output file from Trimmomatic : /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.single.1.fastq
02/22/2023 11:26:09 PM - kneaddata.utilities - DEBUG: Checking output file from Trimmomatic : /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.2.fastq
02/22/2023 11:26:09 PM - kneaddata.utilities - DEBUG: Checking output file from Trimmomatic : /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.single.2.fastq
02/22/2023 11:26:12 PM - kneaddata.utilities - INFO: READ COUNT: trimmed pair1 : Total reads after trimming ( /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.1.fastq ): 6269087
02/22/2023 11:26:14 PM - kneaddata.utilities - INFO: READ COUNT: trimmed pair2 : Total reads after trimming ( /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.2.fastq ): 6269087
02/22/2023 11:26:15 PM - kneaddata.utilities - INFO: READ COUNT: trimmed orphan1 : Total reads after trimming ( /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.single.1.fastq ): 359281
02/22/2023 11:26:15 PM - kneaddata.utilities - INFO: READ COUNT: trimmed orphan2 : Total reads after trimming ( /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.single.2.fastq ): 268073
02/22/2023 11:28:35 PM - kneaddata.utilities - DEBUG: Checking input file to trf : /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.1.fasta
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Running trf …
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Execute command: kneaddata_trf_parallel --input /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.1.fasta --output /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.1.fasta.trf.parameters.2.7.7.80.10.50.500.dat --trf-path /panfs/roc/msisoft/trf/407b_64/trf --trf-options ‘2 7 7 80 10 50 500 -h -ngs’ --nproc 1
02/22/2023 11:28:35 PM - kneaddata.utilities - CRITICAL: Error executing: kneaddata_trf_parallel --input /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.1.fasta --output /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.1.fasta.trf.parameters.2.7.7.80.10.50.500.dat --trf-path /panfs/roc/msisoft/trf/407b_64/trf --trf-options ‘2 7 7 80 10 50 500 -h -ngs’ --nproc 1

02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Total memory = 503.452335358 GB
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Available memory = 369.412334442 GB
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Free memory = 53.7003898621 GB
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Percent memory used = 26.6 %
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: CPU percent = 38.5 %
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Total cores count = 128
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Total disk = 1.990234375 GB
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Used disk = 0.234977722168 GB
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Percent disk used = 11.8 %
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Process create time = 2023-02-22 23:28:34
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Process user time = 0.01 seconds
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Process system time = 0.0 seconds
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Process CPU percent = 0.0 %
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Process memory RSS = 0.010383605957 GB
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Process memory VMS = 0.113941192627 GB
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Process memory percent = 0.0020700575204 %
02/22/2023 11:28:35 PM - kneaddata.utilities - DEBUG: Checking input file to trf : /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.2.fasta
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Running trf …
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Execute command: kneaddata_trf_parallel --input /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.2.fasta --output /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.2.fasta.trf.parameters.2.7.7.80.10.50.500.dat --trf-path /panfs/roc/msisoft/trf/407b_64/trf --trf-options ‘2 7 7 80 10 50 500 -h -ngs’ --nproc 1
02/22/2023 11:28:35 PM - kneaddata.utilities - CRITICAL: Error executing: kneaddata_trf_parallel --input /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.2.fasta --output /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.2.fasta.trf.parameters.2.7.7.80.10.50.500.dat --trf-path /panfs/roc/msisoft/trf/407b_64/trf --trf-options ‘2 7 7 80 10 50 500 -h -ngs’ --nproc 1

02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Total memory = 503.452335358 GB
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Available memory = 369.412334442 GB
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Free memory = 53.7003898621 GB
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Percent memory used = 26.6 %
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: CPU percent = 37.6 %
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Total cores count = 128
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Total disk = 1.990234375 GB
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Used disk = 0.234977722168 GB
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Percent disk used = 11.8 %
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Process create time = 2023-02-22 23:28:34
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Process user time = 0.01 seconds
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Process system time = 0.0 seconds
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Process CPU percent = 0.0 %
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Process memory RSS = 0.0104522705078 GB
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Process memory VMS = 0.113945007324 GB
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Process memory percent = 0.00207611918224 %

Thanks for the detailed error post. Kneaddata only expects periods in file extensions, so it is getting a bit confused when naming the output files with the “.R1” and “.R2”. If you replace the “.” with a “_” in the file names, that should hopefully fix the issue with the output file name.
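The renaming can be scripted. The sketch below only computes the new name and does not touch any files; `safe_name` is a hypothetical helper, and the list of extensions is an assumption about what counts as the real file extension.

```python
# Extensions treated as the true file extension (an assumed list).
KNOWN_EXTENSIONS = (".fastq.gz", ".fq.gz", ".fastq", ".fq")


def safe_name(file_name):
    """Replace periods in the stem with underscores, keeping the extension."""
    for extension in KNOWN_EXTENSIONS:
        if file_name.endswith(extension):
            stem = file_name[: -len(extension)]
            return stem.replace(".", "_") + extension
    # Unknown extension: leave the name unchanged.
    return file_name
```

For example, `safe_name("10co_S53.R1.fastq.gz")` gives `10co_S53_R1.fastq.gz`, which could then be passed to `os.rename` or used as a symlink name.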

Thank you,
Lauren

Hello,
I’m sorry to revive this topic again. I have also run into a problem: all my paired reads end up in the unpaired output.
My samples are murine samples sequenced on a DNBseq platform, so I thought the sequence headers could be the problem, but so far I have not been able to solve it by modifying the sequence headers in different ways. However, when I run the samples separately, first with Trimmomatic and then with bowtie2 against the same reference database, it seems to work, and bowtie2 doesn’t complain about the headers.
Here is the command I’m running:

kneaddata --input1 S1_R1.fq.gz --input2 S1_R2.fq.gz --threads 10 --trimmomatic ~/software/Trimmomatic-0.33/dist/jar/ -db /home/yask/reference_data/knead_database/mouse --output knead_output

Here are examples of the sequence headers:
R1: @V350094545L2C001R0020000295:0:0:0:0 1:N:0:GCGATCTA_TCGCCTTA
R2: @V350094545L2C001R0020000295:0:0:0:0 2:N:0:GCGATCTA_TCGCCTTA
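A quick way to test whether two mate headers agree once the read number is removed is sketched below. It assumes the read number follows either a space or a trailing slash; `pair_key` and `headers_match` are hypothetical names, and kneaddata’s own matching may work differently.

```python
def pair_key(header):
    """Return the part of a FASTQ header that both mates should share.

    Assumes the read number follows a space ("... 1:N:0:...") or a trailing
    slash (".../1"); everything from that point on is dropped.
    """
    key = header.strip().split(" ", 1)[0]
    if key.endswith(("/1", "/2")):
        key = key[:-2]
    return key


def headers_match(header_1, header_2):
    """Return True when two headers share the same pair key."""
    return pair_key(header_1) == pair_key(header_2)
```

For the two example headers above, this check reports a match, so under these assumptions a plain ID mismatch between mates is not the cause.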

Here is the log:
05/19/2023 02:09:55 PM - kneaddata.knead_data - INFO: Running kneaddata v0.12.0
05/19/2023 02:09:55 PM - kneaddata.knead_data - INFO: Output files will be written to: /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output
05/19/2023 02:09:55 PM - kneaddata.knead_data - DEBUG: Running with the following arguments:
verbose = False
input1 = S1_R1.fq.gz
input2 = S1_R2.fq.gz
unpaired = None
output_dir = /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output
scratch_dir =
reference_db = /home/yask/reference_data/knead_database/mouse/mouse_C57BL_6NJ
bypass_trim = False
output_prefix = S1_R1_kneaddata
threads = 10
processes = 1
trimmomatic_quality_scores = -phred33
bmtagger = False
bypass_trf = False
run_trf = False
fastqc_start = False
fastqc_end = False
store_temp_output = False
remove_intermediate_output = False
cat_final_output = False
log_level = DEBUG
log = /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.log
trimmomatic_path = /home/yask/software/Trimmomatic-0.33/dist/jar/trimmomatic-0.33.jar
run_trim_repetitive = False
max_memory = 500m
trimmomatic_options = None
sequencer_source = NexteraPE
bowtie2_path = /home/yask/miniconda3/bin/bowtie2
bowtie2_options = --very-sensitive-local --phred33
decontaminate_pairs = strict
reorder = False
serial = False
bmtagger_path = None
trf_path = /usr/local/bin/trf
match = 2
mismatch = 7
delta = 7
pm = 80
pi = 10
minscore = 50
maxperiod = 500
fastqc_path = None
remove_temp_output = True
input = /home/yask/raw_data/C3_ko_microbiome/all_files/S1_R1.fq.gz /home/yask/raw_data/C3_ko_microbiome/all_files/S1_R2.fq.gz
discordant = True

05/19/2023 02:09:55 PM - kneaddata.utilities - INFO: Decompressing gzipped file …
05/19/2023 02:10:20 PM - kneaddata.utilities - INFO: Decompressed file created: /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/decompressed_s3jnk457_S1_R1.fq
05/19/2023 02:10:20 PM - kneaddata.utilities - INFO: Decompressing gzipped file …
05/19/2023 02:10:46 PM - kneaddata.utilities - INFO: Decompressed file created: /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/decompressed_ep71hfv9_S1_R2.fq
05/19/2023 02:10:46 PM - kneaddata.utilities - INFO: Reformatting file sequence identifiers …
05/19/2023 02:11:00 PM - kneaddata.utilities - INFO: Reformatting file sequence identifiers …
05/19/2023 02:11:15 PM - kneaddata.utilities - INFO: Reordering read identifiers …
05/19/2023 02:12:40 PM - kneaddata.utilities - INFO: READ COUNT: raw pair1 : Initial number of reads ( /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/reordered_xo1nh998_reformatted_identifierscg5pbjdp_decompressed_s3jnk457_S1_R1 ): 4723481.0
05/19/2023 02:12:42 PM - kneaddata.utilities - INFO: READ COUNT: raw pair2 : Initial number of reads ( /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/reordered_0uk36g4f_reformatted_identifiersckys0o4u_decompressed_ep71hfv9_S1_R2 ): 4723481.0
05/19/2023 02:12:42 PM - kneaddata.utilities - DEBUG: Checking input file to Trimmomatic : /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/reordered_xo1nh998_reformatted_identifierscg5pbjdp_decompressed_s3jnk457_S1_R1
05/19/2023 02:12:42 PM - kneaddata.utilities - DEBUG: Checking input file to Trimmomatic : /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/reordered_0uk36g4f_reformatted_identifiersckys0o4u_decompressed_ep71hfv9_S1_R2
05/19/2023 02:12:42 PM - kneaddata.utilities - INFO: Running Trimmomatic …
05/19/2023 02:12:42 PM - kneaddata.utilities - INFO: Execute command: java -Xmx500m -jar /home/yask/software/Trimmomatic-0.33/dist/jar/trimmomatic-0.33.jar PE -threads 10 -phred33 /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/reordered_xo1nh998_reformatted_identifierscg5pbjdp_decompressed_s3jnk457_S1_R1 /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/reordered_0uk36g4f_reformatted_identifiersckys0o4u_decompressed_ep71hfv9_S1_R2 /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.1.fastq /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.single.1.fastq /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.2.fastq /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.single.2.fastq MINLEN:60 ILLUMINACLIP:/home/yask/miniconda3/lib/python3.10/site-packages/kneaddata/adapters/NexteraPE-PE.fa:2:30:10:8:TRUE SLIDINGWINDOW:4:20 MINLEN:75
05/19/2023 02:13:15 PM - kneaddata.utilities - DEBUG: b"TrimmomaticPE: Started with arguments: -threads 10 -phred33 /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/reordered_xo1nh998_reformatted_identifierscg5pbjdp_decompressed_s3jnk457_S1_R1 /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/reordered_0uk36g4f_reformatted_identifiersckys0o4u_decompressed_ep71hfv9_S1_R2 /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.1.fastq /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.single.1.fastq /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.2.fastq /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.single.2.fastq MINLEN:60 ILLUMINACLIP:/home/yask/miniconda3/lib/python3.10/site-packages/kneaddata/adapters/NexteraPE-PE.fa:2:30:10:8:TRUE SLIDINGWINDOW:4:20 MINLEN:75\nUsing PrefixPair: ‘AGATGTGTATAAGAGACAG’ and ‘AGATGTGTATAAGAGACAG’\nUsing Long Clipping Sequence: ‘GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG’\nUsing Long Clipping Sequence: ‘TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG’\nUsing Long Clipping Sequence: ‘CTGTCTCTTATACACATCTCCGAGCCCACGAGAC’\nUsing Long Clipping Sequence: ‘CTGTCTCTTATACACATCTGACGCTGCCGACGA’\nILLUMINACLIP: Using 1 prefix pairs, 4 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences\nInput Read Pairs: 4723481 Both Surviving: 3005635 (63.63%) Forward Only Surviving: 227344 (4.81%) Reverse Only Surviving: 1042352 (22.07%) Dropped: 448150 (9.49%)\nTrimmomaticPE: Completed successfully\n"
05/19/2023 02:13:15 PM - kneaddata.utilities - DEBUG: Checking output file from Trimmomatic : /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.1.fastq
05/19/2023 02:13:15 PM - kneaddata.utilities - DEBUG: Checking output file from Trimmomatic : /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.single.1.fastq
05/19/2023 02:13:15 PM - kneaddata.utilities - DEBUG: Checking output file from Trimmomatic : /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.2.fastq
05/19/2023 02:13:15 PM - kneaddata.utilities - DEBUG: Checking output file from Trimmomatic : /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.single.2.fastq
05/19/2023 02:13:17 PM - kneaddata.utilities - INFO: READ COUNT: trimmed pair1 : Total reads after trimming ( /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.1.fastq ): 3005635.0
05/19/2023 02:13:18 PM - kneaddata.utilities - INFO: READ COUNT: trimmed pair2 : Total reads after trimming ( /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.2.fastq ): 3005635.0
05/19/2023 02:13:19 PM - kneaddata.utilities - INFO: READ COUNT: trimmed orphan1 : Total reads after trimming ( /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.single.1.fastq ): 227344.0
05/19/2023 02:13:19 PM - kneaddata.utilities - INFO: READ COUNT: trimmed orphan2 : Total reads after trimming ( /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.single.2.fastq ): 1042352.0
05/19/2023 02:14:34 PM - kneaddata.utilities - DEBUG: Checking input file to trf : /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.1.fasta
05/19/2023 02:14:34 PM - kneaddata.utilities - INFO: Running trf …
05/19/2023 02:14:34 PM - kneaddata.utilities - INFO: Execute command: kneaddata_trf_parallel --input /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.1.fasta --output /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.1.fasta.trf.parameters.2.7.7.80.10.50.500.dat --trf-path /usr/local/bin/trf --trf-options ‘2 7 7 80 10 50 500 -h -ngs’ --nproc 10
05/19/2023 02:16:30 PM - kneaddata.utilities - DEBUG: 0
05/19/2023 02:16:30 PM - kneaddata.utilities - DEBUG: Checking output file from trf : /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.1.fasta.trf.parameters.2.7.7.80.10.50.500.dat
05/19/2023 02:16:30 PM - kneaddata.utilities - DEBUG: Checking input file to trf : /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.2.fasta
05/19/2023 02:16:30 PM - kneaddata.utilities - INFO: Running trf …
05/19/2023 02:16:30 PM - kneaddata.utilities - INFO: Execute command: kneaddata_trf_parallel --input /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.2.fasta --output /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.2.fasta.trf.parameters.2.7.7.80.10.50.500.dat --trf-path /usr/local/bin/trf --trf-options '2 7 7 80 10 50 500 -h -ngs' --nproc 10
05/19/2023 02:18:26 PM - kneaddata.utilities - DEBUG: 0
05/19/2023 02:18:26 PM - kneaddata.utilities - DEBUG: Checking output file from trf : /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.2.fasta.trf.parameters.2.7.7.80.10.50.500.dat
05/19/2023 02:18:32 PM - kneaddata.run - INFO: Total number of sequences with repeats removed from file ( /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.1.fastq ): 51035
05/19/2023 02:18:38 PM - kneaddata.run - INFO: Total number of sequences with repeats removed from file ( /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.2.fastq ): 56936
05/19/2023 02:18:41 PM - kneaddata.utilities - DEBUG: Checking input file to trf : /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.single.1.fasta
05/19/2023 02:18:41 PM - kneaddata.utilities - INFO: Running trf …
05/19/2023 02:18:41 PM - kneaddata.utilities - INFO: Execute command: kneaddata_trf_parallel --input /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.single.1.fasta --output /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.single.1.fasta.trf.parameters.2.7.7.80.10.50.500.dat --trf-path /usr/local/bin/trf --trf-options '2 7 7 80 10 50 500 -h -ngs' --nproc 10
05/19/2023 02:18:51 PM - kneaddata.utilities - DEBUG: 0
05/19/2023 02:18:51 PM - kneaddata.utilities - DEBUG: Checking output file from trf : /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.single.1.fasta.trf.parameters.2.7.7.80.10.50.500.dat
05/19/2023 02:18:51 PM - kneaddata.run - INFO: Total number of sequences with repeats removed from file ( /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.single.1.fastq ): 3568
05/19/2023 02:19:04 PM - kneaddata.utilities - DEBUG: Checking input file to trf : /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.single.2.fasta
05/19/2023 02:19:04 PM - kneaddata.utilities - INFO: Running trf …
05/19/2023 02:19:04 PM - kneaddata.utilities - INFO: Execute command: kneaddata_trf_parallel --input /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.single.2.fasta --output /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.single.2.fasta.trf.parameters.2.7.7.80.10.50.500.dat --trf-path /usr/local/bin/trf --trf-options '2 7 7 80 10 50 500 -h -ngs' --nproc 10
05/19/2023 02:19:44 PM - kneaddata.utilities - DEBUG: 0
05/19/2023 02:19:44 PM - kneaddata.utilities - DEBUG: Checking output file from trf : /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.single.2.fasta.trf.parameters.2.7.7.80.10.50.500.dat
05/19/2023 02:19:46 PM - kneaddata.run - INFO: Total number of sequences with repeats removed from file ( /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.single.2.fastq ): 18739
05/19/2023 02:19:46 PM - kneaddata.run - INFO: Decontaminating …
05/19/2023 02:19:46 PM - kneaddata.utilities - DEBUG: Checking input file to bowtie2 : /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.repeats.removed.1.fastq
05/19/2023 02:19:46 PM - kneaddata.utilities - DEBUG: Checking input file to bowtie2 : /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.repeats.removed.2.fastq
05/19/2023 02:19:46 PM - kneaddata.utilities - DEBUG: Checking input file to bowtie2 : /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.repeats.removed.unmatched.1.fastq
05/19/2023 02:19:46 PM - kneaddata.utilities - DEBUG: Checking input file to bowtie2 : /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.repeats.removed.unmatched.2.fastq
05/19/2023 02:19:46 PM - kneaddata.utilities - INFO: Running bowtie2 …
05/19/2023 02:19:46 PM - kneaddata.utilities - INFO: Execute command: kneaddata_bowtie2_discordant_pairs --bowtie2 /home/yask/miniconda3/bin/bowtie2 --threads 10 -x /home/yask/reference_data/knead_database/mouse/mouse_C57BL_6NJ --mode strict --bowtie2-options "--very-sensitive-local --phred33" -1 /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.repeats.removed.1.fastq -2 /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.repeats.removed.2.fastq --un-pair /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_mouse_C57BL_6NJ_bowtie2_paired_clean_%.fastq --al-pair /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_mouse_C57BL_6NJ_bowtie2_paired_contam_%.fastq -U /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.repeats.removed.unmatched.1.fastq,/home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.repeats.removed.unmatched.2.fastq --un-single /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_mouse_C57BL_6NJ_bowtie2_unmatched_%_clean.fastq --al-single /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_mouse_C57BL_6NJ_bowtie2_unmatched_%_contam.fastq -S /dev/null
05/19/2023 02:27:09 PM - kneaddata.utilities - DEBUG: b'7150688 reads; of these:\n 7150688 (100.00%) were unpaired; of these:\n 4763598 (66.62%) aligned 0 times\n 1494065 (20.89%) aligned exactly 1 time\n 893025 (12.49%) aligned >1 times\n33.38% overall alignment rate\npair1_aligned : 0\npair2_aligned : 0\npair1_unaligned : 0\npair2_unaligned : 0\norphan1_aligned : 1044961\norphan2_aligned : 1342129\norphan1_unaligned : 2133415\norphan2_unaligned : 2630183\n'
05/19/2023 02:27:09 PM - kneaddata.utilities - DEBUG: Checking output file from bowtie2 : /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_mouse_C57BL_6NJ_bowtie2_paired_clean_1.fastq
05/19/2023 02:27:09 PM - kneaddata.utilities - DEBUG: Checking output file from bowtie2 : /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_mouse_C57BL_6NJ_bowtie2_paired_clean_2.fastq
05/19/2023 02:27:10 PM - kneaddata.run - INFO: Total contaminate sequences in file ( /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_mouse_C57BL_6NJ_bowtie2_paired_contam_1.fastq ) : 0.0
05/19/2023 02:27:10 PM - kneaddata.run - INFO: Total contaminate sequences in file ( /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_mouse_C57BL_6NJ_bowtie2_paired_contam_2.fastq ) : 0.0
05/19/2023 02:27:11 PM - kneaddata.run - INFO: Total contaminate sequences in file ( /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_mouse_C57BL_6NJ_bowtie2_unmatched_1_contam.fastq ) : 1044961.0
05/19/2023 02:27:12 PM - kneaddata.run - INFO: Total contaminate sequences in file ( /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_mouse_C57BL_6NJ_bowtie2_unmatched_2_contam.fastq ) : 1342129.0
05/19/2023 02:27:12 PM - kneaddata.utilities - INFO: READ COUNT: decontaminated mouse_C57BL_6NJ pair1 : Total reads after removing those found in reference database ( /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_mouse_C57BL_6NJ_bowtie2_paired_clean_1.fastq ): 0.0
05/19/2023 02:27:12 PM - kneaddata.utilities - INFO: READ COUNT: decontaminated mouse_C57BL_6NJ pair2 : Total reads after removing those found in reference database ( /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_mouse_C57BL_6NJ_bowtie2_paired_clean_2.fastq ): 0.0
05/19/2023 02:27:12 PM - kneaddata.utilities - INFO: READ COUNT: final pair1 : Total reads after merging results from multiple databases ( /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_paired_1.fastq ): 0.0
05/19/2023 02:27:12 PM - kneaddata.utilities - INFO: READ COUNT: final pair2 : Total reads after merging results from multiple databases ( /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_paired_2.fastq ): 0.0
05/19/2023 02:27:12 PM - kneaddata.utilities - WARNING: Unable to remove file: /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_mouse_C57BL_6NJ_bowtie2_paired_clean_1.fastq
05/19/2023 02:27:12 PM - kneaddata.utilities - WARNING: Unable to remove file: /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_mouse_C57BL_6NJ_bowtie2_paired_clean_2.fastq
05/19/2023 02:27:15 PM - kneaddata.utilities - INFO: READ COUNT: decontaminated mouse_C57BL_6NJ orphan1 : Total reads after removing those found in reference database ( /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_mouse_C57BL_6NJ_bowtie2_unmatched_1_clean.fastq ): 2133415.0
05/19/2023 02:27:16 PM - kneaddata.utilities - INFO: READ COUNT: final orphan1 : Total reads after merging results from multiple databases ( /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_unmatched_1.fastq ): 2133415.0
05/19/2023 02:27:16 PM - kneaddata.utilities - WARNING: Unable to remove file: /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_mouse_C57BL_6NJ_bowtie2_unmatched_1_clean.fastq
05/19/2023 02:27:18 PM - kneaddata.utilities - INFO: READ COUNT: decontaminated mouse_C57BL_6NJ orphan2 : Total reads after removing those found in reference database ( /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_mouse_C57BL_6NJ_bowtie2_unmatched_2_clean.fastq ): 2630183.0
05/19/2023 02:27:20 PM - kneaddata.utilities - INFO: READ COUNT: final orphan2 : Total reads after merging results from multiple databases ( /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_unmatched_2.fastq ): 2630183.0
05/19/2023 02:27:20 PM - kneaddata.utilities - WARNING: Unable to remove file: /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_mouse_C57BL_6NJ_bowtie2_unmatched_2_clean.fastq
05/19/2023 02:27:20 PM - kneaddata.knead_data - INFO:
Final output files created:
/home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_paired_1.fastq
/home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_paired_2.fastq
/home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_unmatched_1.fastq
/home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_unmatched_2.fastq

It would be great if you could help me with this!
It may be that I’m just missing something basic, but I cannot find the problem.

Regards

Artem
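
A note on the log above: the read counts can be reconciled with some simple arithmetic, which helps narrow down where the pairing is lost. The sketch below uses only numbers copied from the log. It assumes that "Total number of sequences with repeats removed" is the number of reads dropped by the TRF step; that reading is an assumption, not something the log states.

```python
# All numbers are copied from the log above.
trimmed = {"pair1": 3005635, "pair2": 3005635, "orphan1": 227344, "orphan2": 1042352}

# Assumption: these are the reads dropped by the TRF step for each file.
repeats_removed = {"pair1": 51035, "pair2": 56936, "orphan1": 3568, "orphan2": 18739}

# Reads bowtie2 reported as orphans: aligned + unaligned, per mate.
bowtie2_orphans = {
    "orphan1": 1044961 + 2133415,
    "orphan2": 1342129 + 2630183,
}

def reconcile():
    """Compare reads surviving TRF with the orphan reads bowtie2 saw, per mate."""
    after_trf = {name: trimmed[name] - repeats_removed[name] for name in trimmed}
    result = {}
    for mate in ("1", "2"):
        expected = after_trf["pair" + mate] + after_trf["orphan" + mate]
        observed = bowtie2_orphans["orphan" + mate]
        result[mate] = (expected, observed)
    return result

for mate, (expected, observed) in reconcile().items():
    print(f"mate {mate}: after TRF = {expected}, bowtie2 orphans = {observed}")
```

Under that assumption the two numbers agree for both mates (3178376 and 3972312), and together they add up to the 7150688 unpaired reads bowtie2 reports. So it looks as if every read that survived TRF, including the 3005635 reads per mate that were still paired after trimming, reached bowtie2 as unmatched. That would place the loss of pairing between the trimming output and the bowtie2 input rather than inside bowtie2 itself, although the log alone does not show why.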


Hi Artem,

I’m running into the same issue you are regarding the unpaired reads. Did you ever find a solution?

Best,
Mark

Hi Artem,

I also have the exact same problem and am stumped as to a solution – did you end up finding one?

Thanks!
Fran
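
For anyone hitting the same symptom: earlier in this thread the suggestion was to check whether the sequence ids contain spaces or are missing the read number. A minimal sketch of such a check is below. The function names (`read_ids`, `base_id`, `check_pairs`) are made up for illustration, it assumes standard four-line FASTQ records, and it does not reproduce kneaddata's own matching rules; it only reports what the headers look like.

```python
import gzip
from itertools import islice

def read_ids(path, limit=1000):
    """Return up to `limit` header lines (without the leading '@') from a FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        # Assumes 4-line records, so every 4th line starting at the first is a header.
        return [line.rstrip("\n")[1:] for line in islice(handle, 0, limit * 4, 4)]

def base_id(header):
    """Drop everything after the first space, then a trailing /1 or /2 if present."""
    name = header.split(" ", 1)[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name

def check_pairs(path1, path2, limit=1000):
    """Summarise header format and whether reads line up between the two mate files."""
    ids1, ids2 = read_ids(path1, limit), read_ids(path2, limit)
    return {
        "headers_with_spaces": sum(" " in h for h in ids1 + ids2),
        "mismatched_base_ids": sum(base_id(a) != base_id(b) for a, b in zip(ids1, ids2)),
        "compared": min(len(ids1), len(ids2)),
    }
```

Calling `check_pairs("sample_1.fastq", "sample_2.fastq")` on the raw inputs, and again on the intermediate files named in the log, should show whether the headers change format somewhere along the way. A non-zero `headers_with_spaces` means the ids use the space-separated Illumina style discussed above; a non-zero `mismatched_base_ids` means the two files are not in the same read order.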
