Hi,
After running Kneaddata with Bowtie2 on paired-end data, the final output seems to be unpaired (the first read file has over 9x as many reads as the second). Is there a way to force paired-end handling in the analysis and throw out any reads that are unpaired?
The command I ran was the following: 
kneaddata --input sample_1.fastq --input sample_2.fastq --output /path/to/mydir --bypass-trim --run-trf -db /kneaddataGenome/SILVA_128_LSUParc_SSUParc_ribosomal_RNA
Thanks!
            
Hi, thanks for the post. Kneaddata should by default track read pairs if a pair of input files is provided. You should see paired output files (with the same number of reads) and orphan files. In your case, kneaddata is possibly having trouble tracking the pairs because the sequence identifiers are in an unexpected format. Can you check whether there are spaces in the sequence IDs, or whether they are missing the read number?
On our end we will work on updating kneaddata to catch the case where sequence ids of an unexpected format are provided and throw an informative error message. Sorry for the confusion.
Thank you, 
Lauren
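
One way to run the check suggested above is to look at the first few headers of each input file. The following is only a minimal sketch: the two patterns are assumptions about common header styles (a trailing /1 or /2, or a space followed by a field such as 1:N:0:ATCACG), not kneaddata's actual rules, and the helper names are hypothetical.

```python
import re

# Assumed patterns for illustration only; kneaddata's real checks may differ.
SLASH_SUFFIX = re.compile(r"/[12]$")             # e.g. @read7/1
CASAVA_COMMENT = re.compile(r"^[12]:[YN]:\d+:")  # e.g. "1:N:0:ATCACG" after a space


def describe_header(header):
    """Report whether a FASTQ header contains a space and which read number it carries."""
    name = header.rstrip("\n")
    if name.startswith("@"):
        name = name[1:]
    first, _, rest = name.partition(" ")
    read_number = None
    if SLASH_SUFFIX.search(first):
        read_number = first[-1]
    elif CASAVA_COMMENT.match(rest):
        read_number = rest[0]
    return {"has_space": " " in name, "read_number": read_number}


def first_headers(path, limit=5):
    """Return the first few header lines of an uncompressed, 4-line-per-record FASTQ file."""
    headers = []
    with open(path) as handle:
        for index, line in enumerate(handle):
            if index % 4 == 0:
                headers.append(line.rstrip("\n"))
                if len(headers) == limit:
                    break
    return headers
```

Calling describe_header on each line returned by first_headers for both input files shows quickly whether the two files use the same convention and whether a read number is present at all.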
            
              Hi Lauren,
Thank you so much for your reply. I changed the sequence identifiers and sure enough, that solved the issue. Thanks again!
Best, 
Kat
            
@lauren.j.mciver I have a follow-up question to this thread. Kneaddata v0.7.4 is not identifying the paired ends when run in biobakery_workflows v3.0.0-alpha.7. My file names look like “ABCD_S70_R1.fastq.gz”, so I added the command-line option --pair-identifier "_R1", since they do not follow the “.R1” format in the user guide on GitHub.
My sequence identifiers in the fastq look like this: @EAS139:136:FC706VJ:2:2104:15343:197393 1:N:18:ATCACG. Would the space between the first bit and the 1:N... cause the workflow to miss the pair?
The other oddity is that when I run kneaddata on the same data files outside of the workflow and without the --pair-identifier flag, it runs successfully and correctly merges R1/R2 into a single fastq.
Do you have any guidance? I’m not sure how to troubleshoot given that kneaddata independently runs correctly, but fails to merge the pairs when run as part of the workflow.
            
              Hi @ewissel , Thank you for the detailed post. I just checked the Kneaddata code for the latest version and it should catch your sequence identifier format with the space plus “:1”. The --pair-identifier flag is just used for the bioBakery workflows so the tool can pick up the paired files to pass on to Kneaddata. If you don’t use that flag and the identifier does not match the default then the workflow will process the reads as single end. Have you tried updating to the latest version of Kneaddata v0.10.0?
Thank you, 
Lauren
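
To illustrate the idea of a pair identifier described above, here is a rough sketch of how an R1 file name could be mapped to its R2 mate. This is not the biobakery_workflows implementation; find_mate and its behaviour are hypothetical and only show why an identifier that does not occur in the file name leads to single-end handling.

```python
import os


def find_mate(r1_path, pair_identifier=".R1"):
    """Guess the R2 file name for an R1 file by swapping the pair identifier.

    Returns None when the identifier is not in the file name, which in this
    sketch stands for "treat the file as single-end".
    """
    directory, name = os.path.split(r1_path)
    if pair_identifier not in name:
        return None
    # Only the first "1" is swapped, so "_R1_001" becomes "_R2_001".
    mate_identifier = pair_identifier.replace("1", "2", 1)
    mate_name = name.replace(pair_identifier, mate_identifier, 1)
    return os.path.join(directory, mate_name)
```

With the default ".R1", a file named ABCD_S70_R1.fastq.gz has no match and would be handled as single-end in this sketch, while passing "_R1" yields ABCD_S70_R2.fastq.gz.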
            
Thanks, Lauren! Maybe the kneaddata version is the issue. I downloaded the workflow via conda, and the current version of kneaddata on conda is 0.7.4. Is it possible to update the conda install to v0.10.0?
            
              Hi @ewissel  , I pushed the latest version of kneaddata to conda. Please try it out and let me know if it resolves the issue you are seeing.
Thank you, 
Lauren
            
              Thanks, Lauren!
When I run conda update -c biobakery kneaddata, I still get v0.7.4. I know that bioconda and biobakery channels have kneaddata, but I thought that directing to the biobakery channel would ensure conda looks there to update kneaddata.
Do you have any advice on this?
            
Hi Emily, I think conda will pick the best version based on your current environment, which is not always the latest version. Can you try adding the specific version to your command (kneaddata=0.10.0) and see if that gets the latest one?
Thanks! 
Lauren
            
              Hey Lauren,
I was successfully able to update kneaddata with conda install -c biobakery kneaddata=0.10.0 (looks like conda upgrade doesn’t like version info). However, this did not resolve kneaddata not matching paired end files properly.
My files are named “sampleID_SX_R1_001.fastq.gz”, and for the workflow I have tried the following pair identifier arguments:
“_R1” 
“_R1_001” 
“_R1_” 
“R1_” 
“R1_001” 
 
Is my identifier argument wrong? Should I be doing something differently for anadama2/the workflow?
Also, I am using biobakery_workflows v3.0.0-alpha.7, which is the latest on conda. Should I be using a different version?
            
              Hi Emily, Thank you for the follow up. I am glad you were able to update to the latest kneaddata version. I think any of those pair identifiers should work with the workflow. If you could send me (feel free to send it directly to me) your log file I can dig in a bit more detail to see what might be going on.
Thank you, 
Lauren
             
CK_zhu, July 17, 2021, 1:59am
Hi Lauren, I would like to follow up on this problem.
First I used kneaddata 0.7.2, and it gave me unpaired reads:
singularity --debug exec \
  /software/kneaddata_0.7.2.sif kneaddata \
  --remove-intermediate-output --threads 4 --bypass-trim \
  --input ../raw_data/SRR527911_1.fastq.gz --input ../raw_data/SRR527911_2.fastq.gz \
  --output ../temp/01_kneaddata --reference-db ../databases/kneaddata/human_genome \
  --bowtie2-options "--very-sensitive --dovetail"
 
The paired_1 (737M) and paired_2 (82M) output files are not the same size:
-rw-r--r-- 1 ckzhu sample_lib    0 Jul 17 09:07 SRR527911_1_kneaddata_unmatched_2.fastq
-rw-r--r-- 1 ckzhu sample_lib    0 Jul 17 09:07 SRR527911_1_kneaddata_Homo_sapiens_bowtie2_unmatched_1_contam.fastq
-rw-r--r-- 1 ckzhu sample_lib  13K Jul 17 09:07 SRR527911_1_kneaddata_unmatched_1.fastq
-rw-r--r-- 1 ckzhu sample_lib  82M Jul 17 09:07 SRR527911_1_kneaddata_paired_2.fastq
-rw-r--r-- 1 ckzhu sample_lib 737M Jul 17 09:07 SRR527911_1_kneaddata_paired_1.fastq
-rw-r--r-- 1 ckzhu sample_lib 1.4K Jul 17 09:07 SRR527911_1_kneaddata_Homo_sapiens_bowtie2_unmatched_2_contam.fastq
-rw-r--r-- 1 ckzhu sample_lib  448 Jul 17 09:07 SRR527911_1_kneaddata_Homo_sapiens_bowtie2_paired_contam_2.fastq
-rw-r--r-- 1 ckzhu sample_lib 4.0K Jul 17 09:07 SRR527911_1_kneaddata_Homo_sapiens_bowtie2_paired_contam_1.fastq
-rw-r--r-- 1 ckzhu sample_lib  11K Jul 17 09:07 SRR527911_1_kneaddata.log
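
File sizes only hint at a mismatch; counting reads and comparing IDs confirms it. The sketch below assumes well-formed FASTQ with four lines per record and strips a trailing /1 or /2 or a space-separated comment before comparing. The function names are hypothetical, and all IDs are held in memory, so it is meant for spot checks rather than very large files.

```python
import gzip


def open_fastq(path):
    """Open a plain or gzipped FASTQ file as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def read_ids(path):
    """Yield each read ID without a trailing /1, /2 or space-separated comment."""
    with open_fastq(path) as handle:
        for index, line in enumerate(handle):
            if index % 4 == 0:  # header line of each 4-line record
                name = line[1:].split()[0]
                if name.endswith(("/1", "/2")):
                    name = name[:-2]
                yield name


def compare_pairs(path_1, path_2):
    """Count reads in both files and report whether the IDs line up in order."""
    ids_1 = list(read_ids(path_1))
    ids_2 = list(read_ids(path_2))
    return {
        "reads_1": len(ids_1),
        "reads_2": len(ids_2),
        "in_sync": ids_1 == ids_2,
    }
```

Running compare_pairs on the two paired output files should report equal counts and in_sync set to True when pair tracking has worked.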
 
Then I used kneaddata 0.10.0 from the Docker image and got a new error:
singularity --debug exec \
  /software/kneaddata_0.10.0.sif kneaddata \
  --remove-intermediate-output --threads 4 --bypass-trim \
  --input ../raw_data/SRR527911_1.fastq.gz --input ../raw_data/SRR527911_2.fastq.gz \
  --output ../temp/01_kneaddata --reference-db ../databases/kneaddata/human_genome \
  --bowtie2-options "--very-sensitive --dovetail"
 
Decompressing gzipped file ...
Decompressing gzipped file ...
Reformatting file sequence identifiers ...
Reformatting file sequence identifiers ...
Initial number of reads ( /public/home/sample_lib/ckzhu/software/Snakemake_singularity/test/pipeline/temp/01_kneaddata/reformatted_identifiers8t6864hl_decompressed_zex5i232_SRR527911_1 ): 1921490.0
Initial number of reads ( /public/home/sample_lib/ckzhu/software/Snakemake_singularity/test/pipeline/temp/01_kneaddata/reformatted_identifiersrphkmryn_decompressed_gf1ann04_SRR527911_2 ): 1921490.0
Bypass trimming
Total reads after trimming ( /public/home/sample_lib/ckzhu/software/Snakemake_singularity/test/pipeline/temp/01_kneaddata/reformatted_identifiers8t6864hl_decompressed_zex5i232_SRR527911_1 ): 1921490.0
Total reads after trimming ( /public/home/sample_lib/ckzhu/software/Snakemake_singularity/test/pipeline/temp/01_kneaddata/reformatted_identifiersrphkmryn_decompressed_gf1ann04_SRR527911_2 ): 1921490.0
ERROR: Unable to write file: /public/home/sample_lib/ckzhu/software/Snakemake_singularity/test/pipeline/temp/01_kneaddata/reformatted_identifiers8t6864hl_decompressed_zex5i232_SRR527911_1
DEBUG   [U=5169,P=28402]   Master()                      Child exited with exit status 1
 
Why can’t I write the temporary file in that directory? Do I need to run singularity as root?
If I use conda, I also can’t write it:
conda create -n kneaddata kneaddata=0.10.0
source activate kneaddata
kneaddata \
  --remove-intermediate-output --threads 4 --bypass-trim \
  --input ../raw_data/SRR527911_1.fastq.gz --input ../raw_data/SRR527911_2.fastq.gz \
  --output ../temp/01_kneaddata --reference-db ../databases/kneaddata/human_genome \
  --bowtie2-options "--very-sensitive --dovetail"
 
Decompressing gzipped file ...
Decompressing gzipped file ...
Reformatting file sequence identifiers ...
Reformatting file sequence identifiers ...
Initial number of reads ( /public/home/sample_lib/ckzhu/software/Snakemake_singularity/test/pipeline/temp/01_kneaddata/reformatted_identifiersjsysc7lb_decompressed_sff5u2hr_SRR527911_1 ): 1921490.0
Initial number of reads ( /public/home/sample_lib/ckzhu/software/Snakemake_singularity/test/pipeline/temp/01_kneaddata/reformatted_identifiersf3j5gvzj_decompressed_hab_axlq_SRR527911_2 ): 1921490.0
Bypass trimming
Total reads after trimming ( /public/home/sample_lib/ckzhu/software/Snakemake_singularity/test/pipeline/temp/01_kneaddata/reformatted_identifiersjsysc7lb_decompressed_sff5u2hr_SRR527911_1 ): 1921490.0
Total reads after trimming ( /public/home/sample_lib/ckzhu/software/Snakemake_singularity/test/pipeline/temp/01_kneaddata/reformatted_identifiersf3j5gvzj_decompressed_hab_axlq_SRR527911_2 ): 1921490.0
ERROR: Unable to write file: /public/home/sample_lib/ckzhu/software/Snakemake_singularity/test/pipeline/temp/01_kneaddata/reformatted_identifiersjsysc7lb_decompressed_sff5u2hr_SRR527911_1
 
            
I answered this elsewhere on the forum, but tl;dr: the input file names apparently must contain something like .R1. or .R2.
            
Hello, thank you for the detailed post, and sorry for the slow response. I don’t think you need to run as root with singularity. It looks like it writes a couple of files before it fails. Is it possible you are running out of disk space? Kneaddata can use a fair amount of disk space (up to 4x the original input size) if it needs to decompress input files and reformat the sequence identifiers. In the first case, with the older kneaddata version, it is likely having an issue tracking the pairs, which should be fixed in the newer version.
Thank you, 
Lauren
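
The disk-space suggestion above can be checked before starting a run. This is a small sketch using only the Python standard library; the factor of 4 comes from the estimate in the reply above, and the function names are hypothetical.

```python
import os
import shutil


def estimated_need_bytes(input_paths, factor=4):
    """Rough upper bound on scratch space: factor times the total input size."""
    return factor * sum(os.path.getsize(path) for path in input_paths)


def has_room(output_dir, input_paths, factor=4):
    """Return True if the file system holding output_dir has enough free space."""
    free_bytes = shutil.disk_usage(output_dir).free
    return free_bytes >= estimated_need_bytes(input_paths, factor)
```

Note that the check only looks at the file system containing the output directory; quotas on shared clusters are not visible to it.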
            
              Hi Lauren, 
I am having trouble generating paired-end output.
The command I ran was: 
kneaddata --input1 10co_S53.R1.fastq.gz --input2 10co_S53.R2.fastq.gz --reference-db rat --output kneaddata_out --trimmomatic-options="SLIDINGWINDOW:4:20 MINLEN:95"
And the outputs are 
I am not sure if this is a sequence identifiers issue or something else.
The kneaddata version is v0.12.0.
I can also send you my log file if that helps with answering my question.
Thank you!!!
            
              This is the log file. 
02/22/2023 11:23:51 PM - kneaddata.knead_data - INFO: Running kneaddata v0.12.0 
02/22/2023 11:23:51 PM - kneaddata.knead_data - INFO: Output files will be written to: /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out 
02/22/2023 11:23:51 PM - kneaddata.knead_data - DEBUG: Running with the following arguments: 
input2 = 10co_S53.R2.fastq.gz 
input1 = 10co_S53.R1.fastq.gz 
verbose = False 
bypass_trf = False 
bmtagger_path = None 
minscore = 50 
bowtie2_path = /panfs/roc/msisoft/bowtie2/2.4.4.gnu7.2.0/bin/bowtie2 
maxperiod = 500 
discordant = True 
serial = True 
fastqc_start = False 
store_temp_output = False 
cat_final_output = False 
log_level = DEBUG 
log = /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.log 
sequencer_source = NexteraPE 
max_memory = 500m 
remove_intermediate_output = False 
fastqc_path = None 
output_dir = /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out 
trf_path = /panfs/roc/msisoft/trf/407b_64/trf 
remove_temp_output = True 
reference_db = /panfs/jay/groups/29/gallaher/jiang329/practice/rat 
input = /panfs/jay/groups/29/gallaher/jiang329/practice/10co_S53.R1.fastq.gz /panfs/jay/groups/29/gallaher/jiang329/practice/10co_S53.R2.fastq.gz 
decontaminate_pairs = strict 
reorder = False 
pm = 80 
trimmomatic_path = /panfs/roc/msisoft/trimmomatic/0.33/trimmomatic.jar 
run_trf = False 
mismatch = 7 
threads = 1 
delta = 7 
bowtie2_options = --very-sensitive-local --phred33 
bypass_trim = False 
processes = 1 
pi = 10 
trimmomatic_quality_scores = -phred33 
fastqc_end = False 
scratch_dir = 
trimmomatic_options = SLIDINGWINDOW:4:20 MINLEN:95 
output_prefix = 10co_S53.R1_kneaddata 
match = 2 
bmtagger = False 
run_trim_repetitive = False 
unpaired = None
02/22/2023 11:23:51 PM - kneaddata.utilities - INFO: Decompressing gzipped file … 
02/22/2023 11:24:05 PM - kneaddata.utilities - INFO: Decompressed file created: /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/decompressed_yq9HK7_10co_S53.R1.fastq 
02/22/2023 11:24:05 PM - kneaddata.utilities - INFO: Decompressing gzipped file … 
02/22/2023 11:24:18 PM - kneaddata.utilities - INFO: Decompressed file created: /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/decompressed_Y9Uj6X_10co_S53.R2.fastq 
02/22/2023 11:24:18 PM - kneaddata.utilities - INFO: Reformatting file sequence identifiers … 
02/22/2023 11:24:33 PM - kneaddata.utilities - INFO: Reformatting file sequence identifiers … 
02/22/2023 11:24:51 PM - kneaddata.utilities - INFO: READ COUNT: raw pair1 : Initial number of reads ( /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/reformatted_identifiersqzSwGC_decompressed_yq9HK7_10co_S53.R1 ): 7072656 
02/22/2023 11:24:54 PM - kneaddata.utilities - INFO: READ COUNT: raw pair2 : Initial number of reads ( /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/reformatted_identifierswXuDF5_decompressed_Y9Uj6X_10co_S53.R2 ): 7072656 
02/22/2023 11:24:54 PM - kneaddata.utilities - DEBUG: Checking input file to Trimmomatic : /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/reformatted_identifiersqzSwGC_decompressed_yq9HK7_10co_S53.R1 
02/22/2023 11:24:54 PM - kneaddata.utilities - DEBUG: Checking input file to Trimmomatic : /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/reformatted_identifierswXuDF5_decompressed_Y9Uj6X_10co_S53.R2 
02/22/2023 11:24:54 PM - kneaddata.utilities - INFO: Running Trimmomatic … 
02/22/2023 11:24:54 PM - kneaddata.utilities - INFO: Execute command: java -Xmx500m -jar /panfs/roc/msisoft/trimmomatic/0.33/trimmomatic.jar PE -threads 1 -phred33 /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/reformatted_identifiersqzSwGC_decompressed_yq9HK7_10co_S53.R1 /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/reformatted_identifierswXuDF5_decompressed_Y9Uj6X_10co_S53.R2 /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.1.fastq /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.single.1.fastq /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.2.fastq /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.single.2.fastq SLIDINGWINDOW:4:20 MINLEN:95 
02/22/2023 11:26:09 PM - kneaddata.utilities - DEBUG: TrimmomaticPE: Started with arguments: -threads 1 -phred33 /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/reformatted_identifiersqzSwGC_decompressed_yq9HK7_10co_S53.R1 /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/reformatted_identifierswXuDF5_decompressed_Y9Uj6X_10co_S53.R2 /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.1.fastq /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.single.1.fastq /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.2.fastq /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.single.2.fastq SLIDINGWINDOW:4:20 MINLEN:95 
Input Read Pairs: 7072656 Both Surviving: 6269087 (88.64%) Forward Only Surviving: 359281 (5.08%) Reverse Only Surviving: 268073 (3.79%) Dropped: 176215 (2.49%) 
TrimmomaticPE: Completed successfully
02/22/2023 11:26:09 PM - kneaddata.utilities - DEBUG: Checking output file from Trimmomatic : /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.1.fastq 
02/22/2023 11:26:09 PM - kneaddata.utilities - DEBUG: Checking output file from Trimmomatic : /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.single.1.fastq 
02/22/2023 11:26:09 PM - kneaddata.utilities - DEBUG: Checking output file from Trimmomatic : /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.2.fastq 
02/22/2023 11:26:09 PM - kneaddata.utilities - DEBUG: Checking output file from Trimmomatic : /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.single.2.fastq 
02/22/2023 11:26:12 PM - kneaddata.utilities - INFO: READ COUNT: trimmed pair1 : Total reads after trimming ( /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.1.fastq ): 6269087 
02/22/2023 11:26:14 PM - kneaddata.utilities - INFO: READ COUNT: trimmed pair2 : Total reads after trimming ( /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.2.fastq ): 6269087 
02/22/2023 11:26:15 PM - kneaddata.utilities - INFO: READ COUNT: trimmed orphan1 : Total reads after trimming ( /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.single.1.fastq ): 359281 
02/22/2023 11:26:15 PM - kneaddata.utilities - INFO: READ COUNT: trimmed orphan2 : Total reads after trimming ( /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.single.2.fastq ): 268073 
02/22/2023 11:28:35 PM - kneaddata.utilities - DEBUG: Checking input file to trf : /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.1.fasta 
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Running trf … 
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Execute command: kneaddata_trf_parallel --input /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.1.fasta --output /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.1.fasta.trf.parameters.2.7.7.80.10.50.500.dat --trf-path /panfs/roc/msisoft/trf/407b_64/trf --trf-options ‘2 7 7 80 10 50 500 -h -ngs’ --nproc 1 
02/22/2023 11:28:35 PM - kneaddata.utilities - CRITICAL: Error executing: kneaddata_trf_parallel --input /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.1.fasta --output /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.1.fasta.trf.parameters.2.7.7.80.10.50.500.dat --trf-path /panfs/roc/msisoft/trf/407b_64/trf --trf-options ‘2 7 7 80 10 50 500 -h -ngs’ --nproc 1
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Total memory = 503.452335358 GB 
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Available memory = 369.412334442 GB 
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Free memory = 53.7003898621 GB 
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Percent memory used = 26.6 % 
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: CPU percent = 38.5 % 
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Total cores count = 128 
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Total disk = 1.990234375 GB 
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Used disk = 0.234977722168 GB 
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Percent disk used = 11.8 % 
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Process create time = 2023-02-22 23:28:34 
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Process user time = 0.01 seconds 
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Process system time = 0.0 seconds 
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Process CPU percent = 0.0 % 
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Process memory RSS = 0.010383605957 GB 
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Process memory VMS = 0.113941192627 GB 
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Process memory percent = 0.0020700575204 % 
02/22/2023 11:28:35 PM - kneaddata.utilities - DEBUG: Checking input file to trf : /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.2.fasta 
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Running trf … 
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Execute command: kneaddata_trf_parallel --input /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.2.fasta --output /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.2.fasta.trf.parameters.2.7.7.80.10.50.500.dat --trf-path /panfs/roc/msisoft/trf/407b_64/trf --trf-options ‘2 7 7 80 10 50 500 -h -ngs’ --nproc 1 
02/22/2023 11:28:35 PM - kneaddata.utilities - CRITICAL: Error executing: kneaddata_trf_parallel --input /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.2.fasta --output /panfs/jay/groups/29/gallaher/jiang329/practice/kneaddata_out/10co_S53.R1_kneaddata.trimmed.2.fasta.trf.parameters.2.7.7.80.10.50.500.dat --trf-path /panfs/roc/msisoft/trf/407b_64/trf --trf-options ‘2 7 7 80 10 50 500 -h -ngs’ --nproc 1
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Total memory = 503.452335358 GB 
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Available memory = 369.412334442 GB 
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Free memory = 53.7003898621 GB 
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Percent memory used = 26.6 % 
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: CPU percent = 37.6 % 
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Total cores count = 128 
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Total disk = 1.990234375 GB 
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Used disk = 0.234977722168 GB 
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Percent disk used = 11.8 % 
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Process create time = 2023-02-22 23:28:34 
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Process user time = 0.01 seconds 
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Process system time = 0.0 seconds 
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Process CPU percent = 0.0 % 
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Process memory RSS = 0.0104522705078 GB 
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Process memory VMS = 0.113945007324 GB 
02/22/2023 11:28:35 PM - kneaddata.utilities - INFO: Process memory percent = 0.00207611918224 %
            
Thanks for the detailed error post. Kneaddata only expects periods in file extensions, so it is getting a bit confused when naming the output files because of the “.R1” and “.R2”. If you replace the “.” with a “_” in the file names, that should hopefully fix the issue with the output file names.
Thank you, 
Lauren
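
As a sketch of the renaming suggested above, the helper below replaces periods in the part of a file name before the FASTQ extension and leaves the extension untouched. The list of extensions and the name safe_name are assumptions made for illustration; it only computes the new name and does not rename anything on disk.

```python
def safe_name(filename, extensions=(".fastq.gz", ".fq.gz", ".fastq", ".fq")):
    """Replace periods in the stem with underscores, keeping the extension as-is."""
    for extension in extensions:
        if filename.endswith(extension):
            stem = filename[: -len(extension)]
            return stem.replace(".", "_") + extension
    return filename  # unknown extension: leave the name unchanged
```

For example, 10co_S53.R1.fastq.gz would become 10co_S53_R1.fastq.gz, which can then be applied with os.rename or mv.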
            
              Hello, 
I’m sorry to revive this topic again. I have also run into a problem: all my paired reads land in the unpaired folder.
My samples are murine and were sequenced on the DNBseq platform, so I thought the sequence headers might be the problem, but so far none of my modifications to the headers has solved it. However, when I run the samples separately, first with Trimmomatic and then with bowtie2 against the same reference database, it seems to work, and bowtie2 doesn’t complain about the headers.
Here is the command I’m running:
kneaddata --input1 S1_R1.fq.gz --input2 S1_R2.fq.gz --threads 10 --trimmomatic ~/software/Trimmomatic-0.33/dist/jar/ -db /home/yask/reference_data/knead_database/mouse --output knead_output
 
here are the examples of sequence headers: 
R1: @V350094545L2C001R0020000295 :0:0:0:0 1:N:0:GCGATCTA_TCGCCTTA 
R2: @V350094545L2C001R0020000295 :0:0:0:0 2:N:0:GCGATCTA_TCGCCTTA
Here is the log: 
05/19/2023 02:09:55 PM - kneaddata.knead_data - INFO: Running kneaddata v0.12.0 
05/19/2023 02:09:55 PM - kneaddata.knead_data - INFO: Output files will be written to: /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output 
05/19/2023 02:09:55 PM - kneaddata.knead_data - DEBUG: Running with the following arguments: 
verbose = False 
input1 = S1_R1.fq.gz 
input2 = S1_R2.fq.gz 
unpaired = None 
output_dir = /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output 
scratch_dir = 
reference_db = /home/yask/reference_data/knead_database/mouse/mouse_C57BL_6NJ 
bypass_trim = False 
output_prefix = S1_R1_kneaddata 
threads = 10 
processes = 1 
trimmomatic_quality_scores = -phred33 
bmtagger = False 
bypass_trf = False 
run_trf = False 
fastqc_start = False 
fastqc_end = False 
store_temp_output = False 
remove_intermediate_output = False 
cat_final_output = False 
log_level = DEBUG 
log = /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.log 
trimmomatic_path = /home/yask/software/Trimmomatic-0.33/dist/jar/trimmomatic-0.33.jar 
run_trim_repetitive = False 
max_memory = 500m 
trimmomatic_options = None 
sequencer_source = NexteraPE 
bowtie2_path = /home/yask/miniconda3/bin/bowtie2 
bowtie2_options = --very-sensitive-local --phred33 
decontaminate_pairs = strict 
reorder = False 
serial = False 
bmtagger_path = None 
trf_path = /usr/local/bin/trf 
match = 2 
mismatch = 7 
delta = 7 
pm = 80 
pi = 10 
minscore = 50 
maxperiod = 500 
fastqc_path = None 
remove_temp_output = True 
input = /home/yask/raw_data/C3_ko_microbiome/all_files/S1_R1.fq.gz /home/yask/raw_data/C3_ko_microbiome/all_files/S1_R2.fq.gz 
discordant = True
05/19/2023 02:09:55 PM - kneaddata.utilities - INFO: Decompressing gzipped file … 
05/19/2023 02:10:20 PM - kneaddata.utilities - INFO: Decompressed file created: /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/decompressed_s3jnk457_S1_R1.fq 
05/19/2023 02:10:20 PM - kneaddata.utilities - INFO: Decompressing gzipped file … 
05/19/2023 02:10:46 PM - kneaddata.utilities - INFO: Decompressed file created: /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/decompressed_ep71hfv9_S1_R2.fq 
05/19/2023 02:10:46 PM - kneaddata.utilities - INFO: Reformatting file sequence identifiers … 
05/19/2023 02:11:00 PM - kneaddata.utilities - INFO: Reformatting file sequence identifiers … 
05/19/2023 02:11:15 PM - kneaddata.utilities - INFO: Reordering read identifiers … 
05/19/2023 02:12:40 PM - kneaddata.utilities - INFO: READ COUNT: raw pair1 : Initial number of reads ( /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/reordered_xo1nh998_reformatted_identifierscg5pbjdp_decompressed_s3jnk457_S1_R1 ): 4723481.0 
05/19/2023 02:12:42 PM - kneaddata.utilities - INFO: READ COUNT: raw pair2 : Initial number of reads ( /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/reordered_0uk36g4f_reformatted_identifiersckys0o4u_decompressed_ep71hfv9_S1_R2 ): 4723481.0 
05/19/2023 02:12:42 PM - kneaddata.utilities - DEBUG: Checking input file to Trimmomatic : /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/reordered_xo1nh998_reformatted_identifierscg5pbjdp_decompressed_s3jnk457_S1_R1 
05/19/2023 02:12:42 PM - kneaddata.utilities - DEBUG: Checking input file to Trimmomatic : /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/reordered_0uk36g4f_reformatted_identifiersckys0o4u_decompressed_ep71hfv9_S1_R2 
05/19/2023 02:12:42 PM - kneaddata.utilities - INFO: Running Trimmomatic … 
05/19/2023 02:12:42 PM - kneaddata.utilities - INFO: Execute command: java -Xmx500m -jar /home/yask/software/Trimmomatic-0.33/dist/jar/trimmomatic-0.33.jar PE -threads 10 -phred33 /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/reordered_xo1nh998_reformatted_identifierscg5pbjdp_decompressed_s3jnk457_S1_R1 /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/reordered_0uk36g4f_reformatted_identifiersckys0o4u_decompressed_ep71hfv9_S1_R2 /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.1.fastq /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.single.1.fastq /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.2.fastq /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.single.2.fastq MINLEN:60 ILLUMINACLIP:/home/yask/miniconda3/lib/python3.10/site-packages/kneaddata/adapters/NexteraPE-PE.fa:2:30:10:8:TRUE SLIDINGWINDOW:4:20 MINLEN:75 
05/19/2023 02:13:15 PM - kneaddata.utilities - DEBUG: b"TrimmomaticPE: Started with arguments: -threads 10 -phred33 /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/reordered_xo1nh998_reformatted_identifierscg5pbjdp_decompressed_s3jnk457_S1_R1 /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/reordered_0uk36g4f_reformatted_identifiersckys0o4u_decompressed_ep71hfv9_S1_R2 /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.1.fastq /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.single.1.fastq /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.2.fastq /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.single.2.fastq MINLEN:60 ILLUMINACLIP:/home/yask/miniconda3/lib/python3.10/site-packages/kneaddata/adapters/NexteraPE-PE.fa:2:30:10:8:TRUE SLIDINGWINDOW:4:20 MINLEN:75\nUsing PrefixPair: ‘AGATGTGTATAAGAGACAG’ and ‘AGATGTGTATAAGAGACAG’\nUsing Long Clipping Sequence: ‘GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG’\nUsing Long Clipping Sequence: ‘TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG’\nUsing Long Clipping Sequence: ‘CTGTCTCTTATACACATCTCCGAGCCCACGAGAC’\nUsing Long Clipping Sequence: ‘CTGTCTCTTATACACATCTGACGCTGCCGACGA’\nILLUMINACLIP: Using 1 prefix pairs, 4 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences\nInput Read Pairs: 4723481 Both Surviving: 3005635 (63.63%) Forward Only Surviving: 227344 (4.81%) Reverse Only Surviving: 1042352 (22.07%) Dropped: 448150 (9.49%)\nTrimmomaticPE: Completed successfully\n" 
05/19/2023 02:13:15 PM - kneaddata.utilities - DEBUG: Checking output file from Trimmomatic : /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.1.fastq 
05/19/2023 02:13:15 PM - kneaddata.utilities - DEBUG: Checking output file from Trimmomatic : /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.single.1.fastq 
05/19/2023 02:13:15 PM - kneaddata.utilities - DEBUG: Checking output file from Trimmomatic : /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.2.fastq 
05/19/2023 02:13:15 PM - kneaddata.utilities - DEBUG: Checking output file from Trimmomatic : /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.single.2.fastq 
05/19/2023 02:13:17 PM - kneaddata.utilities - INFO: READ COUNT: trimmed pair1 : Total reads after trimming ( /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.1.fastq ): 3005635.0 
05/19/2023 02:13:18 PM - kneaddata.utilities - INFO: READ COUNT: trimmed pair2 : Total reads after trimming ( /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.2.fastq ): 3005635.0 
05/19/2023 02:13:19 PM - kneaddata.utilities - INFO: READ COUNT: trimmed orphan1 : Total reads after trimming ( /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.single.1.fastq ): 227344.0 
05/19/2023 02:13:19 PM - kneaddata.utilities - INFO: READ COUNT: trimmed orphan2 : Total reads after trimming ( /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.single.2.fastq ): 1042352.0 
05/19/2023 02:14:34 PM - kneaddata.utilities - DEBUG: Checking input file to trf : /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.1.fasta 
05/19/2023 02:14:34 PM - kneaddata.utilities - INFO: Running trf … 
05/19/2023 02:14:34 PM - kneaddata.utilities - INFO: Execute command: kneaddata_trf_parallel --input /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.1.fasta --output /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.1.fasta.trf.parameters.2.7.7.80.10.50.500.dat --trf-path /usr/local/bin/trf --trf-options '2 7 7 80 10 50 500 -h -ngs' --nproc 10
05/19/2023 02:16:30 PM - kneaddata.utilities - DEBUG: 0 
05/19/2023 02:16:30 PM - kneaddata.utilities - DEBUG: Checking output file from trf : /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.1.fasta.trf.parameters.2.7.7.80.10.50.500.dat 
05/19/2023 02:16:30 PM - kneaddata.utilities - DEBUG: Checking input file to trf : /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.2.fasta 
05/19/2023 02:16:30 PM - kneaddata.utilities - INFO: Running trf … 
05/19/2023 02:16:30 PM - kneaddata.utilities - INFO: Execute command: kneaddata_trf_parallel --input /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.2.fasta --output /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.2.fasta.trf.parameters.2.7.7.80.10.50.500.dat --trf-path /usr/local/bin/trf --trf-options '2 7 7 80 10 50 500 -h -ngs' --nproc 10
05/19/2023 02:18:26 PM - kneaddata.utilities - DEBUG: 0 
05/19/2023 02:18:26 PM - kneaddata.utilities - DEBUG: Checking output file from trf : /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.2.fasta.trf.parameters.2.7.7.80.10.50.500.dat 
05/19/2023 02:18:32 PM - kneaddata.run - INFO: Total number of sequences with repeats removed from file ( /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.1.fastq ): 51035 
05/19/2023 02:18:38 PM - kneaddata.run - INFO: Total number of sequences with repeats removed from file ( /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.2.fastq ): 56936 
05/19/2023 02:18:41 PM - kneaddata.utilities - DEBUG: Checking input file to trf : /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.single.1.fasta 
05/19/2023 02:18:41 PM - kneaddata.utilities - INFO: Running trf … 
05/19/2023 02:18:41 PM - kneaddata.utilities - INFO: Execute command: kneaddata_trf_parallel --input /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.single.1.fasta --output /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.single.1.fasta.trf.parameters.2.7.7.80.10.50.500.dat --trf-path /usr/local/bin/trf --trf-options '2 7 7 80 10 50 500 -h -ngs' --nproc 10
05/19/2023 02:18:51 PM - kneaddata.utilities - DEBUG: 0 
05/19/2023 02:18:51 PM - kneaddata.utilities - DEBUG: Checking output file from trf : /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.single.1.fasta.trf.parameters.2.7.7.80.10.50.500.dat 
05/19/2023 02:18:51 PM - kneaddata.run - INFO: Total number of sequences with repeats removed from file ( /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.single.1.fastq ): 3568 
05/19/2023 02:19:04 PM - kneaddata.utilities - DEBUG: Checking input file to trf : /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.single.2.fasta 
05/19/2023 02:19:04 PM - kneaddata.utilities - INFO: Running trf … 
05/19/2023 02:19:04 PM - kneaddata.utilities - INFO: Execute command: kneaddata_trf_parallel --input /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.single.2.fasta --output /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.single.2.fasta.trf.parameters.2.7.7.80.10.50.500.dat --trf-path /usr/local/bin/trf --trf-options '2 7 7 80 10 50 500 -h -ngs' --nproc 10
05/19/2023 02:19:44 PM - kneaddata.utilities - DEBUG: 0 
05/19/2023 02:19:44 PM - kneaddata.utilities - DEBUG: Checking output file from trf : /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.single.2.fasta.trf.parameters.2.7.7.80.10.50.500.dat 
05/19/2023 02:19:46 PM - kneaddata.run - INFO: Total number of sequences with repeats removed from file ( /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.trimmed.single.2.fastq ): 18739 
05/19/2023 02:19:46 PM - kneaddata.run - INFO: Decontaminating … 
05/19/2023 02:19:46 PM - kneaddata.utilities - DEBUG: Checking input file to bowtie2 : /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.repeats.removed.1.fastq 
05/19/2023 02:19:46 PM - kneaddata.utilities - DEBUG: Checking input file to bowtie2 : /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.repeats.removed.2.fastq 
05/19/2023 02:19:46 PM - kneaddata.utilities - DEBUG: Checking input file to bowtie2 : /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.repeats.removed.unmatched.1.fastq 
05/19/2023 02:19:46 PM - kneaddata.utilities - DEBUG: Checking input file to bowtie2 : /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.repeats.removed.unmatched.2.fastq 
05/19/2023 02:19:46 PM - kneaddata.utilities - INFO: Running bowtie2 … 
05/19/2023 02:19:46 PM - kneaddata.utilities - INFO: Execute command: kneaddata_bowtie2_discordant_pairs --bowtie2 /home/yask/miniconda3/bin/bowtie2 --threads 10 -x /home/yask/reference_data/knead_database/mouse/mouse_C57BL_6NJ --mode strict --bowtie2-options "--very-sensitive-local --phred33" -1 /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.repeats.removed.1.fastq -2 /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.repeats.removed.2.fastq --un-pair /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_mouse_C57BL_6NJ_bowtie2_paired_clean_%.fastq --al-pair /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_mouse_C57BL_6NJ_bowtie2_paired_contam_%.fastq -U /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.repeats.removed.unmatched.1.fastq,/home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata.repeats.removed.unmatched.2.fastq --un-single /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_mouse_C57BL_6NJ_bowtie2_unmatched_%_clean.fastq --al-single /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_mouse_C57BL_6NJ_bowtie2_unmatched_%_contam.fastq -S /dev/null
05/19/2023 02:27:09 PM - kneaddata.utilities - DEBUG: b'7150688 reads; of these:\n  7150688 (100.00%) were unpaired; of these:\n    4763598 (66.62%) aligned 0 times\n    1494065 (20.89%) aligned exactly 1 time\n    893025 (12.49%) aligned >1 times\n33.38% overall alignment rate\npair1_aligned : 0\npair2_aligned : 0\npair1_unaligned : 0\npair2_unaligned : 0\norphan1_aligned : 1044961\norphan2_aligned : 1342129\norphan1_unaligned : 2133415\norphan2_unaligned : 2630183\n'
05/19/2023 02:27:09 PM - kneaddata.utilities - DEBUG: Checking output file from bowtie2 : /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_mouse_C57BL_6NJ_bowtie2_paired_clean_1.fastq 
05/19/2023 02:27:09 PM - kneaddata.utilities - DEBUG: Checking output file from bowtie2 : /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_mouse_C57BL_6NJ_bowtie2_paired_clean_2.fastq 
05/19/2023 02:27:10 PM - kneaddata.run - INFO: Total contaminate sequences in file ( /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_mouse_C57BL_6NJ_bowtie2_paired_contam_1.fastq ) : 0.0 
05/19/2023 02:27:10 PM - kneaddata.run - INFO: Total contaminate sequences in file ( /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_mouse_C57BL_6NJ_bowtie2_paired_contam_2.fastq ) : 0.0 
05/19/2023 02:27:11 PM - kneaddata.run - INFO: Total contaminate sequences in file ( /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_mouse_C57BL_6NJ_bowtie2_unmatched_1_contam.fastq ) : 1044961.0 
05/19/2023 02:27:12 PM - kneaddata.run - INFO: Total contaminate sequences in file ( /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_mouse_C57BL_6NJ_bowtie2_unmatched_2_contam.fastq ) : 1342129.0 
05/19/2023 02:27:12 PM - kneaddata.utilities - INFO: READ COUNT: decontaminated mouse_C57BL_6NJ pair1 : Total reads after removing those found in reference database ( /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_mouse_C57BL_6NJ_bowtie2_paired_clean_1.fastq ): 0.0 
05/19/2023 02:27:12 PM - kneaddata.utilities - INFO: READ COUNT: decontaminated mouse_C57BL_6NJ pair2 : Total reads after removing those found in reference database ( /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_mouse_C57BL_6NJ_bowtie2_paired_clean_2.fastq ): 0.0 
05/19/2023 02:27:12 PM - kneaddata.utilities - INFO: READ COUNT: final pair1 : Total reads after merging results from multiple databases ( /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_paired_1.fastq ): 0.0 
05/19/2023 02:27:12 PM - kneaddata.utilities - INFO: READ COUNT: final pair2 : Total reads after merging results from multiple databases ( /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_paired_2.fastq ): 0.0 
05/19/2023 02:27:12 PM - kneaddata.utilities - WARNING: Unable to remove file: /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_mouse_C57BL_6NJ_bowtie2_paired_clean_1.fastq 
05/19/2023 02:27:12 PM - kneaddata.utilities - WARNING: Unable to remove file: /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_mouse_C57BL_6NJ_bowtie2_paired_clean_2.fastq 
05/19/2023 02:27:15 PM - kneaddata.utilities - INFO: READ COUNT: decontaminated mouse_C57BL_6NJ orphan1 : Total reads after removing those found in reference database ( /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_mouse_C57BL_6NJ_bowtie2_unmatched_1_clean.fastq ): 2133415.0 
05/19/2023 02:27:16 PM - kneaddata.utilities - INFO: READ COUNT: final orphan1 : Total reads after merging results from multiple databases ( /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_unmatched_1.fastq ): 2133415.0 
05/19/2023 02:27:16 PM - kneaddata.utilities - WARNING: Unable to remove file: /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_mouse_C57BL_6NJ_bowtie2_unmatched_1_clean.fastq 
05/19/2023 02:27:18 PM - kneaddata.utilities - INFO: READ COUNT: decontaminated mouse_C57BL_6NJ orphan2 : Total reads after removing those found in reference database ( /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_mouse_C57BL_6NJ_bowtie2_unmatched_2_clean.fastq ): 2630183.0 
05/19/2023 02:27:20 PM - kneaddata.utilities - INFO: READ COUNT: final orphan2 : Total reads after merging results from multiple databases ( /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_unmatched_2.fastq ): 2630183.0 
05/19/2023 02:27:20 PM - kneaddata.utilities - WARNING: Unable to remove file: /home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_mouse_C57BL_6NJ_bowtie2_unmatched_2_clean.fastq 
05/19/2023 02:27:20 PM - kneaddata.knead_data - INFO: 
Final output files created: 
/home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_paired_1.fastq 
/home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_paired_2.fastq 
/home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_unmatched_1.fastq 
/home/yask/raw_data/C3_ko_microbiome/all_files/knead_output/S1_R1_kneaddata_unmatched_2.fastq
It would be great if you could help me with this!
I may just be missing something basic, but I cannot find the problem.
Regards
Artem
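
A quick sanity check on the read counts in the log above may help narrow down where the pairing is lost. The sketch below only does arithmetic on numbers copied from that log; the variable names are made up for illustration. It compares the reads left after the trf step with what bowtie2 reports as orphans.

```python
# Read counts copied from the kneaddata log above (sample S1).
trimmed = {"pair1": 3005635, "pair2": 3005635, "orphan1": 227344, "orphan2": 1042352}
trf_removed = {"pair1": 51035, "pair2": 56936, "orphan1": 3568, "orphan2": 18739}

# Reads remaining in each file after the trf (tandem repeat) step.
after_trf = {name: trimmed[name] - trf_removed[name] for name in trimmed}

# All mate-1 and mate-2 reads that survived trimming and trf,
# regardless of whether they were still in a paired file.
mate1_total = after_trf["pair1"] + after_trf["orphan1"]
mate2_total = after_trf["pair2"] + after_trf["orphan2"]

# What bowtie2 reported as orphans (aligned + unaligned); pairs were all 0.
bowtie2_orphan1 = 1044961 + 2133415
bowtie2_orphan2 = 1342129 + 2630183

print("mate 1 after trf:", mate1_total, "| bowtie2 orphan1:", bowtie2_orphan1)
print("mate 2 after trf:", mate2_total, "| bowtie2 orphan2:", bowtie2_orphan2)
print("total reads seen by bowtie2:", bowtie2_orphan1 + bowtie2_orphan2)
```

The totals match exactly, and their sum equals the 7150688 reads that bowtie2 reports as 100% unpaired. This suggests that the reads were still paired after Trimmomatic (3005635 in each trimmed pair file) and that every surviving read, including those from the paired files, ended up in the `repeats.removed.unmatched` files before bowtie2 ran. It does not by itself say why that happened.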
             
            
               
               
gmark, July 3, 2023, 1:21am
Hi Artem,
I’m running into the same issue you are regarding the unpaired reads. Did you ever find a solution?
Best, 
Mark

Hi Artem,
I also have the exact same problem and am stumped as to a solution – did you end up finding one?
Thanks! 
Fran
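
Earlier in this thread it was suggested that kneaddata can lose track of pairs when sequence identifiers contain spaces or are missing the read number. A minimal sketch for checking that on a pair of input files is below. The function names, the report keys, and the assumption that mates are marked with a trailing `/1` and `/2` are all illustrative choices, not part of kneaddata.

```python
import gzip
import re
from itertools import islice, zip_longest

# Assumed convention for this check: mate number as a trailing "/1" or "/2".
MATE_SUFFIX = re.compile(r"/[12]$")


def fastq_headers(path, limit=None):
    """Yield the header (without the leading '@') of each 4-line FASTQ record."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        every_fourth = islice(handle, 0, None, 4)
        for header in islice(every_fourth, limit):
            yield header.rstrip("\n")[1:]


def check_pairing(path1, path2, limit=100000):
    """Compare headers of two FASTQ files record by record (first `limit` records)."""
    report = {"records": 0, "with_space": 0, "no_mate_suffix": 0,
              "name_mismatch": 0, "length_mismatch": False}
    pairs = zip_longest(fastq_headers(path1, limit), fastq_headers(path2, limit))
    for h1, h2 in pairs:
        if h1 is None or h2 is None:
            # One file ran out of records before the other.
            report["length_mismatch"] = True
            break
        report["records"] += 1
        if " " in h1 or " " in h2:
            report["with_space"] += 1
        if not (MATE_SUFFIX.search(h1) and MATE_SUFFIX.search(h2)):
            report["no_mate_suffix"] += 1
        # Compare read names with the text after the first space and the mate suffix removed.
        name1 = MATE_SUFFIX.sub("", h1.split(" ", 1)[0])
        name2 = MATE_SUFFIX.sub("", h2.split(" ", 1)[0])
        if name1 != name2:
            report["name_mismatch"] += 1
    return report
```

Calling `check_pairing("sample_R1.fastq.gz", "sample_R2.fastq.gz")` (hypothetical file names) returns a small dictionary of counts. Non-zero `with_space` or `no_mate_suffix` values would point to the identifier format discussed above, while a non-zero `name_mismatch` would suggest the two files are not in the same read order.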
             
            
               
               