Kneaddata - a not very explicit error message

Hi guys,

I have worked with kneaddata in the past and have not seen this error before. I also searched the forum before posting. Any tips are appreciated:

I ran kneaddata on my server, as usual. The server's log file reports an error:

ERROR: Unable to write file: /rds/general/user/jm2018/ephemeral/biobakery/kneaddata/35231821S_R1.fastq/35231821S_R1_kneaddata.trimmed.1.fastq

However, in the kneaddata log file I could not find any error relating to that file:

03/21/2022 10:51:44 AM - kneaddata.knead_data - INFO: Running kneaddata v0.10.0
03/21/2022 10:51:44 AM - kneaddata.knead_data - INFO: Output files will be written to: /rds/general/user/jm2018/ephemeral/biobakery/kneaddata/3523185S_R1.fastq
03/21/2022 10:51:44 AM - kneaddata.knead_data - DEBUG: Running with the following arguments:
verbose = False
input = /rds/general/user/jm2018/ephemeral/unzip_raw/3523185S_R1.fastq /rds/general/user/jm2018/ephemeral/unzip_raw/3523185S_R2.fastq
output_dir = /rds/general/user/jm2018/ephemeral/biobakery/kneaddata/3523185S_R1.fastq
reference_db = /rds/general/user/jm2018/ephemeral/biobakery_library/database_kneaddata/hg37dec_v0.1
bypass_trim = False
output_prefix = 3523185S_R1_kneaddata
threads = 6
processes = 1
trimmomatic_quality_scores = -phred33
bmtagger = False
bypass_trf = False
run_trf = False
fastqc_start = False
fastqc_end = False
store_temp_output = False
remove_intermediate_output = False
cat_final_output = False
log_level = DEBUG
log = /rds/general/user/jm2018/ephemeral/biobakery/kneaddata/3523185S_R1.fastq/3523185S_R1_kneaddata.log
trimmomatic_path = /rds/general/user/jm2018/home/Trimmomatic-0.39/trimmomatic-0.39.jar
run_trim_repetitive = False
max_memory = 500m
trimmomatic_options = None
sequencer_source = NexteraPE
bowtie2_path = /rds/general/user/jm2018/home/anaconda3/envs/biobakery/bin/bowtie2
bowtie2_options = --very-sensitive-local --phred33
decontaminate_pairs = strict
reorder = False
serial = False
bmtagger_path = None
trf_path = /rds/general/user/jm2018/home/anaconda3/envs/biobakery/bin/trf
match = 2
mismatch = 7
delta = 7
pm = 80
pi = 10
minscore = 50
maxperiod = 500
fastqc_path = None
remove_temp_output = True
discordant = True

03/21/2022 10:55:02 AM - kneaddata.utilities - INFO: READ COUNT: raw pair1 : Initial number of reads ( /rds/general/user/jm2018/ephemeral/unzip_raw/3523185S_R1.fastq ): 11761708.0
03/21/2022 10:58:52 AM - kneaddata.utilities - INFO: READ COUNT: raw pair2 : Initial number of reads ( /rds/general/user/jm2018/ephemeral/unzip_raw/3523185S_R2.fastq ): 11761708.0
03/21/2022 10:58:52 AM - kneaddata.utilities - DEBUG: Checking input file to Trimmomatic : /rds/general/user/jm2018/ephemeral/unzip_raw/3523185S_R1.fastq
03/21/2022 10:58:52 AM - kneaddata.utilities - DEBUG: Checking input file to Trimmomatic : /rds/general/user/jm2018/ephemeral/unzip_raw/3523185S_R2.fastq
03/21/2022 10:58:52 AM - kneaddata.utilities - INFO: Running Trimmomatic ...
03/21/2022 10:58:52 AM - kneaddata.utilities - INFO: Execute command: java -Xmx500m -jar /rds/general/user/jm2018/home/Trimmomatic-0.39/trimmomatic-0.39.jar PE -threads 6 -phred33 /rds/general/user/jm2018/ephemeral/unzip_raw/3523185S_R1.fastq /rds/general/user/jm2018/ephemeral/unzip_raw/3523185S_R2.fastq /rds/general/user/jm2018/ephemeral/biobakery/kneaddata/3523185S_R1.fastq/3523185S_R1_kneaddata.trimmed.1.fastq /rds/general/user/jm2018/ephemeral/biobakery/kneaddata/3523185S_R1.fastq/3523185S_R1_kneaddata.trimmed.single.1.fastq /rds/general/user/jm2018/ephemeral/biobakery/kneaddata/3523185S_R1.fastq/3523185S_R1_kneaddata.trimmed.2.fastq /rds/general/user/jm2018/ephemeral/biobakery/kneaddata/3523185S_R1.fastq/3523185S_R1_kneaddata.trimmed.single.2.fastq MINLEN:60 ILLUMINACLIP:/rds/general/user/jm2018/home/anaconda3/envs/biobakery/lib/python3.7/site-packages/kneaddata/adapters/NexteraPE-PE.fa:2:30:10:8:TRUE SLIDINGWINDOW:4:20 MINLEN:75
03/21/2022 11:04:52 AM - kneaddata.utilities - DEBUG: b"TrimmomaticPE: Started with arguments:\n -threads 6 -phred33 /rds/general/user/jm2018/ephemeral/unzip_raw/3523185S_R1.fastq /rds/general/user/jm2018/ephemeral/unzip_raw/3523185S_R2.fastq /rds/general/user/jm2018/ephemeral/biobakery/kneaddata/3523185S_R1.fastq/3523185S_R1_kneaddata.trimmed.1.fastq /rds/general/user/jm2018/ephemeral/biobakery/kneaddata/3523185S_R1.fastq/3523185S_R1_kneaddata.trimmed.single.1.fastq /rds/general/user/jm2018/ephemeral/biobakery/kneaddata/3523185S_R1.fastq/3523185S_R1_kneaddata.trimmed.2.fastq /rds/general/user/jm2018/ephemeral/biobakery/kneaddata/3523185S_R1.fastq/3523185S_R1_kneaddata.trimmed.single.2.fastq MINLEN:60 ILLUMINACLIP:/rds/general/user/jm2018/home/anaconda3/envs/biobakery/lib/python3.7/site-packages/kneaddata/adapters/NexteraPE-PE.fa:2:30:10:8:TRUE SLIDINGWINDOW:4:20 MINLEN:75\nUsing PrefixPair: 'AGATGTGTATAAGAGACAG' and 'AGATGTGTATAAGAGACAG'\nUsing Long Clipping Sequence: 'GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG'\nUsing Long Clipping Sequence: 'TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG'\nUsing Long Clipping Sequence: 'CTGTCTCTTATACACATCTGACGCTGCCGACGA'\nUsing Long Clipping Sequence: 'CTGTCTCTTATACACATCTCCGAGCCCACGAGAC'\nILLUMINACLIP: Using 1 prefix pairs, 4 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences\nInput Read Pairs: 11761708 Both Surviving: 9667227 (82.19%) Forward Only Surviving: 1442760 (12.27%) Reverse Only Surviving: 345569 (2.94%) Dropped: 306152 (2.60%)\nTrimmomaticPE: Completed successfully\n"
03/21/2022 11:04:52 AM - kneaddata.utilities - DEBUG: Checking output file from Trimmomatic : /rds/general/user/jm2018/ephemeral/biobakery/kneaddata/3523185S_R1.fastq/3523185S_R1_kneaddata.trimmed.1.fastq
03/21/2022 11:04:52 AM - kneaddata.utilities - DEBUG: Checking output file from Trimmomatic : /rds/general/user/jm2018/ephemeral/biobakery/kneaddata/3523185S_R1.fastq/3523185S_R1_kneaddata.trimmed.single.1.fastq
03/21/2022 11:04:52 AM - kneaddata.utilities - DEBUG: Checking output file from Trimmomatic : /rds/general/user/jm2018/ephemeral/biobakery/kneaddata/3523185S_R1.fastq/3523185S_R1_kneaddata.trimmed.2.fastq
03/21/2022 11:04:52 AM - kneaddata.utilities - DEBUG: Checking output file from Trimmomatic : /rds/general/user/jm2018/ephemeral/biobakery/kneaddata/3523185S_R1.fastq/3523185S_R1_kneaddata.trimmed.single.2.fastq
03/21/2022 11:06:16 AM - kneaddata.utilities - INFO: READ COUNT: trimmed pair1 : Total reads after trimming ( /rds/general/user/jm2018/ephemeral/biobakery/kneaddata/3523185S_R1.fastq/3523185S_R1_kneaddata.trimmed.1.fastq ): 9667227.0
03/21/2022 11:06:58 AM - kneaddata.utilities - INFO: READ COUNT: trimmed pair2 : Total reads after trimming ( /rds/general/user/jm2018/ephemeral/biobakery/kneaddata/3523185S_R1.fastq/3523185S_R1_kneaddata.trimmed.2.fastq ): 9667227.0
03/21/2022 11:07:05 AM - kneaddata.utilities - INFO: READ COUNT: trimmed orphan1 : Total reads after trimming ( /rds/general/user/jm2018/ephemeral/biobakery/kneaddata/3523185S_R1.fastq/3523185S_R1_kneaddata.trimmed.single.1.fastq ): 1442760.0
03/21/2022 11:07:07 AM - kneaddata.utilities - INFO: READ COUNT: trimmed orphan2 : Total reads after trimming ( /rds/general/user/jm2018/ephemeral/biobakery/kneaddata/3523185S_R1.fastq/3523185S_R1_kneaddata.trimmed.single.2.fastq ): 345569.0

The read counts for files 1 and 2 match, which I suppose is good. The problem is that kneaddata stops there: no further files or output are produced.

Any idea where the problem may be?

Thanks very much,

Jesus

3523185S_R1_kneaddata.txt (6.6 KB)

Hello! I have the same error. Have you found a solution?


The answer is actually both very simple and terribly frustrating. TL;DR: the input file names apparently must contain .R1. or .R2. I haven't fully tested variations on this, but I do know that files named with _R1. or _R2. will error out as above.
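
For anyone who wants to try this workaround on many files, here is a minimal Python sketch of the name mapping described above. It only rewrites `_R1.` or `_R2.` to `.R1.` or `.R2.` in a file name string. The helper name `dotted_pair_name` is made up for illustration, and the naming rule itself is just what was observed in this thread, not documented kneaddata behaviour.

```python
import re

# Matches "_R1." or "_R2." inside a file name, e.g. "sample_R1.fastq".
_UNDERSCORE_PAIR = re.compile(r"_R([12])\.")


def dotted_pair_name(filename: str) -> str:
    """Rewrite the first '_R1.'/'_R2.' in filename as '.R1.'/'.R2.'.

    Names without that pattern are returned unchanged.
    """
    return _UNDERSCORE_PAIR.sub(r".R\1.", filename, count=1)


if __name__ == "__main__":
    # File names taken from the log earlier in this thread.
    for name in ("3523185S_R1.fastq", "3523185S_R2.fastq"):
        print(name, "->", dotted_pair_name(name))
```

The function works on names only, so you can review the proposed mapping before renaming or copying anything on disk.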

Developers: Is there any way to make that more flexible? The underscore version is a very common convention, used not just by our sequencing center but by quite a few others.

Hello, kneaddata is more flexible in its latest versions (0.11.0 and newer).

Older versions of kneaddata were unable to track pairs if the read identifiers (the first line in each set of 4 lines in the file) did not include the R1 and R2 information in an expected standard format. With the latest version, the user can specify which files are R1 and R2; no specific read identifier formatting is required. Kneaddata will add information, if needed, so that it can track the paired reads through the workflow.

$ kneaddata --input1 seq_R1.fastq --input2 seq_R2.fastq -db $DATABASE --output kneaddata_output
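
If you are unsure whether your read identifiers carry pair information at all, a quick look at the first header line of each file can help. The sketch below is a rough check under stated assumptions: it recognises two common conventions (a trailing `/1` or `/2`, and the Casava 1.8+ style ` 1:N:0:...`), which may or may not be exactly what older kneaddata versions expected. The function names are hypothetical.

```python
import gzip
import re
from typing import Optional

# Two common conventions (assumed here, not taken from kneaddata's source):
#   "@read/1" or "@read/2"                 trailing slash style
#   "@read 1:N:0:ACGT" / "@read 2:N:0:..." Casava 1.8+ style
_SLASH = re.compile(r"/([12])$")
_CASAVA = re.compile(r"\s([12]):[YN]:\d+")


def pair_marker(header: str) -> Optional[str]:
    """Return '1' or '2' if the FASTQ header names its mate, else None."""
    header = header.strip()
    match = _SLASH.search(header) or _CASAVA.search(header)
    return match.group(1) if match else None


def first_header(path: str) -> str:
    """Read the first line (the first read identifier) of a FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return handle.readline()
```

For example, `pair_marker(first_header("seq_R1.fastq"))` returning `None` would suggest the identifiers carry no mate marker in either of these two styles.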

Let us know if you notice anything else that is unclear or unexpected!

Thanks!
Lauren


kneaddata v0.12.0

I ran into this issue just now with version 0.12.0 of kneaddata (running via an apptainer image from biocontainers). With a test sample consisting of the files sample1_1.fq.gz and sample1_2.fq.gz, I got the same error message. Renaming the inputs to sample1.R1.fastq.gz and sample1.R2.fastq.gz gets rid of the error.
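
In case it is useful, here is a small Python sketch of that renaming for the `_1`/`_2` convention. Instead of renaming the originals, it creates symlinks with the dotted names in a separate staging directory. Two things are assumptions on my part: that kneaddata accepts symlinked inputs just like renamed files, and that the change from `.fq` to `.fastq` matters (I changed both at once and did not test them separately). The names `staged_name` and `stage_inputs` are made up for this example.

```python
import os
import re

# Matches e.g. "sample1_1.fq.gz", "sample1_2.fastq" (mate number after "_").
_NUMERIC_PAIR = re.compile(
    r"^(?P<sample>.+)_(?P<mate>[12])\.(?:fq|fastq)(?P<gz>\.gz)?$"
)


def staged_name(filename):
    """Map 'sample1_1.fq.gz' to 'sample1.R1.fastq.gz'; None if no match."""
    m = _NUMERIC_PAIR.match(filename)
    if m is None:
        return None
    return f"{m['sample']}.R{m['mate']}.fastq{m['gz'] or ''}"


def stage_inputs(paths, staging_dir):
    """Symlink each input into staging_dir under its dotted name."""
    os.makedirs(staging_dir, exist_ok=True)
    links = []
    for path in paths:
        new_name = staged_name(os.path.basename(path))
        if new_name is None:
            raise ValueError(f"unrecognised paired-end file name: {path}")
        link = os.path.join(staging_dir, new_name)
        if not os.path.lexists(link):  # leave existing links untouched
            os.symlink(os.path.abspath(path), link)
        links.append(link)
    return links
```

The returned paths can then be passed to kneaddata as the inputs, leaving the original files where they are.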