Long, ugly output file names from KneadData

When running KneadData, the output files have long, ugly names without an extension.

M-21_XXXXXXXXX_AD011.log
reformatted_identifiers5h01wxl8_decompressed_wkysaz79_M-21_XXXXXXXXX_AD011_L001_R1
reformatted_identifiersc3g8pypf_decompressed_67a9qw7h_M-21_XXXXXXXXX_AD011_L001_R2
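
(If you only need to give such files an extension after the fact, a rename one-liner like the sketch below works; the glob pattern is an assumption based on the names above.)

# Sketch: append a .fastq extension to the extension-less read files
# (the glob is an assumption; check what it matches before running)
for f in reformatted_identifiers*_R[12]; do mv -- "$f" "$f.fastq"; done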

Also, I am unable to run KneadData using Nextflow.
The process being used is as follows:

process KneadData {
    publishDir "$params.outdir", mode: 'copy'
    maxForks params.jobs

    input:
    tuple val(sid), path(reads)

    output:
    path "*"

    script:
    // Derive the sample name once instead of repeating the replaceAll() call
    def sample = reads[0].simpleName.replaceAll(/_L001_R1/, "")
    """
    mkdir ${sample}
    /home/subudhak/miniconda3/envs/metagenomics-tools/bin/kneaddata \
        -i1 ${reads[0]} -i2 ${reads[1]} \
        --bypass-trim \
        -db ${params.db} \
        --run-trf \
        --threads 5 \
        --sequencer-source none \
        --output-prefix ${sample} \
        -o ${sample}
    """
}

And the error is:

N E X T F L O W  ~  version 22.04.5
Launching `kneddata.nf` [festering_hugle] DSL2 - revision: a7e525a8a0
executor >  local (2)
[16/caa230] process > KneadData (9) [  0%] 0 of 12
Error executing process > 'KneadData (1)'

Caused by:
  Process `KneadData (1)` terminated with an error exit status (1)

Command executed:

  mkdir M-21_XXXXXXXXXXXX_AD011
  /home/subudhak/miniconda3/envs/metagenomics-tools/bin/kneaddata -i1 M-21_XXXXXXXXXXXX_AD011_L001_R1.fastq.gz -i2 M-21_XXXXXXXXXXXX_AD011_L001_R2.fastq.gz 	--bypass-trim 	-db /home/subudhak/Documents/DRT_Data/resources/human_bwt  	--run-trf 	--threads 5 	--sequencer-source none	--output-prefix M-21_XXXXXXXXXXXX_AD011 	-o M-21_XXXXXXXXXXXX_AD011

Command exit status:
  1

Command output:
  Decompressing gzipped file ...
  
  Decompressing gzipped file ...
  
  Reformatting file sequence identifiers ...
  
  Reformatting file sequence identifiers ...
  
  Initial number of reads ( M-21_XXXXXXXXXXXX_AD011/reformatted_identifiers5h01wxl8_decompressed_wkysaz79_M-21_XXXXXXXXXXXX_AD011_L001_R1 ): 22252867.0
executor >  local (2)
[16/caa230] process > KneadData (9) [  9%] 1 of 11, failed: 1
Error executing process > 'KneadData (1)'

Caused by:
  Process `KneadData (1)` terminated with an error exit status (1)

Command executed:

  mkdir M-21_XXXXXXXXXXXX_AD011
  /home/subudhak/miniconda3/envs/metagenomics-tools/bin/kneaddata -i1 M-21_XXXXXXXXXXXX_AD011_L001_R1.fastq.gz -i2 M-21_XXXXXXXXXXXX_AD011_L001_R2.fastq.gz 	--bypass-trim 	-db /home/subudhak/Documents/DRT_Data/resources/human_bwt  	--run-trf 	--threads 5 	--sequencer-source none	--output-prefix M-21_XXXXXXXXXXXX_AD011 	-o M-21_XXXXXXXXXXXX_AD011

Command exit status:
  1

Command output:
  Decompressing gzipped file ...
  
  Decompressing gzipped file ...
  
  Reformatting file sequence identifiers ...
  
  Reformatting file sequence identifiers ...
  
  Initial number of reads ( M-21_XXXXXXXXXXXX_AD011/reformatted_identifiers5h01wxl8_decompressed_wkysaz79_M-21_XXXXXXXXXXXX_AD011_L001_R1 ): 22252867.0
  Initial number of reads ( M-21_XXXXXXXXXXXX_AD011/reformatted_identifiersc3g8pypf_decompressed_67a9qw7h_M-21_XXXXXXXXXXXX_AD011_L001_R2 ): 22252867.0
  Bypass trimming
  Total reads after trimming ( M-21_XXXXXXXXXXXX_AD011/reformatted_identifiers5h01wxl8_decompressed_wkysaz79_M-21_XXXXXXXXXXXX_AD011_L001_R1 ): 22252867.0
  Total reads after trimming ( M-21_XXXXXXXXXXXX_AD011/reformatted_identifiersc3g8pypf_decompressed_67a9qw7h_M-21_XXXXXXXXXXXX_AD011_L001_R2 ): 22252867.0

Command error:
  ERROR: Unable to write file: M-21_XXXXXXXXXXXX_AD011/reformatted_identifiers5h01wxl8_decompressed_wkysaz79_M-21_XXXXXXXXXXXX_AD011_L001_R1

Work dir:
  /home/subudhak/Documents/DRT_Data/work/e1/de3869f6cbd6bcfd6fed31adf81753

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

Besides, when I run KneadData in a while loop it completes, but it doesn't remove contaminant reads, i.e. the number of reads before and after trimming and filtering stays the same, and we know for sure that these samples contain human reads. Also, while it writes output to the defined directory, it prints the same error that was observed when running via Nextflow.

# Run KneadData on each R1 path listed in `list`, pairing it with its R2 mate
while read -r p; do
    n=$(basename "$p" | awk -F '_R1' '{print $1}')   # sample name from the R1 file
    /home/subudhak/miniconda3/envs/metagenomics-tools/bin/kneaddata \
        -i1 "$p" -i2 data/${n}_R2.fastq.gz \
        --bypass-trim -db resources/human_bwt/ --run-trf --threads 5 \
        --sequencer-source none --output-prefix "$n" -o results/kneaddata_output/${n}
done < list

Decompressing gzipped file ...

Decompressing gzipped file ...

Reformatting file sequence identifiers ...

Reformatting file sequence identifiers ...

Initial number of reads ( /home/subudhak/Documents/Dr_Turki_Data/results/kneaddata_output/M-21_XXXXXXXXXXXXXX_AD003_L001/reformatted_identifiersl0w1n60e_decompressed_fors9lqh_M-21_XXXXXXXXXXXXXX_AD003_L001_R1 ): 22304239.0
Initial number of reads ( /home/subudhak/Documents/Dr_Turki_Data/results/kneaddata_output/M-21_XXXXXXXXXXXXXX_AD003_L001/reformatted_identifierspsxaq55d_decompressed_dq30qktz_M-21_XXXXXXXXXXXXXX_AD003_L001_R2 ): 22304239.0
Bypass trimming
Total reads after trimming ( /home/subudhak/Documents/Dr_Turki_Data/results/kneaddata_output/M-21_XXXXXXXXXXXXXX_AD003_L001/reformatted_identifiersl0w1n60e_decompressed_fors9lqh_M-21_XXXXXXXXXXXXXX_AD003_L001_R1 ): 22304239.0
Total reads after trimming ( /home/subudhak/Documents/Dr_Turki_Data/results/kneaddata_output/M-21_XXXXXXXXXXXXXX_AD003_L001/reformatted_identifierspsxaq55d_decompressed_dq30qktz_M-21_XXXXXXXXXXXXXX_AD003_L001_R2 ): 22304239.0
ERROR: Unable to write file: /home/subudhak/Documents/Dr_Turki_Data/results/kneaddata_output/M-21_XXXXXXXXXXXXXX_AD003_L001/reformatted_identifiersl0w1n60e_decompressed_fors9lqh_M-21_XXXXXXXXXXXXXX_AD003_L001_R1
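
(For reference, the list file consumed by the loop above can be generated with a one-liner like this; the data/ layout is an assumption based on the commands in this post.)

# Assumed: collect the R1 files that the while loop iterates over
ls data/*_R1.fastq.gz > list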

--run-fastqc-end doesn't run FastQC at the end either.

Note: I installed KneadData using Conda.
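
(For reference, a Conda install of KneadData is typically something like the line below; the biobakery channel name comes from the bioBakery docs, not from this thread.)

# Assumed install command (biobakery Conda channel, per the bioBakery docs)
conda install -c biobakery kneaddata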

Hi @rohit_satyam,

Apologies for the delay. KneadData is erroring out for you. Could you make the following changes and see if it works for you, please?

  • Remove the --sequencer-source none flag. It is only relevant to the Trimmomatic step, and since we are passing --bypass-trim to skip Trimmomatic, it has no effect here.
  • Run the following kneaddata command on its own, without Nextflow, to see if it works:
/home/subudhak/miniconda3/envs/metagenomics-tools/bin/kneaddata \
    -i1 M-21_XXXXXXXXXXXX_AD011_L001_R1.fastq.gz \
    -i2 M-21_XXXXXXXXXXXX_AD011_L001_R2.fastq.gz \
    --bypass-trim \
    -db /home/subudhak/Documents/DRT_Data/resources/human_bwt/ \
    --run-trf \
    --threads 5 \
    --output-prefix M-21_XXXXXXXXXXXX_AD011 \
    -o M-21_XXXXXXXXXXXX_AD011

When using the while loop, I think the same kneaddata error is happening. I would highly recommend running kneaddata alone first to debug the issue.
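
A minimal way to do that, assuming a small pair of test files (the file names below are placeholders; the database path and the -v/--log options are taken from elsewhere in this thread):

# Sketch: run kneaddata by itself on a small test pair with a verbose log
# (test_R1/test_R2 are placeholder file names)
kneaddata -i1 test_R1.fastq.gz -i2 test_R2.fastq.gz \
    -db /home/subudhak/Documents/DRT_Data/resources/human_bwt/ \
    --bypass-trim --run-trf \
    -v --log kneaddata_debug.log \
    -o kneaddata_test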

Regards,
Sagun

Hi

I am still getting the same error when running without the while loop. I am running this on my local machine, so there are no read or write permission issues. I ran into the same error on an HPC as well.

ERROR: Unable to write file: /home/subudhak/Documents/Dr_Turki_Data/M-21_XXXXXXXXXXXX_AD011/reformatted_identifiersc6eqrnaa_decompressed_ubwpyu6t_M-21XXXXXXXXXXXX_AD010_L001_R1

Hi, @sagunmaharjann can you help me with how to proceed next?

Since I haven't heard from you @sagunmaharjann, I will use the CDC SanitizeMe tool for now to remove host genomic DNA. However, I would love to hear whether this is a bug in KneadData and whether it is being resolved, so that I can use it in the future.

I have the same error. Do you have a solution now?

I am having the same error. It doesn't work whether I add or remove --run-trf.

My version is 0.12.0, installed from PyPI.
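
(For reference, installing and checking that version would look something like this; a sketch, assuming pip is on the PATH.)

# Sketch: install kneaddata 0.12.0 from PyPI and confirm the version
pip install kneaddata==0.12.0
kneaddata --version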

This is the command:

kneaddata -i1 trimmed/LKRNA002-51_S51_L004_R1.fastq.gz \
    -i2 trimmed/LKRNA002-51_S51_L004_R1.fastq.gz \
    -o kneaddata/LKRNA002-51_S51_L004 \
    -db $SHARED/Databases/KneadData/human_transcriptome/ \
    -db $SHARED/Databases/KneadData/bact_rrna_db \
    -v --max-memory 20000m \
    --bypass-trim --run-trf \
    -p 10 -t 10 \
    --log logs/kneadata/LKRNA002-51_S51_4.log

This is the log file:

02/02/2023 03:00:41 PM - kneaddata.knead_data - INFO: Running kneaddata v0.12.0
02/02/2023 03:00:41 PM - kneaddata.knead_data - INFO: Output files will be written to: DATADIR/kneaddata/LKRNA002-51_S51_L002
02/02/2023 03:00:41 PM - kneaddata.knead_data - DEBUG: Running with the following arguments: 
verbose = True
input1 = trimmed/LKRNA002-51_S51_L002_R1.fastq.gz
input2 = trimmed/LKRNA002-51_S51_L002_R1.fastq.gz
unpaired = None
output_dir = DATADIR/kneaddata/LKRNA002-51_S51_L002
scratch_dir = 
reference_db = SHAREDDIR/Databases/KneadData/human_transcriptome/human_hg38_refMrna SHAREDDIR/Databases/KneadData/bact_rrna_db/SILVA_128_LSUParc_SSUParc_ribosomal_RNA
bypass_trim = True
output_prefix = LKRNA002-51_S51_L002_R1_kneaddata
threads = 10
processes = 10
trimmomatic_quality_scores = -phred33
bmtagger = False
bypass_trf = False
run_trf = False
fastqc_start = True
fastqc_end = True
store_temp_output = False
remove_intermediate_output = False
cat_final_output = False
log_level = DEBUG
log = logs/kneadata/LKRNA002-51_S51_2.log
trimmomatic_path = None
run_trim_repetitive = False
max_memory = 20000m
trimmomatic_options = None
sequencer_source = NexteraPE
bowtie2_path = /SHAREDDIR/Bin/bowtie2
bowtie2_options = --very-sensitive-local --phred33
decontaminate_pairs = strict
reorder = False
serial = False
bmtagger_path = None
trf_path = MAMBA/envs/dfu-snakemake/bin/trf
match = 2
mismatch = 7
delta = 7
pm = 80
pi = 10
minscore = 50
maxperiod = 500
fastqc_path = MAMBA/envs/dfu-snakemake/bin/fastqc
remove_temp_output = True
input = DATADIR/trimmed/LKRNA002-51_S51_L002_R1.fastq.gz DATADIR/trimmed/LKRNA002-51_S51_L002_R1.fastq.gz
discordant = True

02/02/2023 03:00:41 PM - kneaddata.utilities - INFO: Decompressing gzipped file ...
02/02/2023 03:00:41 PM - kneaddata.utilities - INFO: Decompressed file created: DATADIR/kneaddata/LKRNA002-51_S51_L002/decompressed_c4t92f3e_LKRNA002-51_S51_L002_R1.fastq
02/02/2023 03:00:41 PM - kneaddata.utilities - INFO: Decompressing gzipped file ...
02/02/2023 03:00:41 PM - kneaddata.utilities - INFO: Decompressed file created: DATADIR/kneaddata/LKRNA002-51_S51_L002/decompressed_l44fzy1c_LKRNA002-51_S51_L002_R1.fastq
02/02/2023 03:00:41 PM - kneaddata.utilities - INFO: Reformatting file sequence identifiers ...
02/02/2023 03:00:41 PM - kneaddata.utilities - INFO: Reformatting file sequence identifiers ...
02/02/2023 03:00:41 PM - kneaddata.utilities - INFO: READ COUNT: raw pair1 : Initial number of reads ( DATADIR/kneaddata/LKRNA002-51_S51_L002/reformatted_identifiers501bztyx_decompressed_c4t92f3e_LKRNA002-51_S51_L002_R1 ): 994.0
02/02/2023 03:00:41 PM - kneaddata.utilities - INFO: READ COUNT: raw pair2 : Initial number of reads ( DATADIR/kneaddata/LKRNA002-51_S51_L002/reformatted_identifierscwtdf864_decompressed_l44fzy1c_LKRNA002-51_S51_L002_R1 ): 994.0
02/02/2023 03:00:41 PM - kneaddata.utilities - DEBUG: Checking input file to fastqc : DATADIR/trimmed/LKRNA002-51_S51_L002_R1.fastq.gz
02/02/2023 03:00:41 PM - kneaddata.utilities - DEBUG: Checking input file to fastqc : DATADIR/trimmed/LKRNA002-51_S51_L002_R1.fastq.gz
02/02/2023 03:00:41 PM - kneaddata.utilities - INFO: Running fastqc ... 
02/02/2023 03:00:41 PM - kneaddata.utilities - INFO: Execute command: MAMBA/envs/dfu-snakemake/bin/fastqc DATADIR/trimmed/LKRNA002-51_S51_L002_R1.fastq.gz DATADIR/trimmed/LKRNA002-51_S51_L002_R1.fastq.gz --threads 10 --outdir DATADIR/kneaddata/LKRNA002-51_S51_L002/fastqc --extract
02/02/2023 03:00:54 PM - kneaddata.utilities - DEBUG: b'Started analysis of LKRNA002-51_S51_L002_R1.fastq.gz\nAnalysis complete for LKRNA002-51_S51_L002_R1.fastq.gz\nStarted analysis of LKRNA002-51_S51_L002_R1.fastq.gz\nAnalysis complete for LKRNA002-51_S51_L002_R1.fastq.gz\n'
02/02/2023 03:00:54 PM - kneaddata.knead_data - INFO: Bypass trimming
02/02/2023 03:00:54 PM - kneaddata.utilities - INFO: READ COUNT: trimmed single : Total reads after trimming ( DATADIR/kneaddata/LKRNA002-51_S51_L002/reformatted_identifiers501bztyx_decompressed_c4t92f3e_LKRNA002-51_S51_L002_R1 ): 994.0
02/02/2023 03:00:54 PM - kneaddata.utilities - INFO: READ COUNT: trimmed single : Total reads after trimming ( DATADIR/kneaddata/LKRNA002-51_S51_L002/reformatted_identifierscwtdf864_decompressed_l44fzy1c_LKRNA002-51_S51_L002_R1 ): 994.0
ERROR: Unable to write file: /ifs/scratch/tk2829_gp/na2933/DFU/data_mini/kneaddata/LKRNA002-51_S51_L004/reformatted_identifierseqswow4r_decompressed_kj47e0yq_LKRNA002-51_S51_L004_R1

Hi, I ran into the same problem mentioned above, "ERROR: Unable to write file ...".
When I removed the option --bypass-trim, all the processes went smoothly.
I hope people who run into the same problem can read this.
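
(Based on that workaround, the earlier command from this thread would become something like the following; this is a sketch with paths copied from above, not a verified fix.)

# Sketch: the command from earlier in the thread with --bypass-trim removed,
# per the workaround above (paths and names copied from this thread)
/home/subudhak/miniconda3/envs/metagenomics-tools/bin/kneaddata \
    -i1 M-21_XXXXXXXXXXXX_AD011_L001_R1.fastq.gz \
    -i2 M-21_XXXXXXXXXXXX_AD011_L001_R2.fastq.gz \
    -db /home/subudhak/Documents/DRT_Data/resources/human_bwt/ \
    --run-trf --threads 5 \
    --output-prefix M-21_XXXXXXXXXXXX_AD011 \
    -o M-21_XXXXXXXXXXXX_AD011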