Long, ugly output file names from KneadData

When running KneadData, the output files have long, ugly names without an extension:

M-21_XXXXXXXXX_AD011.log
reformatted_identifiers5h01wxl8_decompressed_wkysaz79_M-21_XXXXXXXXX_AD011_L001_R1
reformatted_identifiersc3g8pypf_decompressed_67a9qw7h_M-21_XXXXXXXXX_AD011_L001_R2

Also, I am unable to run KneadData via Nextflow.
The process being used is as follows:

process KneadData {
    publishDir "$params.outdir", mode: 'copy'
    maxForks params.jobs

    input:
    tuple val(sid), path(reads)

    output:
    path "*"

    script:
    def sample = reads[0].simpleName.replaceAll(/_L001_R1/, "")
    """
    mkdir ${sample}
    /home/subudhak/miniconda3/envs/metagenomics-tools/bin/kneaddata \
        -i1 ${reads[0]} -i2 ${reads[1]} \
        --bypass-trim \
        -db ${params.db} \
        --run-trf \
        --threads 5 \
        --sequencer-source none \
        --output-prefix ${sample} \
        -o ${sample}
    """
}
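As a sanity check outside Nextflow, the sample-name derivation that `reads[0].simpleName.replaceAll(/_L001_R1/, "")` performs can be reproduced with plain shell parameter expansion (a sketch only; the file name is one of the examples from this thread):

```shell
# Sketch: reproduce what simpleName.replaceAll(/_L001_R1/, "") yields.
f="M-21_XXXXXXXXXXXX_AD011_L001_R1.fastq.gz"
base="${f%%.*}"              # simpleName: strip from the first dot onward
sample="${base/_L001_R1/}"   # remove the lane/read suffix
echo "$sample"               # prints M-21_XXXXXXXXXXXX_AD011
```

This confirms the directory passed to `-o` matches the `--output-prefix`.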

And the error is

N E X T F L O W  ~  version 22.04.5
Launching `kneddata.nf` [festering_hugle] DSL2 - revision: a7e525a8a0
executor >  local (2)
[16/caa230] process > KneadData (9) [  0%] 0 of 12
Error executing process > 'KneadData (1)'

Caused by:
  Process `KneadData (1)` terminated with an error exit status (1)

Command executed:

  mkdir M-21_XXXXXXXXXXXX_AD011
  /home/subudhak/miniconda3/envs/metagenomics-tools/bin/kneaddata -i1 M-21_XXXXXXXXXXXX_AD011_L001_R1.fastq.gz -i2 M-21_XXXXXXXXXXXX_AD011_L001_R2.fastq.gz 	--bypass-trim 	-db /home/subudhak/Documents/DRT_Data/resources/human_bwt  	--run-trf 	--threads 5 	--sequencer-source none	--output-prefix M-21_XXXXXXXXXXXX_AD011 	-o M-21_XXXXXXXXXXXX_AD011

Command exit status:
  1

Command output:
  Decompressing gzipped file ...
  
  Decompressing gzipped file ...
  
  Reformatting file sequence identifiers ...
  
  Reformatting file sequence identifiers ...
  
  Initial number of reads ( M-21_XXXXXXXXXXXX_AD011/reformatted_identifiers5h01wxl8_decompressed_wkysaz79_M-21_XXXXXXXXXXXX_AD011_L001_R1 ): 22252867.0
executor >  local (2)
[16/caa230] process > KneadData (9) [  9%] 1 of 11, failed: 1
Error executing process > 'KneadData (1)'

Caused by:
  Process `KneadData (1)` terminated with an error exit status (1)

Command executed:

  mkdir M-21_XXXXXXXXXXXX_AD011
  /home/subudhak/miniconda3/envs/metagenomics-tools/bin/kneaddata -i1 M-21_XXXXXXXXXXXX_AD011_L001_R1.fastq.gz -i2 M-21_XXXXXXXXXXXX_AD011_L001_R2.fastq.gz 	--bypass-trim 	-db /home/subudhak/Documents/DRT_Data/resources/human_bwt  	--run-trf 	--threads 5 	--sequencer-source none 	--output-prefix M-21_XXXXXXXXXXXX_AD011 	-o M-21_XXXXXXXXXXXX_AD011

Command exit status:
  1

Command output:
  Decompressing gzipped file ...
  
  Decompressing gzipped file ...
  
  Reformatting file sequence identifiers ...
  
  Reformatting file sequence identifiers ...
  
  Initial number of reads ( M-21_XXXXXXXXXXXX_AD011/reformatted_identifiers5h01wxl8_decompressed_wkysaz79_M-21_XXXXXXXXXXXX_AD011_L001_R1 ): 22252867.0
  Initial number of reads ( M-21_XXXXXXXXXXXX_AD011/reformatted_identifiersc3g8pypf_decompressed_67a9qw7h_M-21_XXXXXXXXXXXX_AD011_L001_R2 ): 22252867.0
  Bypass trimming
  Total reads after trimming ( M-21_XXXXXXXXXXXX_AD011/reformatted_identifiers5h01wxl8_decompressed_wkysaz79_M-21_XXXXXXXXXXXX_AD011_L001_R1 ): 22252867.0
  Total reads after trimming ( M-21_XXXXXXXXXXXX_AD011/reformatted_identifiersc3g8pypf_decompressed_67a9qw7h_M-21_XXXXXXXXXXXX_AD011_L001_R2 ): 22252867.0

Command error:
  ERROR: Unable to write file: M-21_XXXXXXXXXXXX_AD011/reformatted_identifiers5h01wxl8_decompressed_wkysaz79_M-21_XXXXXXXXXXXX_AD011_L001_R1

Work dir:
  /home/subudhak/Documents/DRT_Data/work/e1/de3869f6cbd6bcfd6fed31adf81753

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
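Since the failure is "Unable to write file", one basic check worth doing before anything else is whether the per-sample output directory is actually writable and the filesystem has free space. A hedged debugging sketch (the directory name is just the sample prefix used above):

```shell
# Hedged sketch: verify the directory kneaddata writes into is writable
# and the filesystem is not full.
outdir="M-21_XXXXXXXXXXXX_AD011"   # assumed per-sample output directory
mkdir -p "$outdir"
if touch "$outdir/.write_test" 2>/dev/null; then
    rm -f "$outdir/.write_test"
    echo "writable"
else
    echo "NOT writable"
fi
df -h "$outdir"   # check free space on the filesystem
```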

Besides, when I run KneadData in a while loop, it runs but doesn't remove contaminant reads, i.e. the number of reads before and after trimming and filtering remains the same. We know for sure that these samples contain human reads. Also, while it writes output to the defined directory, it prints the same error that was observed when running via Nextflow.

while read p; do
    n=$(basename "$p" | awk -F '_R1' '{print $1}')
    /home/subudhak/miniconda3/envs/metagenomics-tools/bin/kneaddata -i1 "$p" -i2 data/${n}_R2.fastq.gz --bypass-trim -db resources/human_bwt/ --run-trf --threads 5 --sequencer-source none --output-prefix "$n" -o results/kneaddata_output/${n}
done < list
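As a side note, the basename/awk pipeline for deriving the sample name can be done with plain parameter expansion, which avoids the extra processes (a sketch; the example path is assumed to match the list file):

```shell
# Sketch: derive the sample name without awk.
p="data/M-21_XXXXXXXXXXXXXX_AD003_L001_R1.fastq.gz"
n=$(basename "$p")
n="${n%%_R1*}"   # drop _R1 and everything after it
echo "$n"        # prints M-21_XXXXXXXXXXXXXX_AD003_L001
```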

Decompressing gzipped file ...

Decompressing gzipped file ...

Reformatting file sequence identifiers ...

Reformatting file sequence identifiers ...

Initial number of reads ( /home/subudhak/Documents/Dr_Turki_Data/results/kneaddata_output/M-21_XXXXXXXXXXXXXX_AD003_L001/reformatted_identifiersl0w1n60e_decompressed_fors9lqh_M-21_XXXXXXXXXXXXXX_AD003_L001_R1 ): 22304239.0
Initial number of reads ( /home/subudhak/Documents/Dr_Turki_Data/results/kneaddata_output/M-21_XXXXXXXXXXXXXX_AD003_L001/reformatted_identifierspsxaq55d_decompressed_dq30qktz_M-21_XXXXXXXXXXXXXX_AD003_L001_R2 ): 22304239.0
Bypass trimming
Total reads after trimming ( /home/subudhak/Documents/Dr_Turki_Data/results/kneaddata_output/M-21_XXXXXXXXXXXXXX_AD003_L001/reformatted_identifiersl0w1n60e_decompressed_fors9lqh_M-21_XXXXXXXXXXXXXX_AD003_L001_R1 ): 22304239.0
Total reads after trimming ( /home/subudhak/Documents/Dr_Turki_Data/results/kneaddata_output/M-21_XXXXXXXXXXXXXX_AD003_L001/reformatted_identifierspsxaq55d_decompressed_dq30qktz_M-21_XXXXXXXXXXXXXX_AD003_L001_R2 ): 22304239.0
ERROR: Unable to write file: /home/subudhak/Documents/Dr_Turki_Data/results/kneaddata_output/M-21_XXXXXXXXXXXXXX_AD003_L001/reformatted_identifiersl0w1n60e_decompressed_fors9lqh_M-21_XXXXXXXXXXXXXX_AD003_L001_R1

--run-fastqc-end doesn't run FastQC at the end either.
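Both --run-trf and --run-fastqc-end depend on external binaries being available inside the conda environment; if they are not on the PATH, those steps cannot run. A quick hedged check (tool names inferred from the flags and the bowtie-style database; this is a debugging sketch, not a confirmed cause):

```shell
# Hedged sanity check: confirm the external tools kneaddata calls
# (trf for --run-trf, fastqc for --run-fastqc-end, bowtie2 for the
# assumed host database) resolve on PATH in the active environment.
for tool in trf fastqc bowtie2; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: $(command -v "$tool")"
    else
        echo "$tool: NOT FOUND"
    fi
done
```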

Note: I installed Kneaddata using Conda.

Hi @rohit_satyam,

Apologies for the delay. KneadData is erroring out for you. Could you make the following changes and see if it works?

  • Remove the --sequencer-source none flag. It is only relevant to the Trimmomatic step, and since we are passing --bypass-trim to skip Trimmomatic, it has no effect here.
  • Run the following kneaddata command on its own, without Nextflow, to see if it works:
/home/subudhak/miniconda3/envs/metagenomics-tools/bin/kneaddata \
-i1 M-21_XXXXXXXXXXXX_AD011_L001_R1.fastq.gz \
-i2 M-21_XXXXXXXXXXXX_AD011_L001_R2.fastq.gz \
--bypass-trim \
-db /home/subudhak/Documents/DRT_Data/resources/human_bwt/ \
--run-trf \
--threads 5 \
--output-prefix M-21_XXXXXXXXXXXX_AD011 \
-o M-21_XXXXXXXXXXXX_AD011 

When using the while loop, I think the same kneaddata error is happening. I would highly recommend running kneaddata on its own first to debug the issue.

Regards,
Sagun

Hi

I am still getting the same error when running without the while loop. I am running this on my local machine, so there should be no read or write permission issues. I ran into the same error on an HPC as well.

ERROR: Unable to write file: /home/subudhak/Documents/Dr_Turki_Data/M-21_XXXXXXXXXXXX_AD011/reformatted_identifiersc6eqrnaa_decompressed_ubwpyu6t_M-21XXXXXXXXXXXX_AD010_L001_R1

Hi @sagunmaharjann, can you help me with how to proceed next?

Since I haven't heard from you @sagunmaharjann, I will use the CDC SanitizeMe tool for now to remove host genomic DNA. However, I would love to hear whether this is a bug in KneadData and whether it is being resolved, so that I can use it in the future.

I have the same error. Do you have a solution now?