When I run KneadData, the output files have long, unwieldy names with no file extension:
M-21_XXXXXXXXX_AD011.log
reformatted_identifiers5h01wxl8_decompressed_wkysaz79_M-21_XXXXXXXXX_AD011_L001_R1
reformatted_identifiersc3g8pypf_decompressed_67a9qw7h_M-21_XXXXXXXXX_AD011_L001_R2
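(As a rough workaround sketch for the naming only, assuming these really are plain, already-decompressed FASTQ files and that they end up under results/kneaddata_output/ as in my loop further down; the glob and paths are illustrative:)

# Hypothetical cleanup: give the extension-less read files a .fastq suffix.
# Adjust the glob to wherever the outputs actually land.
for f in results/kneaddata_output/*/reformatted_identifiers*_R[12]; do
    mv "$f" "${f}.fastq"
done

This obviously does not fix anything, it just makes the files usable downstream; the real question is why the intermediate-style names are kept at all.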
I am also unable to run KneadData through Nextflow. The process I am using is as follows:
process KneadData {
    publishDir "$params.outdir", mode: 'copy'
    maxForks params.jobs

    input:
    tuple val(sid), path(reads)

    output:
    path "*"

    script:
    // Derive the sample prefix from the R1 file name, e.g. ..._AD011_L001_R1.fastq.gz -> ..._AD011
    def prefix = reads[0].simpleName.replaceAll(/_L001_R1/, "")
    """
    mkdir ${prefix}
    /home/subudhak/miniconda3/envs/metagenomics-tools/bin/kneaddata -i1 ${reads[0]} -i2 ${reads[1]} \
        --bypass-trim \
        -db ${params.db} \
        --run-trf \
        --threads 5 \
        --sequencer-source none \
        --output-prefix ${prefix} \
        -o ${prefix}
    """
}
And the error is:
N E X T F L O W ~ version 22.04.5
Launching `kneddata.nf` [festering_hugle] DSL2 - revision: a7e525a8a0
executor > local (2)
[16/caa230] process > KneadData (9) [ 0%] 0 of 12
Error executing process > 'KneadData (1)'
Caused by:
Process `KneadData (1)` terminated with an error exit status (1)
Command executed:
mkdir M-21_XXXXXXXXXXXX_AD011
/home/subudhak/miniconda3/envs/metagenomics-tools/bin/kneaddata -i1 M-21_XXXXXXXXXXXX_AD011_L001_R1.fastq.gz -i2 M-21_XXXXXXXXXXXX_AD011_L001_R2.fastq.gz --bypass-trim -db /home/subudhak/Documents/DRT_Data/resources/human_bwt --run-trf --threads 5 --sequencer-source none --output-prefix M-21_XXXXXXXXXXXX_AD011 -o M-21_XXXXXXXXXXXX_AD011
Command exit status:
1
Command output:
Decompressing gzipped file ...
Decompressing gzipped file ...
Reformatting file sequence identifiers ...
Reformatting file sequence identifiers ...
Initial number of reads ( M-21_XXXXXXXXXXXX_AD011/reformatted_identifiers5h01wxl8_decompressed_wkysaz79_M-21_XXXXXXXXXXXX_AD011_L001_R1 ): 22252867.0
Initial number of reads ( M-21_XXXXXXXXXXXX_AD011/reformatted_identifiersc3g8pypf_decompressed_67a9qw7h_M-21_XXXXXXXXXXXX_AD011_L001_R2 ): 22252867.0
Bypass trimming
Total reads after trimming ( M-21_XXXXXXXXXXXX_AD011/reformatted_identifiers5h01wxl8_decompressed_wkysaz79_M-21_XXXXXXXXXXXX_AD011_L001_R1 ): 22252867.0
Total reads after trimming ( M-21_XXXXXXXXXXXX_AD011/reformatted_identifiersc3g8pypf_decompressed_67a9qw7h_M-21_XXXXXXXXXXXX_AD011_L001_R2 ): 22252867.0
Command error:
ERROR: Unable to write file: M-21_XXXXXXXXXXXX_AD011/reformatted_identifiers5h01wxl8_decompressed_wkysaz79_M-21_XXXXXXXXXXXX_AD011_L001_R1
Work dir:
/home/subudhak/Documents/DRT_Data/work/e1/de3869f6cbd6bcfd6fed31adf81753
Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
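(For completeness, this is the kind of check I can run against the failing task's work directory; .command.sh, .command.err and .command.run are the standard files Nextflow writes there, and the path is the one printed above:)

# Inspect and reproduce the failed task outside of Nextflow.
cd /home/subudhak/Documents/DRT_Data/work/e1/de3869f6cbd6bcfd6fed31adf81753
cat .command.sh     # the rendered script block
cat .command.err    # full stderr, including the 'Unable to write file' message
bash .command.run   # re-runs the task (including input staging) in isolation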
Besides, when I run KneadData in a while loop it completes, but it does not remove contaminant reads: the number of reads before and after trimming and filtering stays the same, even though we know for sure that these samples contain human reads. It does write output to the defined directory, but it also prints the same error seen in the Nextflow run. The loop is:
while read -r p
do
    # Strip the directory and everything from _R1 onwards to get the sample name
    n=$(basename "$p" | awk -F '_R1' '{print $1}')
    /home/subudhak/miniconda3/envs/metagenomics-tools/bin/kneaddata -i1 "$p" -i2 data/${n}_R2.fastq.gz --bypass-trim -db resources/human_bwt/ --run-trf --threads 5 --sequencer-source none --output-prefix ${n} -o results/kneaddata_output/${n}
done < list
Its per-sample console output is:
Decompressing gzipped file ...
Decompressing gzipped file ...
Reformatting file sequence identifiers ...
Reformatting file sequence identifiers ...
Initial number of reads ( /home/subudhak/Documents/Dr_Turki_Data/results/kneaddata_output/M-21_XXXXXXXXXXXXXX_AD003_L001/reformatted_identifiersl0w1n60e_decompressed_fors9lqh_M-21_XXXXXXXXXXXXXX_AD003_L001_R1 ): 22304239.0
Initial number of reads ( /home/subudhak/Documents/Dr_Turki_Data/results/kneaddata_output/M-21_XXXXXXXXXXXXXX_AD003_L001/reformatted_identifierspsxaq55d_decompressed_dq30qktz_M-21_XXXXXXXXXXXXXX_AD003_L001_R2 ): 22304239.0
Bypass trimming
Total reads after trimming ( /home/subudhak/Documents/Dr_Turki_Data/results/kneaddata_output/M-21_XXXXXXXXXXXXXX_AD003_L001/reformatted_identifiersl0w1n60e_decompressed_fors9lqh_M-21_XXXXXXXXXXXXXX_AD003_L001_R1 ): 22304239.0
Total reads after trimming ( /home/subudhak/Documents/Dr_Turki_Data/results/kneaddata_output/M-21_XXXXXXXXXXXXXX_AD003_L001/reformatted_identifierspsxaq55d_decompressed_dq30qktz_M-21_XXXXXXXXXXXXXX_AD003_L001_R2 ): 22304239.0
ERROR: Unable to write file: /home/subudhak/Documents/Dr_Turki_Data/results/kneaddata_output/M-21_XXXXXXXXXXXXXX_AD003_L001/reformatted_identifiersl0w1n60e_decompressed_fors9lqh_M-21_XXXXXXXXXXXXXX_AD003_L001_R1
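(To rule out a database or disk problem on my side — as far as I know KneadData uses Bowtie2 for contaminant removal by default, so the directory passed to -db should contain a Bowtie2 index — these are the checks I have in mind; paths as in my setup above:)

# The human reference directory should hold Bowtie2 index files (*.bt2 or *.bt2l);
# if it doesn't, nothing gets aligned and the read counts would stay unchanged.
ls -lh resources/human_bwt/
# The failure is an 'Unable to write file' error on an intermediate file,
# so also check free space and write permissions on the output location.
df -h results/kneaddata_output/
touch results/kneaddata_output/.write_test && rm results/kneaddata_output/.write_test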
--run-fastqc-end doesn't run FastQC at the end either.
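(My understanding is that KneadData calls an external fastqc executable for this, so the first thing I can check is whether FastQC is visible inside the same Conda environment:)

# Confirm FastQC is installed and on PATH in the environment KneadData runs from.
conda activate metagenomics-tools
which fastqc && fastqc --version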
Note: I installed KneadData using Conda.
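(For reference, the exact tool versions, in case they matter:)

# Versions of KneadData and the tools it drives, from the same Conda environment.
conda list -n metagenomics-tools | grep -iE 'kneaddata|bowtie2|trimmomatic|fastqc|trf'
kneaddata --version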