When I was running KneadData, I encountered the following problem.
First, I downloaded the example demo data and ran the program on the paired files demoR_1.fastq and demoR_2.fastq. Following the user manual, I downloaded human_transcriptome, human_genome, and mouse_C57BL as reference databases using 'kneaddata_database --download'. With the following command, the program runs normally:
kneaddata --input1 examples/demoR_1.fastq --input2 examples/demoR_2.fastq \
--reference-db kneaddata_database/human_genome/ \
--reference-db kneaddata_database/human_transcriptome/ \
--reference-db kneaddata_database/mouse_C57BL/ \
--serial \
--output demo_output \
--cat-final-output \
--run-fastqc-start --run-fastqc-end \
--trimmomatic /software/miniconda3/envs/biobakery/share/trimmomatic-0.33-0/ --trimmomatic-options "ILLUMINACLIP:/software/miniconda3/envs/biobakery/share/trimmomatic-0.33-0/adapters/TruSeq3-PE.fa:2:30:10 SLIDINGWINDOW:4:15 MINLEN:36"
However, as soon as I add a thread (-t) or process (-p) setting to the command, the run fails.
First, if I set -t (threads), trf reports errors. The command is:
kneaddata --input1 examples/demoR_1.fastq --input2 examples/demoR_2.fastq \
--reference-db kneaddata_database/human_genome/ \
--reference-db kneaddata_database/human_transcriptome/ \
--reference-db kneaddata_database/mouse_C57BL/ \
-t 6 --serial \
--output demo_output \
--cat-final-output \
--run-fastqc-start --run-fastqc-end \
--trimmomatic /software/miniconda3/envs/biobakery/share/trimmomatic-0.33-0/ --trimmomatic-options "ILLUMINACLIP:/software/miniconda3/envs/biobakery/share/trimmomatic-0.33-0/adapters/TruSeq3-PE.fa:2:30:10 SLIDINGWINDOW:4:15 MINLEN:36"
Part of the error output:
Error: Error while loading sequenceError executing: /software/miniconda3/envs/biobakery/bin/trf demo_output/demoR_1_kneaddata.trimmed.single.1.fasta.trf.parameters.2.7.7.80.10.50.500.dat_1_temp_trf_outputs73cjcd6 2 7 7 80 10 50 500 -h -ngs
Error: Error while loading sequence
Error: Error while loading sequenceError executing: /software/miniconda3/envs/biobakery/bin/trf demo_output/demoR_1_kneaddata.trimmed.single.1.fasta.trf.parameters.2.7.7.80.10.50.500.dat_2_temp_trf_outputh0yd6gpx 2 7 7 80 10 50 500 -h -ngs
Error executing: kneaddata_trf_parallel --input demo_output/demoR_1_kneaddata.trimmed.single.1.fasta --output demo_output/demoR_1_kneaddata.trimmed.single.1.fasta.trf.parameters.2.7.7.80.10.50.500.dat --trf-path /software/miniconda3/envs/biobakery/bin/trf --trf-options '2 7 7 80 10 50 500 -h -ngs' --nproc 6
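Because every failing trf call reports "Error while loading sequence", one check that might narrow this down is whether the trimmed single-end FASTA given to kneaddata_trf_parallel exists and how many records it holds. The log suggests the file is split into per-process temporary files (--nproc 6), so it is worth seeing how small the demo input is; whether a very small file leaves some of those chunks empty is only an untested guess on my part. A minimal bash sketch; count_fasta_records and check_trf_input are hypothetical helper names, not part of KneadData:

```bash
#!/usr/bin/env bash
# Sketch: inspect the FASTA that kneaddata_trf_parallel receives.
# count_fasta_records and check_trf_input are hypothetical helpers, not KneadData commands.

count_fasta_records() {
  # Count header lines (">"); grep -c prints 0 and exits 1 when nothing matches,
  # so "|| true" keeps the function's exit status at 0 in that case.
  grep -c '^>' "$1" || true
}

check_trf_input() {
  local fasta="$1"
  # -s is true only if the file exists and is non-empty.
  if [ ! -s "$fasta" ]; then
    echo "missing or empty: $fasta"
    return 1
  fi
  echo "records: $(count_fasta_records "$fasta")"
}
```

If demo_output/demoR_1_kneaddata.trimmed.single.1.fasta is still present after the run, calling check_trf_input on it and then running trf on the same file with the options from the log (trf demo_output/demoR_1_kneaddata.trimmed.single.1.fasta 2 7 7 80 10 50 500 -h -ngs > trf_direct.dat; with -ngs, as far as I understand, trf writes its results to standard output) should show whether trf itself can read the file outside the parallel wrapper.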
Similarly, if I set -p (processes), bowtie2 returns errors. The command is:
kneaddata --input1 examples/demoR_1.fastq --input2 examples/demoR_2.fastq \
--reference-db kneaddata_database/human_genome/ \
--reference-db kneaddata_database/human_transcriptome/ \
--reference-db kneaddata_database/mouse_C57BL/ \
-p 5 --serial \
--output demo_output \
--cat-final-output \
--run-fastqc-start --run-fastqc-end \
--trimmomatic /software/miniconda3/envs/biobakery/share/trimmomatic-0.33-0/ --trimmomatic-options "ILLUMINACLIP:/software/miniconda3/envs/biobakery/share/trimmomatic-0.33-0/adapters/TruSeq3-PE.fa:2:30:10 SLIDINGWINDOW:4:15 MINLEN:36"
Part of the error output:
kneaddata.utilities - CRITICAL: Error executing: kneaddata_bowtie2_discordant_pairs --bowtie2 /software/miniconda3/envs/biobakery/bin/bowtie2 --threads 1 -x kneaddata_database/mouse_C57BL/mouse_C57BL_6NJ --mode strict --bowtie2-options "--very-sensitive-local --phred33" -1/demo_output/demoR_1_kneaddata_human_hg38_refMrna_bowtie2_paired_clean_1.fastq -2/demo_output/demoR_1_kneaddata_human_hg38_refMrna_bowtie2_paired_clean_2.fastq --un-pair/demo_output/demoR_1_kneaddata_mouse_C57BL_6NJ_bowtie2_paired_clean_%.fastq --al-pair/demo_output/demoR_1_kneaddata_mouse_C57BL_6NJ_bowtie2_paired_contam_%.fastq -U/demo_output/demoR_1_kneaddata_human_hg38_refMrna_bowtie2_unmatched_1_clean.fastq,/demo_output/demoR_1_kneaddata_human_hg38_refMrna_bowtie2_unmatched_2_clean.fastq --un-single /demo_output/demoR_1_kneaddata_mouse_C57BL_6NJ_bowtie2_unmatched_%_clean.fastq --al-single/demo_output/demoR_1_kneaddata_mouse_C57BL_6NJ_bowtie2_unmatched_%_contam.fastq -S /dev/null
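Two things in this log line seem worth checking. The call passes --threads 1 even though I set -p 5 (presumably because -t stayed at its default). Also, the -1/-2 inputs are the paired "clean" files from the previous database step, and bowtie2 refuses paired input whose two files contain different numbers of reads. A minimal bash sketch to compare the read counts, assuming plain 4-line FASTQ records; fastq_read_count and check_pair are hypothetical helper names:

```bash
#!/usr/bin/env bash
# Sketch: confirm that two FASTQ files given to bowtie2 as -1/-2 have the
# same number of reads. fastq_read_count and check_pair are hypothetical helpers.

fastq_read_count() {
  # Assumes every record is exactly 4 lines (header, sequence, "+", qualities).
  local lines
  lines=$(wc -l < "$1")
  echo $(( lines / 4 ))
}

check_pair() {
  local r1="$1" r2="$2" n1 n2
  n1=$(fastq_read_count "$r1")
  n2=$(fastq_read_count "$r2")
  if [ "$n1" -eq "$n2" ]; then
    echo "ok: $n1 pairs"
  else
    echo "mismatch: $n1 vs $n2 reads"
    return 1
  fi
}
```

For example, check_pair demo_output/demoR_1_kneaddata_human_hg38_refMrna_bowtie2_paired_clean_1.fastq demo_output/demoR_1_kneaddata_human_hg38_refMrna_bowtie2_paired_clean_2.fastq, provided those intermediate files are still in demo_output after the failed run.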
To summarize the demo-data results: with -t set, trf reports errors; with -p set, bowtie2 reports errors; with both -t and -p set, the errors come from trf (demo data) or from bowtie2 (sequencing data from real samples). The pipeline only works when neither -t nor -p is set, which is not practical for real applications: running one pair of paired-end sequencing files from a real sample without -t and -p took almost 9 hours. I would really like to solve this, but I cannot tell what the problem is.
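To make the four cases (no -t/-p, -t only, -p only, both) easy to repeat, they can be run from one function in which only -t and -p change. A minimal bash sketch; run_kneaddata and DRY_RUN are hypothetical names, and all paths and options are copied from the working command above. With DRY_RUN=1 it prints the command instead of running it:

```bash
#!/usr/bin/env bash
# Sketch: rerun the command from this post with optional -t / -p values.
# run_kneaddata and DRY_RUN are hypothetical names used only for illustration.

TRIM_DIR=/software/miniconda3/envs/biobakery/share/trimmomatic-0.33-0

run_kneaddata() {
  local threads="$1" processes="$2"
  local args=(
    --input1 examples/demoR_1.fastq --input2 examples/demoR_2.fastq
    --reference-db kneaddata_database/human_genome/
    --reference-db kneaddata_database/human_transcriptome/
    --reference-db kneaddata_database/mouse_C57BL/
    --serial
    --output demo_output
    --cat-final-output
    --run-fastqc-start --run-fastqc-end
    --trimmomatic "$TRIM_DIR/"
    --trimmomatic-options "ILLUMINACLIP:$TRIM_DIR/adapters/TruSeq3-PE.fa:2:30:10 SLIDINGWINDOW:4:15 MINLEN:36"
  )
  # Only add -t / -p when a value is given, so "" "" reproduces the working run.
  if [ -n "$threads" ]; then args+=(-t "$threads"); fi
  if [ -n "$processes" ]; then args+=(-p "$processes"); fi

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command (without shell quoting) instead of running it.
    echo kneaddata "${args[@]}"
  else
    kneaddata "${args[@]}"
  fi
}
```

run_kneaddata "" "" corresponds to the working run, run_kneaddata 6 "" to the trf failure, run_kneaddata "" 5 to the bowtie2 failure, and run_kneaddata 6 5 to the combined case.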
I have attached the three log files from the demo-data runs, along with the list of packages in my environment. I would really appreciate your help. Thank you very much for your consideration, and I look forward to your reply.
non_set-t-p.log.txt (27.3 KB)
set-p.log.txt (22.6 KB)
set-t.log.txt (10.0 KB)
packages in environment.txt (9.7 KB)