All paired-end reads end up unmatched

Kneaddata (v0.10.0) result

I checked the log file, and there seems to be no error:

12/10/2021 04:46:24 PM - kneaddata.knead_data - INFO: Running kneaddata v0.10.0
12/10/2021 04:46:24 PM - kneaddata.knead_data - INFO: Output files will be written to: /public4/home/sc56690/data/temp/qc
12/10/2021 04:46:24 PM - kneaddata.knead_data - DEBUG: Running with the following arguments: 
verbose = True
bypass_trf = True
bmtagger_path = None
minscore = 50
bowtie2_path = /public4/home/sc56690/.conda/envs/humann2/bin/bowtie2
maxperiod = 500
discordant = True
serial = False
fastqc_start = False
store_temp_output = False
cat_final_output = False
log_level = DEBUG
log = /public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata.log
sequencer_source = NexteraPE
max_memory = 500m
remove_intermediate_output = True
fastqc_path = None
output_dir = /public4/home/sc56690/data/temp/qc
trf_path = None
remove_temp_output = True
reference_db = /public4/home/sc56690/db/kneaddata/human_genome/Homo_sapiens
input = /public4/home/sc56690/data/seq/1-1106_FDSW202359622-1r_1.fq /public4/home/sc56690/data/seq/1-1106_FDSW202359622-1r_2.fq
decontaminate_pairs = strict
reorder = False
pm = 80
trimmomatic_path = /public4/home/sc56690/.conda/envs/humann2/share/trimmomatic/trimmomatic.jar
run_trf = False
mismatch = 7
threads = 60
delta = 7
bowtie2_options = --very-sensitive --dovetail --phred33
bypass_trim = False
processes = 1
pi = 10
trimmomatic_quality_scores = -phred33
fastqc_end = False
trimmomatic_options = ILLUMINACLIP:/public4/home/sc56690/.conda/envs/humann2/share/trimmomatic/adapters/TruSeq2-PE.fa:2:40:15 SLIDINGWINDOW:4:20 MINLEN:50
output_prefix = 1-1106_FDSW202359622-1r_1_kneaddata
match = 2
bmtagger = False
run_trim_repetitive = False

12/10/2021 04:46:24 PM - kneaddata.utilities - INFO: Reformatting file sequence identifiers ...
12/10/2021 04:47:48 PM - kneaddata.utilities - INFO: Reformatting file sequence identifiers ...
12/10/2021 04:49:14 PM - kneaddata.utilities - INFO: Reordering read identifiers ...
12/10/2021 05:09:34 PM - kneaddata.utilities - INFO: READ COUNT: raw pair1 : Initial number of reads ( /public4/home/sc56690/data/temp/qc/reordered_QejgrJ_reformatted_identifiersxZBJaz_1-1106_FDSW202359622-1r_1 ): 36029417
12/10/2021 05:09:46 PM - kneaddata.utilities - INFO: READ COUNT: raw pair2 : Initial number of reads ( /public4/home/sc56690/data/temp/qc/reordered_EY7eUu_reformatted_identifiersL3KgSn_1-1106_FDSW202359622-1r_2 ): 36029417
12/10/2021 05:09:46 PM - kneaddata.utilities - DEBUG: Checking input file to Trimmomatic : /public4/home/sc56690/data/temp/qc/reordered_QejgrJ_reformatted_identifiersxZBJaz_1-1106_FDSW202359622-1r_1
12/10/2021 05:09:46 PM - kneaddata.utilities - DEBUG: Checking input file to Trimmomatic : /public4/home/sc56690/data/temp/qc/reordered_EY7eUu_reformatted_identifiersL3KgSn_1-1106_FDSW202359622-1r_2
12/10/2021 05:09:46 PM - kneaddata.utilities - INFO: Running Trimmomatic ... 
12/10/2021 05:09:46 PM - kneaddata.utilities - INFO: Execute command: java -Xmx500m -jar /public4/home/sc56690/.conda/envs/humann2/share/trimmomatic/trimmomatic.jar PE -threads 60 -phred33 /public4/home/sc56690/data/temp/qc/reordered_QejgrJ_reformatted_identifiersxZBJaz_1-1106_FDSW202359622-1r_1 /public4/home/sc56690/data/temp/qc/reordered_EY7eUu_reformatted_identifiersL3KgSn_1-1106_FDSW202359622-1r_2 /public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata.trimmed.1.fastq /public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata.trimmed.single.1.fastq /public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata.trimmed.2.fastq /public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata.trimmed.single.2.fastq ILLUMINACLIP:/public4/home/sc56690/.conda/envs/humann2/share/trimmomatic/adapters/TruSeq2-PE.fa:2:40:15 SLIDINGWINDOW:4:20 MINLEN:50
12/10/2021 05:12:07 PM - kneaddata.utilities - DEBUG: TrimmomaticPE: Started with arguments:
-threads 60 -phred33 /public4/home/sc56690/data/temp/qc/reordered_QejgrJ_reformatted_identifiersxZBJaz_1-1106_FDSW202359622-1r_1 /public4/home/sc56690/data/temp/qc/reordered_EY7eUu_reformatted_identifiersL3KgSn_1-1106_FDSW202359622-1r_2 /public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata.trimmed.1.fastq /public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata.trimmed.single.1.fastq /public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata.trimmed.2.fastq /public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata.trimmed.single.2.fastq ILLUMINACLIP:/public4/home/sc56690/.conda/envs/humann2/share/trimmomatic/adapters/TruSeq2-PE.fa:2:40:15 SLIDINGWINDOW:4:20 MINLEN:50
Using PrefixPair: 'AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT'
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT'
Using Long Clipping Sequence: 'AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG'
Using Long Clipping Sequence: 'TTTTTTTTTTAATGATACGGCGACCACCGAGATCTACAC'
Using Long Clipping Sequence: 'TTTTTTTTTTCAAGCAGAAGACGGCATACGA'
Using Long Clipping Sequence: 'CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT'
Using Long Clipping Sequence: 'AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT'
ILLUMINACLIP: Using 1 prefix pairs, 6 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Read Pairs: 36029417 Both Surviving: 34592816 (96.01%) Forward Only Surviving: 527242 (1.46%) Reverse Only Surviving: 603448 (1.67%) Dropped: 305911 (0.85%)
TrimmomaticPE: Completed successfully

12/10/2021 05:12:07 PM - kneaddata.utilities - DEBUG: Checking output file from Trimmomatic : /public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata.trimmed.1.fastq
12/10/2021 05:12:07 PM - kneaddata.utilities - DEBUG: Checking output file from Trimmomatic : /public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata.trimmed.single.1.fastq
12/10/2021 05:12:07 PM - kneaddata.utilities - DEBUG: Checking output file from Trimmomatic : /public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata.trimmed.2.fastq
12/10/2021 05:12:07 PM - kneaddata.utilities - DEBUG: Checking output file from Trimmomatic : /public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata.trimmed.single.2.fastq
12/10/2021 05:12:20 PM - kneaddata.utilities - INFO: READ COUNT: trimmed pair1 : Total reads after trimming ( /public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata.trimmed.1.fastq ): 34592816
12/10/2021 05:12:34 PM - kneaddata.utilities - INFO: READ COUNT: trimmed pair2 : Total reads after trimming ( /public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata.trimmed.2.fastq ): 34592816
12/10/2021 05:12:34 PM - kneaddata.utilities - INFO: READ COUNT: trimmed orphan1 : Total reads after trimming ( /public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata.trimmed.single.1.fastq ): 527242
12/10/2021 05:12:34 PM - kneaddata.utilities - INFO: READ COUNT: trimmed orphan2 : Total reads after trimming ( /public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata.trimmed.single.2.fastq ): 603448
12/10/2021 05:12:34 PM - kneaddata.run - INFO: Decontaminating ...
12/10/2021 05:12:35 PM - kneaddata.utilities - DEBUG: Checking input file to bowtie2 : /public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata.trimmed.1.fastq
12/10/2021 05:12:35 PM - kneaddata.utilities - DEBUG: Checking input file to bowtie2 : /public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata.trimmed.2.fastq
12/10/2021 05:12:35 PM - kneaddata.utilities - DEBUG: Checking input file to bowtie2 : /public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata.trimmed.single.1.fastq
12/10/2021 05:12:35 PM - kneaddata.utilities - DEBUG: Checking input file to bowtie2 : /public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata.trimmed.single.2.fastq
12/10/2021 05:12:35 PM - kneaddata.utilities - INFO: Running bowtie2 ... 
12/10/2021 05:12:35 PM - kneaddata.utilities - INFO: Execute command: kneaddata_bowtie2_discordant_pairs --bowtie2 /public4/home/sc56690/.conda/envs/humann2/bin/bowtie2 --threads 60 -x /public4/home/sc56690/db/kneaddata/human_genome/Homo_sapiens --mode strict --bowtie2-options "--very-sensitive --dovetail --phred33" -1 /public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata.trimmed.1.fastq -2 /public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata.trimmed.2.fastq --un-pair /public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata_Homo_sapiens_bowtie2_paired_clean_%.fastq --al-pair /public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata_Homo_sapiens_bowtie2_paired_contam_%.fastq -U /public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata.trimmed.single.1.fastq,/public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata.trimmed.single.2.fastq --un-single /public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata_Homo_sapiens_bowtie2_unmatched_%_clean.fastq --al-single /public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata_Homo_sapiens_bowtie2_unmatched_%_contam.fastq -S /dev/null
12/10/2021 05:29:20 PM - kneaddata.utilities - DEBUG: 70316322 reads; of these:
  70316322 (100.00%) were unpaired; of these:
    70313909 (100.00%) aligned 0 times
    579 (0.00%) aligned exactly 1 time
    1834 (0.00%) aligned >1 times
0.00% overall alignment rate
pair1_aligned : 0
pair2_aligned : 0
orphan1_unaligned : 35119566
orphan2_unaligned : 35194343
orphan2_aligned : 1921
pair2_unaligned : 0
pair1_unaligned : 0
orphan1_aligned : 492

12/10/2021 05:29:20 PM - kneaddata.utilities - DEBUG: Checking output file from bowtie2 : /public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata_Homo_sapiens_bowtie2_paired_clean_1.fastq
12/10/2021 05:29:20 PM - kneaddata.utilities - DEBUG: Checking output file from bowtie2 : /public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata_Homo_sapiens_bowtie2_paired_clean_2.fastq
12/10/2021 05:29:21 PM - kneaddata.run - INFO: Total contaminate sequences in file ( /public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata_Homo_sapiens_bowtie2_paired_contam_1.fastq ) : 0
12/10/2021 05:29:21 PM - kneaddata.run - INFO: Total contaminate sequences in file ( /public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata_Homo_sapiens_bowtie2_paired_contam_2.fastq ) : 0
12/10/2021 05:29:21 PM - kneaddata.run - INFO: Total contaminate sequences in file ( /public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata_Homo_sapiens_bowtie2_unmatched_1_contam.fastq ) : 492
12/10/2021 05:29:21 PM - kneaddata.run - INFO: Total contaminate sequences in file ( /public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata_Homo_sapiens_bowtie2_unmatched_2_contam.fastq ) : 1921
12/10/2021 05:29:21 PM - kneaddata.utilities - INFO: READ COUNT: decontaminated Homo_sapiens pair1 : Total reads after removing those found in reference database ( /public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata_Homo_sapiens_bowtie2_paired_clean_1.fastq ): 0
12/10/2021 05:29:21 PM - kneaddata.utilities - INFO: READ COUNT: decontaminated Homo_sapiens pair2 : Total reads after removing those found in reference database ( /public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata_Homo_sapiens_bowtie2_paired_clean_2.fastq ): 0
12/10/2021 05:29:21 PM - kneaddata.utilities - INFO: READ COUNT: final pair1 : Total reads after merging results from multiple databases ( /public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata_paired_1.fastq ): 0
12/10/2021 05:29:21 PM - kneaddata.utilities - INFO: READ COUNT: final pair2 : Total reads after merging results from multiple databases ( /public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata_paired_2.fastq ): 0
12/10/2021 05:29:21 PM - kneaddata.utilities - WARNING: Unable to remove file: /public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata_Homo_sapiens_bowtie2_paired_clean_1.fastq
12/10/2021 05:29:21 PM - kneaddata.utilities - WARNING: Unable to remove file: /public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata_Homo_sapiens_bowtie2_paired_clean_2.fastq
12/10/2021 05:29:33 PM - kneaddata.utilities - INFO: READ COUNT: decontaminated Homo_sapiens orphan1 : Total reads after removing those found in reference database ( /public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata_Homo_sapiens_bowtie2_unmatched_1_clean.fastq ): 35119566
12/10/2021 05:29:46 PM - kneaddata.utilities - INFO: READ COUNT: final orphan1 : Total reads after merging results from multiple databases ( /public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata_unmatched_1.fastq ): 35119566
12/10/2021 05:29:46 PM - kneaddata.utilities - WARNING: Unable to remove file: /public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata_Homo_sapiens_bowtie2_unmatched_1_clean.fastq
12/10/2021 05:30:02 PM - kneaddata.utilities - INFO: READ COUNT: decontaminated Homo_sapiens orphan2 : Total reads after removing those found in reference database ( /public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata_Homo_sapiens_bowtie2_unmatched_2_clean.fastq ): 35194343
12/10/2021 05:30:14 PM - kneaddata.utilities - INFO: READ COUNT: final orphan2 : Total reads after merging results from multiple databases ( /public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata_unmatched_2.fastq ): 35194343
12/10/2021 05:30:14 PM - kneaddata.utilities - WARNING: Unable to remove file: /public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata_Homo_sapiens_bowtie2_unmatched_2_clean.fastq
12/10/2021 05:30:24 PM - kneaddata.knead_data - INFO: 
Final output files created: 
/public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata_paired_1.fastq
/public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata_paired_2.fastq
/public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata_unmatched_1.fastq
/public4/home/sc56690/data/temp/qc/1-1106_FDSW202359622-1r_1_kneaddata_unmatched_2.fastq
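The read counts in this log add up in a telling way. bowtie2 was handed every read Trimmomatic kept (both mates plus the orphans) as single reads, and afterwards every R1 read, paired or orphan, was reported under orphan1 and every R2 read under orphan2, so no read was put back into a pair. The short check below only redoes that arithmetic with numbers copied from the log; it does not inspect kneaddata itself, and it cannot show why the re-pairing failed.

```python
# Read counts copied from the kneaddata log above.
trimmed_pair1 = 34592816   # READ COUNT: trimmed pair1
trimmed_pair2 = 34592816   # READ COUNT: trimmed pair2
trimmed_orphan1 = 527242   # READ COUNT: trimmed orphan1
trimmed_orphan2 = 603448   # READ COUNT: trimmed orphan2

# bowtie2 summary: "70316322 reads; of these: 70316322 (100.00%) were unpaired"
bowtie2_unpaired = 70316322

# Per-file results reported after the bowtie2 step.
orphan1_unaligned, orphan1_aligned = 35119566, 492
orphan2_unaligned, orphan2_aligned = 35194343, 1921

# All reads Trimmomatic kept, mates included, were aligned as unpaired reads.
total_in = trimmed_pair1 + trimmed_pair2 + trimmed_orphan1 + trimmed_orphan2
print(total_in, total_in == bowtie2_unpaired)  # 70316322 True

# Every R1 read (paired or orphan) ended up in the orphan1 buckets; same for R2.
r1_in = trimmed_pair1 + trimmed_orphan1
r2_in = trimmed_pair2 + trimmed_orphan2
print(r1_in == orphan1_unaligned + orphan1_aligned)  # True
print(r2_in == orphan2_unaligned + orphan2_aligned)  # True
```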

This happens. I have no idea what the design choice behind it was.
They actually take all the reads and put them into one file. You'll also see singletons/orphans in your data as an artifact of trimming and filtering.

The same thing happened with my samples. I don't know whether it's a bug or the expected output.

This happens because kneaddata does not recognize the sequence labels. When I switched to other sequences with a different label format, the problem was solved.

What label are you referring to? Could you give an example to illustrate it? Thank you.


Could anyone who has solved this problem please elaborate? I am having it too (paired reads are not being recognized as pairs), and I have seen many threads with no solution beyond "it's a sequence label problem". What do I need to change my sequence labels to? What is wrong with them that needs to be fixed?

Just in case someone else comes across this in the future: it seems to be a bug in how the "strict" mode of "--decontaminate-pairs" (which is also the default) is implemented. Adding "--decontaminate-pairs=lenient" largely fixes the problem (though your results will then reflect lenient rather than strict pair decontamination).

I think this issue occurs because Illumina's read-pair indicator is separated from the read ID by a space in the sequence identifier line.
For example:
@A00930:290:HC5FGDSX3:3:1101:21685:1000 1:N:0:TGCGGCGT+TACCGAGG
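In a header like this one, the space splits the line into the read ID, which both mates share, and a comment whose first field is the pair indicator (1 or 2). Older Illumina pipelines instead appended /1 or /2 directly to the read ID. The sketch below only illustrates that layout; the R2 header is constructed by hand for the example, and whether kneaddata reads the indicator from the comment or expects a /1 suffix is the poster's hypothesis, not something this code verifies.

```python
def split_header(header):
    """Split a FASTQ header at the first space into (read ID, comment)."""
    read_id, _, comment = header.rstrip("\n").partition(" ")
    return read_id, comment


def mates_match(header1, header2):
    """True if two headers share a read ID and carry pair indicators 1 and 2."""
    id1, comment1 = split_header(header1)
    id2, comment2 = split_header(header2)
    return id1 == id2 and comment1.startswith("1:") and comment2.startswith("2:")


# R1 header from the post above; the R2 header is constructed for illustration.
r1 = "@A00930:290:HC5FGDSX3:3:1101:21685:1000 1:N:0:TGCGGCGT+TACCGAGG"
r2 = "@A00930:290:HC5FGDSX3:3:1101:21685:1000 2:N:0:TGCGGCGT+TACCGAGG"

print(split_header(r1))
print(mates_match(r1, r2))  # True
```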

Try these commands:
sed 's/ 1.*/\/1/g' < sample.R1.fastq > new.R1.fastq
sed 's/ 2.*/\/2/g' < sample.R2.fastq > new.R2.fastq
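For anyone who prefers a script over sed, a Python version of the same header rewrite might look like this. It is a sketch, assuming plain unwrapped FASTQ with four lines per record; unlike the sed one-liner, it touches only the first line of each record, and only when the text after the space starts with the mate number followed by a colon. The script name in the usage comment is hypothetical.

```python
import sys


def rewrite_header(line, mate):
    """Turn '<id> <mate>:N:0:<index>' into '<id>/<mate>'; leave other lines alone."""
    read_id, sep, comment = line.rstrip("\n").partition(" ")
    if sep and comment.startswith(mate + ":"):
        return read_id + "/" + mate + "\n"
    return line


def rewrite_fastq(lines, mate):
    """Yield FASTQ lines, rewriting only the header (first line of each 4-line record)."""
    for i, line in enumerate(lines):
        yield rewrite_header(line, mate) if i % 4 == 0 else line


# Only runs when a mate number is given on the command line, e.g. (hypothetical name):
#   python rewrite_ids.py 1 < sample.R1.fastq > new.R1.fastq
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.stdout.writelines(rewrite_fastq(sys.stdin, sys.argv[1]))
```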

I referred to this query.

Good luck :blush:


Isn’t the “1.” in your sed code supposed to be “1:”?

You’re right. This is a typo.
Thanks for the correction :wink:

no prob :slightly_smiling_face:

So if my sequence ID went from:

@A00419:585:HJTJLDSX3:3:1101:1542:1000 1:N:0:CACAGACT+NGGTACAG

to:

@A00419:585:HJTJLDSX3:3:1101:1542:1000/1N:0:CACAGACT+NGGTACAG

should it work? Or do I need to add that colon back in? I still get 0-byte output files from kneaddata, which tells me that something else is going on…
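The header edits discussed in this thread can be compared directly on the header from this post. Python's re.sub stands in for sed here; for these simple patterns the two behave the same. Because "." in a regular expression matches any character, " 1.*" and " 1:.*" give the same result on this header, while replacing only " 1:" leaves "N:0:CACAGACT+NGGTACAG" glued to the read ID. Whether kneaddata accepts any of these forms is not established in the thread; the sketch only shows what each substitution produces.

```python
import re

header = "@A00419:585:HJTJLDSX3:3:1101:1542:1000 1:N:0:CACAGACT+NGGTACAG"

# Pattern as first posted: space, "1", then anything to the end of the line.
as_posted = re.sub(r" 1.*", "/1", header)
# Pattern with the colon, as suggested in the correction.
with_colon = re.sub(r" 1:.*", "/1", header)
# Replacing only " 1:" keeps the rest of the comment attached to the ID.
colon_only = re.sub(r" 1:", "/1", header)

print(as_posted)   # @A00419:585:HJTJLDSX3:3:1101:1542:1000/1
print(with_colon)  # @A00419:585:HJTJLDSX3:3:1101:1542:1000/1
print(colon_only)  # @A00419:585:HJTJLDSX3:3:1101:1542:1000/1N:0:CACAGACT+NGGTACAG
```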

This also happens to me with kneaddata v0.12.0. When I switched to v0.10.0, the problem was solved.

Run zcat *.fq.gz|head to look at your sequence identifiers. Mine, for example, look like this:
@HWI-ST1276:71:C1162ACXX:1:1101:1208:2458 1:N:0:CGATGT
NAAGAACACGTTCGGTCACCTCAGCACACTTGTGAATGTCATGGGATCCAT
+
#55???BBBBB?BA@DEEFFCFFHHFFCFFHHHHHHHFAE0ECFFD/AEHH

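The software does not seem to recognize the " 1:N:0" part, which causes the error, so I used sed to delete that string from the FASTQ files:
sed -i 's/ 1:N:0//g' *_1.fq
sed -i 's/ 2:N:0//g' *_2.fq
After rerunning the program, the paired results were no longer all 0.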

By the way, my kneaddata version is 0.12, installed in an environment created with Anaconda3.

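I had tried all the other methods from the forum, such as deleting the space in the middle of the identifier, adding \1 or /1 at the end, and changing " 1:N:0" to "/1:N:0" or "\1:N:0", and none of them worked. Out of frustration I deleted " 1:N:0" altogether, and it happened to work, but I don't know whether this result is reasonable 0v0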
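To see what that deletion does, here it is applied to the header from this post, with str.replace standing in for the sed command (both replace every literal occurrence). After the edit, R1 and R2 carry the identical header "...2458:CGATGT" with no space and no pair indicator; identical IDs in both files is one plausible reason kneaddata then pairs the reads, but the thread does not confirm it. Note also that the pattern only matches reads whose filter flag is N, so a header containing " 1:Y:0" is left unchanged. The R2 and filtered headers below are constructed for illustration.

```python
def drop_pair_tag(header, mate):
    """Mimic sed 's/ <mate>:N:0//g' on a single header line (literal match)."""
    return header.replace(" " + str(mate) + ":N:0", "")


# R1 header from the post above; the R2 and filtered (Y) headers are constructed.
r1 = "@HWI-ST1276:71:C1162ACXX:1:1101:1208:2458 1:N:0:CGATGT"
r2 = "@HWI-ST1276:71:C1162ACXX:1:1101:1208:2458 2:N:0:CGATGT"
r1_filtered = "@HWI-ST1276:71:C1162ACXX:1:1101:1208:2458 1:Y:0:CGATGT"

print(drop_pair_tag(r1, 1))  # @HWI-ST1276:71:C1162ACXX:1:1101:1208:2458:CGATGT
print(drop_pair_tag(r1, 1) == drop_pair_tag(r2, 2))  # True
print(drop_pair_tag(r1_filtered, 1) == r1_filtered)  # True: the pattern only matches N
```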