The bioBakery help forum

Kneaddata paired output ordering or interleaving

I have run kneaddata 0.7.4 on my metagenomics data, and I have *paired_1.fastq and *paired_2.fastq for each sample. When I look at these fastq files, the reads do not appear to be in matching order (e.g. forward read 1, forward read 2, etc. in file 1 and reverse read 1, reverse read 2, etc. in file 2). I have posted the head of each file for one sample below. If the files really are out of order, how can I pair these reads correctly? Can I interleave the two files, and what script or program would work? I am trying to take this quality-controlled and decontaminated data into a metagenomics assembly pipeline.

Thanks!

Head of file 1:
@J00107:235:HFWNCBBXY:1:1101:2859:1173:N:0:NAGGCATG+ATAGAGAG#0/1
GTGTTATTATGATTGATGAGGCTCATGAAAGAACTCTTTACACAGACATTATTTTGGGTCTTTTAAAAAAGGTTGGTCTTTTTCACTGGGAAGTGTCATTTTACGATTTTTTTTCTGTGGCATATATCTGTCTCTTATACACATCTCCGAG
+
AAFFFJJJJJJJJJJAJJJJJJJJJJJJJJJJJJJJJJJJJJJJF-FJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJAF
@J00107:235:HFWNCBBXY:1:1101:4868:1173:N:0:NAGGCATG+ATAGAGAG#0/1
GAAGGTGGTGAGACCTTTGAGGCGTGCTGACCCACAGGTATGGTGAAAGAGATAATAGCAAAACCTTGTATTATCTGGGATAGCTTTCTTCACCTGTCTCTTATACACATCTCCGAGCCCACGAGACTAGGCATGATCTCGTATGCCGTCT
+
AAFFFJJJJJFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJFJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJAFJJJJJJJJJJFJF
@J00107:235:HFWNCBBXY:1:1101:4929:1173:N:0:NAGGCATG+ATAGAGAG#0/1
AACCTAACAAGAGGTACTGTACTAATGGAGCAATCAAAGGAAACTAATGATTGGTACTGCTTTACTCGCTTATGGCCAATGATTTAACTCCACACTATATCTACCAACATTTTTTCAAAAACACACCTTTCAGTTCATGTTAATGCTGTCT

Head of file 2:
@J00107:235:HFWNCBBXY:1:1101:13210:1173:N:0:NAGGCATG+ATAGAGAG#0/2
GATAACTTGGGAGGATCTTAACTATAAGGGTACTTGTGTTAGTAGAATAGTGGGCCTATAAAC
+
AAAFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
@J00107:235:HFWNCBBXY:1:1101:13758:1173:N:0:NAGGCATG+ATAGAGAG#0/2
TCCATACCCTATGAGGTCCTTGTCCCATCCATTTAATCGATTTACACTTCGTTTCAGGGTATTTAAATGAGATTCCTACAAAGTCTTTATTCCTTAATTCTTTAGCTATTGGGCCTACTTCCAATCCCAACAAACCATTTTGCTCTAATTT
+
AAFFFJJJJJJJJJFJFJJJJJJFJJJAJJJJFJJJJJJJJJJJJJJFFJJFFJJJJJFJFFFFFJJFJJJFJJJJJJJJJJJJJJJJJJFJJJFFFJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJAJJJJ
@J00107:235:HFWNCBBXY:1:1101:16315:1173:N:0:NAGGCATG+ATAGAGAG#0/2
GTATAGTACATCATGATAAGGGGAGGGCTGGTATGGTACATAATGATAAGGGGAGGAGTAGTATCTGTCTCTTATACACATCTGACGCTGCCGACGAATAGAGAGGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAAAAAAAA

Here is an abbreviated log output of kneaddata:
08/08/2020 04:05:04 PM - kneaddata.knead_data - INFO: Running kneaddata v0.7.4
08/08/2020 04:05:04 PM - kneaddata.knead_data - INFO: Output files will be written to: /nfs0/IB/Weis_Lab/kochja/MicrOA/raw_metagenome/raw_data/kneaddata_out2
08/08/2020 04:05:04 PM - kneaddata.knead_data - DEBUG: Running with the following arguments:
verbose = True
input = /nfs0/IB/Weis_Lab/kochja/MicrOA/raw_metagenome/raw_data/lane1-s001-indexN706-S502-TAGGCATG-ATAGAGAG-Sample-1_S1_L001_R1_001.fastq.gz /nfs0/IB/Weis_Lab/kochja/MicrOA/raw_metagenome/raw_data/lane1-s001-indexN706-S502-TAGGCATG-ATAGAGAG-Sample-1_S1_L001_R2_001.fastq.gz
output_dir = /nfs0/IB/Weis_Lab/kochja/MicrOA/raw_metagenome/raw_data/kneaddata_out2
reference_db = /nfs0/IB/Weis_Lab/kochja/MicrOA/raw_metagenome/raw_data/anthopleura_genome_db /nfs0/IB/Weis_Lab/kochja/MicrOA/raw_metagenome/raw_data/breviolum_minutum_db /nfs0/IB/Weis_Lab/kochja/MicrOA/raw_metagenome/raw_data/SILVA_128_LSUParc_SSUParc_ribosomal_RNA /nfs0/IB/Weis_Lab/kochja/MicrOA/raw_metagenome/raw_data/hg37dec_v0.1 /nfs0/IB/Weis_Lab/kochja/MicrOA/raw_metagenome/raw_data/human_hg38_refMrna
bypass_trim = False
output_prefix = lane1-s001-indexN706-S502-TAGGCATG-ATAGAGAG-Sample-1_S1_L001_R1_001_kneaddata
threads = 5
processes = 1
trimmomatic_quality_scores = -phred33
bmtagger = False
trf = False
fastqc_start = False
fastqc_end = False
store_temp_output = False
remove_intermediate_output = False
cat_final_output = False
log_level = DEBUG
log = /nfs0/IB/Weis_Lab/kochja/MicrOA/raw_metagenome/raw_data/kneaddata_out2/lane1-s001-indexN706-S502-TAGGCATG-ATAGAGAG-Sample-1_S1_L001_R1_001_kneaddata.log
trimmomatic_path = /nfs0/IB/Weis_Lab/kochja/opt/conda/envs/humann3.1/bin/trimmomatic-0.33.jar
max_memory = 500m
trimmomatic_options = SLIDINGWINDOW:4:25 MINLEN:50
bowtie2_path = /nfs0/IB/Weis_Lab/kochja/opt/conda/envs/humann3.1/bin/bowtie2
bowtie2_options = --very-sensitive --dovetail --phred33
no_discordant = False
reorder = False
serial = False
bmtagger_path = None
trf_path = None
match = 2
mismatch = 7
delta = 7
pm = 80
pi = 10
minscore = 50
maxperiod = 500
fastqc_path = None
remove_temp_output = True

08/08/2020 04:05:04 PM - kneaddata.utilities - INFO: Decompressing gzipped file ...
08/08/2020 04:06:37 PM - kneaddata.utilities - INFO: Decompressed file created: /nfs0/IB/Weis_Lab/kochja/MicrOA/raw_metagenome/raw_data/kneaddata_out2/decompressed_dkmn7uv3_lane1-s001-indexN706-S502-TAGGCATG-ATAGAGAG-Sample-1_S1_L001_R1_001.fastq
08/08/2020 04:06:37 PM - kneaddata.utilities - INFO: Decompressing gzipped file ...
08/08/2020 04:08:14 PM - kneaddata.utilities - INFO: Decompressed file created: /nfs0/IB/Weis_Lab/kochja/MicrOA/raw_metagenome/raw_data/kneaddata_out2/decompressed_9hod5u1j_lane1-s001-indexN706-S502-TAGGCATG-ATAGAGAG-Sample-1_S1_L001_R2_001.fastq
08/08/2020 04:08:16 PM - kneaddata.utilities - INFO: Reformatting file sequence identifiers ...
08/08/2020 04:10:29 PM - kneaddata.utilities - INFO: Reformatting file sequence identifiers ...
08/08/2020 04:15:03 PM - kneaddata.utilities - INFO: READ COUNT: raw pair1 : Initial number of reads ( /nfs0/IB/Weis_Lab/kochja/MicrOA/raw_metagenome/raw_data/kneaddata_out2/reformatted_identifiers28pjg7p9_decompressed_dkmn7uv3_lane1-s001-indexN706-S502-TAGGCATG-ATAGAGAG-Sample-1_S1_L001_R1_001 ): 17659800.0
08/08/2020 04:15:26 PM - kneaddata.utilities - INFO: READ COUNT: raw pair2 : Initial number of reads ( /nfs0/IB/Weis_Lab/kochja/MicrOA/raw_metagenome/raw_data/kneaddata_out2/reformatted_identifiersq2mceagx_decompressed_9hod5u1j_lane1-s001-indexN706-S502-TAGGCATG-ATAGAGAG-Sample-1_S1_L001_R2_001 ): 17659800.0


08/08/2020 07:46:28 PM - kneaddata.utilities - INFO: READ COUNT: decontaminated anthopleura_genome_db orphan2 : Total reads after removing those found in reference database ( /nfs0/IB/Weis_Lab/kochja/MicrOA/raw_metagenome/raw_data/kneaddata_out2/lane1-s001-indexN706-S502-TAGGCATG-ATAGAGAG-Sample-1_S1_L001_R1_001_kneaddata_SILVA_128_LSUParc_SSUParc_ribosomal_RNA_bowtie2_unmatched_2_clean.fastq ): 355073.0
08/08/2020 07:46:37 PM - kneaddata.utilities - INFO: READ COUNT: decontaminated breviolum_minutum_db orphan2 : Total reads after removing those found in reference database ( /nfs0/IB/Weis_Lab/kochja/MicrOA/raw_metagenome/raw_data/kneaddata_out2/lane1-s001-indexN706-S502-TAGGCATG-ATAGAGAG-Sample-1_S1_L001_R1_001_kneaddata_anthopleura_genome_db_bowtie2_unmatched_2_clean.fastq ): 801714.0
08/08/2020 07:46:41 PM - kneaddata.utilities - INFO: READ COUNT: decontaminated SILVA_128_LSUParc_SSUParc_ribosomal_RNA orphan2 : Total reads after removing those found in reference database ( /nfs0/IB/Weis_Lab/kochja/MicrOA/raw_metagenome/raw_data/kneaddata_out2/lane1-s001-indexN706-S502-TAGGCATG-ATAGAGAG-Sample-1_S1_L001_R1_001_kneaddata_breviolum_minutum_db_bowtie2_unmatched_2_clean.fastq ): 381086.0
08/08/2020 07:46:45 PM - kneaddata.utilities - INFO: READ COUNT: decontaminated hg37dec_v0.1 orphan2 : Total reads after removing those found in reference database ( /nfs0/IB/Weis_Lab/kochja/MicrOA/raw_metagenome/raw_data/kneaddata_out2/lane1-s001-indexN706-S502-TAGGCATG-ATAGAGAG-Sample-1_S1_L001_R1_001_kneaddata_hg37dec_v0.1_bowtie2_unmatched_2_clean.fastq ): 367332.0
08/08/2020 07:46:47 PM - kneaddata.utilities - INFO: READ COUNT: decontaminated human_hg38_refMrna orphan2 : Total reads after removing those found in reference database ( /nfs0/IB/Weis_Lab/kochja/MicrOA/raw_metagenome/raw_data/kneaddata_out2/lane1-s001-indexN706-S502-TAGGCATG-ATAGAGAG-Sample-1_S1_L001_R1_001_kneaddata_human_hg38_refMrna_bowtie2_unmatched_2_clean.fastq ): 359875.0
08/08/2020 07:46:57 PM - kneaddata.utilities - INFO: READ COUNT: final orphan2 : Total reads after merging results from multiple databases ( /nfs0/IB/Weis_Lab/kochja/MicrOA/raw_metagenome/raw_data/kneaddata_out2/lane1-s001-indexN706-S502-TAGGCATG-ATAGAGAG-Sample-1_S1_L001_R1_001_kneaddata_unmatched_2.fastq ): 115693.0
08/08/2020 07:47:04 PM - kneaddata.knead_data - INFO:
Final output files created:
/nfs0/IB/Weis_Lab/kochja/MicrOA/raw_metagenome/raw_data/kneaddata_out2/lane1-s001-indexN706-S502-TAGGCATG-ATAGAGAG-Sample-1_S1_L001_R1_001_kneaddata_paired_1.fastq
/nfs0/IB/Weis_Lab/kochja/MicrOA/raw_metagenome/raw_data/kneaddata_out2/lane1-s001-indexN706-S502-TAGGCATG-ATAGAGAG-Sample-1_S1_L001_R1_001_kneaddata_paired_2.fastq
/nfs0/IB/Weis_Lab/kochja/MicrOA/raw_metagenome/raw_data/kneaddata_out2/lane1-s001-indexN706-S502-TAGGCATG-ATAGAGAG-Sample-1_S1_L001_R1_001_kneaddata_unmatched_1.fastq
/nfs0/IB/Weis_Lab/kochja/MicrOA/raw_metagenome/raw_data/kneaddata_out2/lane1-s001-indexN706-S502-TAGGCATG-ATAGAGAG-Sample-1_S1_L001_R1_001_kneaddata_unmatched_2.fastq

Hi!
I think you should use the --reorder option when running kneaddata. Your log shows reorder = False for this run.
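
If rerunning is not convenient, the existing paired files could also be re-paired by read ID. Below is a minimal Python sketch of that idea. It is not part of kneaddata, and the function names are my own. It assumes plain 4-line FASTQ records, read IDs that end in /1 and /2 as in the heads you posted, and that all of file 2 fits in memory (with millions of reads this can take several GB).

```python
import itertools


def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or a blank line)
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def pair_key(header):
    """Drop the leading '@' and a trailing /1 or /2 so mates share one key."""
    name = header[1:].split()[0]
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def same_order(path1, path2):
    """Return True only if both files list the same read IDs in the same order."""
    with open(path1) as f1, open(path2) as f2:
        pairs = itertools.zip_longest(read_fastq(f1), read_fastq(f2))
        for rec1, rec2 in pairs:
            if rec1 is None or rec2 is None:
                return False  # files have different numbers of reads
            if pair_key(rec1[0]) != pair_key(rec2[0]):
                return False
    return True


def interleave_by_id(path1, path2, out_path):
    """Write mates next to each other, following the read order of file 1.

    Reads without a mate in the other file are skipped.
    Returns the number of pairs written.
    """
    with open(path2) as f2:
        mates = {pair_key(rec[0]): rec for rec in read_fastq(f2)}
    written = 0
    with open(path1) as f1, open(out_path, "w") as out:
        for rec in read_fastq(f1):
            mate = mates.get(pair_key(rec[0]))
            if mate is None:
                continue
            for header, seq, qual in (rec, mate):
                out.write(f"{header}\n{seq}\n+\n{qual}\n")
            written += 1
    return written
```

For example, same_order("sample_paired_1.fastq", "sample_paired_2.fastq") tells you whether the files are already in matching order, and interleave_by_id("sample_paired_1.fastq", "sample_paired_2.fastq", "sample_interleaved.fastq") writes an interleaved file (the file names here are placeholders). Check what input layout your assembler expects before using the interleaved output.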
