Kneaddata output

Hi,
I am analyzing my shotgun metagenome sample with KneadData. I ran the command “kneaddata --input AST2R1.fastq --input AST2R1.fastq -db $/home/plankton/Kneadata_DIR --output /home/plankton/Metagenomics_AST_Thatha/CG_DN_935 --trimmomatic /home/plankton/anaconda3/share/trimmomatic-0.39-2 --cat-final-output” and got the following output:

Final output files created:
/home/plankton/Metagenomics_AST_Thatha/CG_DN_935/AST2.1/AST2R1_kneaddata_paired_1.fastq
/home/plankton/Metagenomics_AST_Thatha/CG_DN_935/AST2.1/AST2R1_kneaddata_paired_2.fastq
/home/plankton/Metagenomics_AST_Thatha/CG_DN_935/AST2.1/AST2R1_kneaddata_unmatched_1.fastq
/home/plankton/Metagenomics_AST_Thatha/CG_DN_935/AST2.1/AST2R1_kneaddata_unmatched_2.fastq
/home/plankton/Metagenomics_AST_Thatha/CG_DN_935/AST2.1/AST2R1_kneaddata.fastq

So which file should I use for the next step (HUMAnN 3)?

Thanks for your valuable time

Hi @balamurugan_Sadaiapp ,

Thank you for reaching out to the bioBakery Lab.

The input for the HUMAnN 3 step is the merged file of the following two output files:

/home/plankton/Metagenomics_AST_Thatha/CG_DN_935/AST2.1/AST2R1_kneaddata_paired_1.fastq
/home/plankton/Metagenomics_AST_Thatha/CG_DN_935/AST2.1/AST2R1_kneaddata_paired_2.fastq

Regards,
Sagun

Hi Sagun,
Based on the KneadData outputs discussion, I merged
AST6_R1_kneaddata.repeats.removed.1.fastq and AST6_R1_kneaddata.repeats.removed.1.fastq and used the result for HUMAnN 3.
Please confirm whether this is correct.

Thank you for your reply

Hi @balamurugan_Sadaiapp
Personally, I did not use KneadData’s output concatenation option (--cat-final-output), simply because I didn’t know whether the original files would be kept. I just added cat *paired_1.fastq *paired_2.fastq > cat-paired.fastq at the end of my script. But from what I see in your file list, both sets of files (concatenated and not concatenated) are kept.

In any case, if you provided a reference genome for decontamination, the files of interest are *_paired_?.fastq. If not, they will end in *repeats.removed.?.fastq, because that is how the files are named after running through the tandem repeat finder (the last step before decontamination).
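
The naming rule above can be sketched as a small shell function that prefers the decontaminated pair and falls back to the repeats-removed pair. This is only a sketch: pick_final_pair is a hypothetical helper (not part of KneadData), the file-name patterns are taken from the names quoted in this thread, and the .2.fastq mate name is assumed by analogy with the .1.fastq name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the two files to merge for one sample.
# Patterns follow the names seen in this thread; adjust if yours differ.
pick_final_pair() {
    local dir="$1" sample="$2"
    local p1="$dir/${sample}_kneaddata_paired_1.fastq"
    local p2="$dir/${sample}_kneaddata_paired_2.fastq"
    local t1="$dir/${sample}_kneaddata.repeats.removed.1.fastq"
    local t2="$dir/${sample}_kneaddata.repeats.removed.2.fastq"  # assumed mate name

    if [ -f "$p1" ] && [ -f "$p2" ]; then
        # A reference database was used: take the decontaminated pair.
        printf '%s\n%s\n' "$p1" "$p2"
    elif [ -f "$t1" ] && [ -f "$t2" ]; then
        # No decontamination: fall back to the repeats-removed pair.
        printf '%s\n%s\n' "$t1" "$t2"
    else
        echo "no final pair found for $sample in $dir" >&2
        return 1
    fi
}

# Example (paths are illustrative):
# pick_final_pair /path/to/kneaddata_output AST2R1
```

The function only reports which files it found; it does not modify anything, so it is safe to run before deciding what to merge.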

TL;DR: if you did provide a reference genome for decontamination, you want to merge the *paired_?.fastq files. That is what --cat-final-output does, and the result seems to be named AST2R1_kneaddata.fastq in your case.

You could simply run ls -lha /home/plankton/Metagenomics_AST_Thatha/CG_DN_935/AST2.1/* and check whether the concatenated file is about double the size of each paired file, i.e. roughly the two paired files combined.
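
If you merge the files yourself, the same size check can be done in a script instead of by eye. Below is a minimal sketch, assuming plain (uncompressed) FASTQ files; merge_paired is a hypothetical helper name, not a KneadData command. It concatenates the two files with cat and confirms that the merged byte count equals the sum of the inputs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: concatenate two FASTQ files and verify the merged size.
merge_paired() {
    local r1="$1" r2="$2" merged="$3"
    cat "$r1" "$r2" > "$merged" || return 1

    # wc -c gives the size in bytes; $(( )) strips any padding wc may add.
    local s1 s2 sm
    s1=$(( $(wc -c < "$r1") ))
    s2=$(( $(wc -c < "$r2") ))
    sm=$(( $(wc -c < "$merged") ))

    if [ "$sm" -eq $((s1 + s2)) ]; then
        echo "OK: $merged has $sm bytes"
    else
        echo "MISMATCH: expected $((s1 + s2)) bytes, got $sm" >&2
        return 1
    fi
}

# Example (file names taken from this thread):
# merge_paired AST2R1_kneaddata_paired_1.fastq AST2R1_kneaddata_paired_2.fastq cat-paired.fastq
```

This check only applies to a file you merged yourself from the two paired files. I have not verified exactly which files --cat-final-output includes, so its output may not match this sum exactly.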

Hope this helps!
Cheers

Thank you Jorondo1 for the detailed clarification.
Thanks
