Kneaddata output

I am analyzing my shotgun metagenome sample with kneaddata. I ran the command “kneaddata --input AST2R1.fastq --input AST2R1.fastq -db $/home/plankton/Kneadata_DIR --output /home/plankton/Metagenomics_AST_Thatha/CG_DN_935 --trimmomatic /home/plankton/anaconda3/share/trimmomatic-0.39-2 --cat-final-output” and got the following output:

Final output files created:

So which file should I use for the next step (Humann3)?

Thanks for your valuable time

Hi @balamurugan_Sadaiapp ,

Thank you for reaching out to the bioBakery Lab.

The merged file of the following two output files would be the input for the Humann3 step.



Hi Sagun,
Based on the Kneaddata outputs discussion, I merged
AST6_R1_kneaddata.repeats.removed.1.fastq and AST6_R1_kneaddata.repeats.removed.1.fastq and used the merged file for Humann3.
Please confirm whether this is correct.

Thank you for your reply

Hi @balamurugan_Sadaiapp
Personally, I did not use the output concatenation option (--cat-final-output) of kneaddata, simply because I didn’t know whether the original files would be kept. Instead, I added cat *paired_1.fastq *paired_2.fastq > cat-paired.fastq at the end of my script. But from what I see in your file list, both sets of files (concatenated and not concatenated) are kept.

In any case, if you provided a reference genome for decontamination, the files of interest are *_paired_?.fastq. If not, they will end in *repeats.removed.?.fastq, because that is how the files are named after running through the tandem repeat finder (it’s the last step before decontamination).

TL;DR: if you did provide a reference genome for decontamination, you want to merge the *paired_?.fastq files, which is what --cat-final-output does; in your case it seemingly names the output AST2R1_kneaddata.fastq.

You could simply run ls -lha /home/plankton/Metagenomics_AST_Thatha/CG_DN_935/AST2.1/* to check whether the concatenated file is about double the size of each paired file.
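
If you would rather check this with numbers than by eye, here is a minimal sketch of the same idea. It does not touch your real data: it builds two tiny stand-in FASTQ files in a temporary folder (the sample_kneaddata_paired_?.fastq names and the cat-paired.fastq name are only illustrative), concatenates them manually, and then compares byte sizes and read counts. To use it on real output, you would point outdir at your own folder and drop the two printf lines.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical demo folder standing in for the kneaddata output directory.
outdir=$(mktemp -d)

# Two tiny stand-in FASTQ files (one record each); the names are illustrative only.
printf '@read1/1\nACGT\n+\nIIII\n' > "$outdir/sample_kneaddata_paired_1.fastq"
printf '@read1/2\nTGCA\n+\nIIII\n' > "$outdir/sample_kneaddata_paired_2.fastq"

# Manual concatenation of the two paired files.
cat "$outdir"/*paired_1.fastq "$outdir"/*paired_2.fastq > "$outdir/cat-paired.fastq"

# Byte sizes; the arithmetic expansion strips any padding that wc may print.
size1=$(( $(wc -c < "$outdir/sample_kneaddata_paired_1.fastq") ))
size2=$(( $(wc -c < "$outdir/sample_kneaddata_paired_2.fastq") ))
merged=$(( $(wc -c < "$outdir/cat-paired.fastq") ))

# Read count of the merged file; a FASTQ record spans four lines.
reads=$(( $(wc -l < "$outdir/cat-paired.fastq") / 4 ))

# The merged file should be exactly as large as the two inputs combined.
if [ "$merged" -eq $(( size1 + size2 )) ]; then
    echo "OK: merged file holds $reads reads"
else
    echo "Size mismatch: $merged vs $(( size1 + size2 ))" >&2
    exit 1
fi
```

A byte-for-byte match only tells you that nothing was lost during concatenation; it says nothing about whether these are the right files to feed to the next step.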

Hope this helps!

Thank you Jorondo1 for the detailed clarification.
