The bioBakery help forum

Kneaddata outputs

Hi,
I am trying to understand the output files that kneaddata generates for a paired-end sample.

Input files:
Sample_1 (Forward) (11G)
Sample_2 (Reverse) (9.7G)

Output files I obtain:
Sample_kneaddata.log
Sample_1.kneaddata.repeats.removed.1.fastq (8.6G)
Sample_2.kneaddata.repeats.removed.2.fastq (7.2G)
Sample_1.kneaddata.repeats.removed.unmatched.1.fastq (1.2G)
Sample_2.kneaddata.repeats.removed.unmatched.2.fastq (85.9MB)
Sample_1.kneaddata.trimmed.1.fastq (8.7G)
Sample_2.kneaddata.trimmed.2.fastq (7.3G)
Sample_1.kneaddata.trimmed.single.1.fastq (1.2G)
Sample_2.kneaddata.trimmed.single.2.fastq (86.3MB)

I suppose that Sample_1.kneaddata.repeats.removed.1.fastq (and _2) are the files I need for the next step of the analysis, because they are smaller than the trimmed files. Is that right?
Also, I don’t understand what “single” means in “Sample_X.kneaddata.trimmed.single.X.fastq”, or why there is such a large difference in file size between _1 (1.2G) and _2 (86.3MB).

I appreciate your help
All the best,
Joao

Hi @Joao_Gatica,

Yes, you are correct: Sample_1.kneaddata.repeats.removed.1.fastq (and _2) are the files you need for the next step of the analysis. In the latest version of Kneaddata, the workflow runs Trimmomatic → TRF → Bowtie2 in that order, and Sample_1.kneaddata.repeats.removed.1.fastq (and _2) are the output of the TRF step.
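Paired-end tools generally expect the _1 and _2 files to list the same read pairs in the same order, so before moving on it can be worth checking that the two files are still in sync. Below is a minimal Python sketch, assuming plain or gzip-compressed FASTQ with strict 4-line records and mate IDs that differ only by an optional /1 or /2 suffix. The function names (open_fastq, read_ids, count_pairs, check_pair_files) are hypothetical helpers, not part of Kneaddata.

```python
import gzip
from itertools import zip_longest
from typing import Iterable, Iterator, TextIO


def open_fastq(path: str) -> TextIO:
    # Open a plain or gzip-compressed FASTQ file as text.
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_ids(lines: Iterable[str]) -> Iterator[str]:
    """Yield the read ID of every 4-line FASTQ record, minus any /1 or /2 suffix."""
    for i, line in enumerate(lines):
        if i % 4 == 0:  # header line of a record (assumes no wrapped sequences)
            rid = line[1:].split()[0]
            if rid.endswith(("/1", "/2")):
                rid = rid[:-2]
            yield rid


def count_pairs(lines1: Iterable[str], lines2: Iterable[str]) -> int:
    """Return the number of pairs, raising ValueError at the first mismatch."""
    n = 0
    for id1, id2 in zip_longest(read_ids(lines1), read_ids(lines2)):
        if id1 is None or id2 is None:
            raise ValueError(f"read counts differ after {n} pairs")
        if id1 != id2:
            raise ValueError(f"pair {n + 1}: {id1!r} != {id2!r}")
        n += 1
    return n


def check_pair_files(path1: str, path2: str) -> int:
    # Check two FASTQ files on disk and return the number of matching pairs.
    with open_fastq(path1) as f1, open_fastq(path2) as f2:
        return count_pairs(f1, f2)
```

For example, check_pair_files("Sample_1.kneaddata.repeats.removed.1.fastq", "Sample_2.kneaddata.repeats.removed.2.fastq") would return the pair count if the files agree and raise an error pointing at the first mismatch otherwise. It streams both files line by line, so it does not need to load the multi-gigabyte files into memory.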

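“Sample_X.kneaddata.trimmed.single.X.fastq” contain the reads that passed the Trimmomatic step but whose mate did not (unpaired, or orphan, reads): single.1 holds forward reads whose reverse mate was discarded, and single.2 holds reverse reads whose forward mate was discarded. Please see the default Trimmomatic settings that we currently use for Kneaddata here (kneaddata · biobakery/biobakery Wiki · GitHub).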
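In paired-end trimming, each mate is judged separately, and a pair in which only one mate survives is split up. The toy Python sketch below shows that routing, assuming each mate has simply been marked pass or fail. route_pair, tally and the bucket names are hypothetical illustrations, not Kneaddata or Trimmomatic code.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def route_pair(r1_passes: bool, r2_passes: bool) -> Optional[str]:
    """Return the output bucket for one read pair (None means the pair is dropped)."""
    if r1_passes and r2_passes:
        return "paired"    # both mates kept -> trimmed.1 / trimmed.2
    if r1_passes:
        return "single.1"  # forward read kept, its reverse mate discarded
    if r2_passes:
        return "single.2"  # reverse read kept, its forward mate discarded
    return None            # neither mate kept


def tally(pairs: Iterable[Tuple[bool, bool]]) -> Counter:
    # Count how many pairs end up in each bucket.
    return Counter(route_pair(r1, r2) for r1, r2 in pairs)
```

For instance, tally([(True, True), (True, False), (True, False), (False, True), (False, False)]) puts 2 pairs in "single.1" but only 1 in "single.2": whenever reverse reads fail more often than forward reads, the single.1 file grows faster than single.2.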

[ DEFAULT : ILLUMINACLIP:/TruSeq3-SE.fa:2:30:10 SLIDINGWINDOW:4:20 MINLEN:50 ]

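We use adapter trimming (ILLUMINACLIP), sliding-window quality trimming (SLIDINGWINDOW) and a minimum read length (MINLEN). I suspect the size difference between _1 (1.2G) and _2 (86.3MB) comes from a difference between the R1 and R2 reads: reverse reads often have lower quality, so if more of them are trimmed below the minimum length and discarded, more forward reads are left without a mate and end up in the single.1 file.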
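The toy Python sketch below illustrates how SLIDINGWINDOW:4:20 followed by MINLEN:50 can discard one mate but keep the other. It is a simplification under stated assumptions: it ignores ILLUMINACLIP, assumes Phred+33 quality strings, and cuts at the start of the first 4-base window whose mean quality is below 20, which is not exactly how Trimmomatic handles the failing window. The function names are hypothetical.

```python
from typing import List


def phred33(qual: str) -> List[int]:
    # Decode a Phred+33 quality string into integer scores.
    return [ord(c) - 33 for c in qual]


def sliding_window_len(quals: List[int], window: int = 4, threshold: float = 20.0) -> int:
    """Return how many leading bases to keep.

    Simplified: scans 5' -> 3' and cuts at the start of the first window
    whose mean quality is below `threshold` (not Trimmomatic's exact rule).
    """
    for start in range(0, len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < threshold:
            return start
    return len(quals)


def passes(qual: str, window: int = 4, threshold: float = 20.0, min_len: int = 50) -> bool:
    # A read survives only if its trimmed length is still >= MINLEN.
    return sliding_window_len(phred33(qual), window, threshold) >= min_len
```

With a forward read at Q40 throughout (quality string "I" * 100) and a reverse read that drops to Q2 after 40 bases ("I" * 40 + "#" * 60), the reverse read is cut to 39 bases and fails MINLEN:50, while the forward read keeps all 100 bases. Only the forward read survives, so it would be written to the single.1 file.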

Regards,
Sagun