KneadData — Need clarification on the exact input file names used at each internal step

gy.park · October 29, 2025, 7:33am

Hello, I would like clarification about the exact input FASTQ files that KneadData passes into each internal step.

I know the overall workflow is something like:

However, I’m trying to understand which exact intermediate files (with filenames) are used as the input to each stage.

tkuntz-hsph · November 7, 2025, 3:38pm

Each step writes a file (sometimes a temporary one) that serves as the input to the next step. File names are built from the input basename followed by “_kneaddata”, plus a step-specific marker:

Decompression: Temporary files with the prefix “decompressed_”
Reformat identifiers: Temporary files with the prefix “reformatted_identifiers”
Trim: Files with the suffix “.trimmed.fastq”. For paired-end input, “pair1” is used for reads that still have a mate after trimming and “orphan1” for reads that no longer do.
TRF: Files with the suffix “.repeats.removed.fastq”
Decontam: Contaminant reads go to files named after the database, e.g. “demo_db_bowtie2_contam”; the cleaned output ends in just “_kneaddata.fastq”. For paired-end input, the cleaned files end in “paired_1.fastq” for reads that still have mates and “unmatched_1.fastq” for the orphan reads from the previous step.
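If you want to sort the files left in an output directory by the step that produced them, the markers above can be matched with a short script. This is a minimal sketch: `classify_kneaddata_file` is a hypothetical helper, not part of KneadData, and it assumes file names contain the markers listed above. Real names may vary between KneadData versions (for example, a read number before `.fastq`), so it uses substring checks rather than exact suffixes, and rule order matters (contaminant files are checked before the paired/unmatched markers they may also contain).

```python
from pathlib import Path

# Assumed markers, taken from the step list above; not an official KneadData API.
# Rules are checked in order and the first match wins.
STEP_RULES = [
    ("decompression", lambda n: n.startswith("decompressed_")),
    ("reformat identifiers", lambda n: n.startswith("reformatted_identifiers")),
    ("decontam: contaminant reads", lambda n: "_contam" in n),
    ("trf", lambda n: ".repeats.removed" in n),
    ("trim", lambda n: ".trimmed" in n),
    ("decontam: clean paired", lambda n: "_paired_" in n),
    ("decontam: clean orphan", lambda n: "_unmatched_" in n),
    ("decontam: clean single-end", lambda n: n.endswith("_kneaddata.fastq")),
]


def classify_kneaddata_file(path: str) -> str:
    """Return the (assumed) KneadData step that produced a file, or 'unknown'."""
    name = Path(path).name  # ignore any directory part
    for step, matches in STEP_RULES:
        if matches(name):
            return step
    return "unknown"


if __name__ == "__main__":
    for f in ["sample_kneaddata.trimmed.fastq",
              "sample_kneaddata.repeats.removed.fastq",
              "sample_kneaddata_demo_db_bowtie2_contam.fastq",
              "sample_kneaddata_paired_1.fastq"]:
        print(f, "->", classify_kneaddata_file(f))
```

Adjust the rules if your version of KneadData names its intermediates differently; listing the output directory after a run with `--store-temp-output` is the most reliable way to confirm the actual names.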
