KneadData — Need clarification on the exact input file names used at each internal step

Hello, I would like clarification about the exact input FASTQ files that KneadData passes into each internal step.

I know the overall workflow is something like:

  1. (if input is .gz) decompress

  2. reformat sequence identifiers

  3. trimming (Trimmomatic)

  4. decontamination (Bowtie2)

  5. (optional) repeat removal, as a second decontamination step

However, I’m trying to understand which exact intermediate files (with filenames) are used as the input to each stage.

Each step creates a file (sometimes temporary) that is used as input for the next step. All file names are built from the input basename plus “_kneaddata_”, combined with the step-specific parts below:

  1. Decompression: Temporary files with the prefix “decompressed_”
  2. Reformat identifiers: Temporary files with the prefix “reformatted_identifiers”
  3. Trim: Files with the suffix “.trimmed.fastq”. For paired-end files, “pair1” is used for reads that still have mates after trimming and “orphan1” for reads that do not
  4. TRF (Tandem Repeats Finder): Files with the suffix “.repeats.removed.fastq”
  5. Decontam: Contaminant reads are written to files named with the database name, e.g. “demo_db_bowtie2_contam”, and the cleaned output ends in just “_kneaddata.fastq”. For paired-end input, “paired_1.fastq” is used for paired reads and “unmatched_1.fastq” for orphan reads from the previous step.
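
To make the naming concrete, here is a minimal Python sketch that assembles the single-end file names from the parts listed above. It is not KneadData's own code. The function name `single_end_names`, the example basename "sample", and the way the pieces are joined (in particular the ".fastq" extension on the contaminant file and the underscore before the contaminant label) are assumptions for illustration. The temporary files from steps 1 and 2 are left out because only their prefixes are known.

```python
def single_end_names(input_basename, contam_label="demo_db_bowtie2_contam"):
    """Assemble single-end file names from the patterns in the list above.

    Illustrative only: how the parts are joined is an assumption, so check
    the result against the files in your own output folder.
    """
    stem = f"{input_basename}_kneaddata"
    return {
        "trim": f"{stem}.trimmed.fastq",              # step 3 output, input to TRF
        "trf": f"{stem}.repeats.removed.fastq",       # step 4 output, input to decontamination
        "contaminants": f"{stem}_{contam_label}.fastq",  # step 5: reads matching the database (assumed extension)
        "clean": f"{stem}.fastq",                     # step 5: final cleaned reads
    }


# Print the expected names for a hypothetical input called "sample".
for step, name in single_end_names("sample").items():
    print(f"{step}: {name}")
```

Comparing these names with a directory listing of the output folder is a quick way to confirm which file fed which step for a given run.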