Hi Marie - You can read a bit more about KneadData here:
(Note that we’re in the process of migrating all of this material to GitHub; KneadData hasn’t moved yet.) The QC in KneadData is currently a two-step process: 1) general read-level QC and 2) contaminant read depletion.
Step 1 uses Trimmomatic and follows general best practices for shotgun sequencing reads: it trims away low-quality bases and then discards the read if too few high-quality bases remain.
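
To make the "trim, then discard if too short" idea concrete, here is a rough Python sketch. It is only an illustration of the general approach, not Trimmomatic's actual algorithm or KneadData's code. The function names and the thresholds (window of 4, mean quality of 20, minimum length of 50) are assumptions picked for the example, not KneadData's defaults.

```python
def sliding_window_trim(quals, window=4, min_mean_q=20):
    """Return how many leading bases to keep.

    Scans left to right and cuts the read at the start of the first
    window whose mean quality drops below min_mean_q. Reads shorter
    than the window are returned untrimmed (the length filter below
    is expected to catch them).
    """
    n = len(quals)
    for start in range(max(n - window + 1, 0)):
        window_quals = quals[start:start + window]
        if sum(window_quals) / window < min_mean_q:
            return start
    return n


def qc_read(seq, quals, window=4, min_mean_q=20, min_len=50):
    """Trim a read, then drop it if too few bases remain.

    Returns (trimmed_seq, trimmed_quals), or None if the read is discarded.
    All thresholds are illustrative assumptions.
    """
    keep = sliding_window_trim(quals, window, min_mean_q)
    if keep < min_len:
        return None
    return seq[:keep], quals[:keep]


if __name__ == "__main__":
    # A read with a good start and a low-quality tail: trimmed and kept.
    good = qc_read("A" * 70, [30] * 60 + [5] * 10)
    print(len(good[0]) if good else "discarded")

    # A read that degrades early: too short after trimming, so discarded.
    bad = qc_read("A" * 70, [30] * 20 + [5] * 50)
    print(len(bad[0]) if bad else "discarded")
```

The key design point is the order of operations: trimming happens first, and the length check is applied to what is left, so a read is only thrown away when its usable portion is too short.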
Step 2 maps the high-quality reads against one or more databases of contaminant sequences. In the case of human-associated metagenomes, we use a modified version of the human genome as the database for this step (modified = containing additional “decoy” sequences that are also believed to represent human contamination).
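
The depletion logic in step 2 can be sketched in a few lines of Python. This assumes the alignment has already been done by some external aligner and that you have, for each contaminant database, the set of read IDs that mapped to it. The function name, the database labels, and the data layout are all hypothetical and chosen just for illustration; this is not how KneadData is implemented internally.

```python
def deplete_contaminants(reads, hits_per_database):
    """Keep only reads that did not map to any contaminant database.

    reads: dict of read ID -> sequence (the high-quality reads from step 1).
    hits_per_database: dict of database label -> set of read IDs that
        aligned to that database (assumed to come from an external aligner).
    """
    # A read is treated as contaminant if it hit at least one database.
    contaminated = set().union(*hits_per_database.values())
    return {rid: seq for rid, seq in reads.items() if rid not in contaminated}


if __name__ == "__main__":
    reads = {"r1": "ACGT", "r2": "GGCC", "r3": "TTAA"}
    hits = {
        "human_with_decoys": {"r2"},   # hypothetical database label
        "other_contaminant": {"r3"},   # hypothetical database label
    }
    clean = deplete_contaminants(reads, hits)
    print(sorted(clean))
```

With more than one database, a read is removed as soon as it maps to any of them, which is why the hit sets are combined with a union before filtering.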