The bioBakery help forum

KneadData decontamination algorithm

My team recently started doing some microbiome analysis and came across several useful tools developed by your lab, including KneadData.

We are particularly curious about the decontamination step. Could you share more details about exactly how it is done?

Many thanks in advance for your kind advice!


Hi Marie - You can read a bit more about KneadData here:

(Note that we’re in the process of migrating all of this material to GitHub; KneadData hasn’t moved yet.)

The QC in KneadData is currently a two-step process: 1) general read-level QC and 2) contaminant read depletion.

Step 1 uses Trimmomatic and follows general best practices for shotgun sequencing reads: trim away low-quality bases, then discard the read if too few high-quality bases remain.
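
To make the idea behind Step 1 concrete, here is a minimal Python sketch of "trim, then discard if too short". It is a simplified illustration of sliding-window quality trimming, not Trimmomatic's actual implementation. The function names and the default values (`window=4`, `min_avg_q=20`, `min_len=50`) are assumptions chosen for the example, not KneadData's settings.

```python
def sliding_window_trim(seq, quals, window=4, min_avg_q=20):
    """Cut the read at the start of the first window whose mean quality
    falls below min_avg_q. Simplified illustration only; the defaults
    are assumed values, not taken from KneadData or Trimmomatic."""
    for start in range(len(quals) - window + 1):
        window_quals = quals[start:start + window]
        if sum(window_quals) / window < min_avg_q:
            return seq[:start], quals[:start]
    # No low-quality window found (or read shorter than the window).
    return seq, quals


def qc_read(seq, quals, window=4, min_avg_q=20, min_len=50):
    """Trim the read, then drop it (return None) if too little remains."""
    trimmed_seq, trimmed_quals = sliding_window_trim(seq, quals, window, min_avg_q)
    if len(trimmed_seq) < min_len:
        return None
    return trimmed_seq, trimmed_quals


# Hypothetical 10-base read: 6 good bases followed by 4 poor ones.
seq = "A" * 10
quals = [30] * 6 + [2] * 4
print(sliding_window_trim(seq, quals))   # ('AAAA', [30, 30, 30, 30])
print(qc_read(seq, quals, min_len=5))    # None: only 4 bases survive
```

Note that the cut lands at the start of the failing window, so a few good bases just before the low-quality stretch are lost as well. That is a property of this simplified version.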

Step 2 maps the high-quality reads against one or more databases of contaminant sequences. In the case of human-associated metagenomes, we use a modified version of the human genome as the database for this step (modified = containing additional “decoy” sequences that are also believed to represent human contamination).
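
Conceptually, Step 2 boils down to "drop every read that aligns to any contaminant database". The Python sketch below illustrates only that filtering logic. It assumes the alignment has already been done by some aligner and that you have, for each database, the set of read IDs that mapped. The function name, the database labels, and the read IDs are all hypothetical, and this is not KneadData's own code.

```python
def deplete_contaminants(reads, hits_by_db):
    """Split reads into (kept, removed).

    reads:      iterable of (read_id, sequence) tuples
    hits_by_db: dict mapping a database label to the set of read IDs
                that aligned to it (assumed to come from an aligner)
    A read is removed if it aligned to at least one database.
    """
    contaminant_ids = set()
    for read_ids in hits_by_db.values():
        contaminant_ids |= read_ids

    kept, removed = [], []
    for read_id, sequence in reads:
        if read_id in contaminant_ids:
            removed.append((read_id, sequence))
        else:
            kept.append((read_id, sequence))
    return kept, removed


# Hypothetical reads and alignment results, for illustration only.
reads = [("r1", "ACGT"), ("r2", "GGCC"), ("r3", "TTAA")]
hits_by_db = {
    "human_with_decoys": {"r2"},
    "other_contaminant": {"r3"},
}
kept, removed = deplete_contaminants(reads, hits_by_db)
print(kept)     # [('r1', 'ACGT')]
print(removed)  # [('r2', 'GGCC'), ('r3', 'TTAA')]
```

Returning the removed reads alongside the kept ones makes it easy to check how much of a sample was flagged as contamination.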