Only E. coli detected

Robin_Mjelle · June 27, 2020, 4:56pm

Hi,
I am running a WGS dataset through HUMAnN 3, and in all of the first 5 samples I detect only E. coli. I don’t expect many bacteria in these samples since they are blood samples, but I wonder why E. coli turns up in most of them. Is something wrong with my annotations? I updated the databases following the tutorial.

Gene Family SRR3163377_Abundance-RPKs

UNMAPPED 192889548.0000000000
UniRef90_A0A329XME7 5588517.9572251709
UniRef90_A0A329XME7|g__Escherichia.s__Escherichia_coli 5588517.9572251709
UniRef90_UPI000BB85CB4 2435365.2982355496
UniRef90_UPI000BB85CB4|g__Escherichia.s__Escherichia_coli 2435365.2982355496
UniRef90_A0A1X3K551 2374320.5282771089
UniRef90_A0A1X3K551|g__Escherichia.s__Escherichia_coli 2374320.5282771089
UniRef90_A0A2I2ZPI1 1890437.5465917028
UniRef90_A0A2I2ZPI1|g__Escherichia.s__Escherichia_coli 1890437.5465917028
UniRef90_A0A2B3TG24 1570004.5673570558
UniRef90_A0A2B3TG24|g__Escherichia.s__Escherichia_coli 1570004.5673570558
UniRef90_A0A1X3K7C5 1564615.7956286967
UniRef90_A0A1X3K7C5|g__Escherichia.s__Escherichia_coli 1564615.7956286967
UniRef90_UPI000928192A 1214206.6654682336

fbeghini · June 29, 2020, 7:08am

All of these proteins are annotated as “Uncharacterized”, and some have high similarity to macaque/human proteins. It is plausible that they come from an assembly slightly contaminated by host sequences.
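
To repeat this check on other samples, one option is to pull the community-level UniRef90 IDs out of the gene family table and look them up in UniProt. Below is a minimal Python sketch, assuming the whitespace-separated layout pasted in the first post. The function name `parse_genefamilies` is hypothetical and not part of HUMAnN, and the three example rows are copied from that post.

```python
def parse_genefamilies(lines):
    """Split a HUMAnN-style gene family table into community totals
    and per-species sums (rows whose feature name contains '|')."""
    totals = {}      # feature ID -> abundance for unstratified rows
    stratified = {}  # species label -> summed abundance
    for line in lines:
        line = line.strip()
        # Skip blanks and the header row (with or without a leading '#').
        if not line or line.startswith("#") or line.startswith("Gene Family"):
            continue
        parts = line.rsplit(None, 1)
        if len(parts) != 2:
            continue  # row without an abundance value
        feature, abundance = parts[0], float(parts[1])
        if "|" in feature:
            species = feature.split("|", 1)[1]
            stratified[species] = stratified.get(species, 0.0) + abundance
        else:
            totals[feature] = abundance
    return totals, stratified


example = """Gene Family SRR3163377_Abundance-RPKs
UNMAPPED 192889548.0000000000
UniRef90_A0A329XME7 5588517.9572251709
UniRef90_A0A329XME7|g__Escherichia.s__Escherichia_coli 5588517.9572251709
UniRef90_UPI000BB85CB4 2435365.2982355496
UniRef90_UPI000BB85CB4|g__Escherichia.s__Escherichia_coli 2435365.2982355496
""".splitlines()

totals, stratified = parse_genefamilies(example)

# IDs to look up manually, excluding the UNMAPPED bookkeeping row.
ids_to_check = [f for f in totals if f != "UNMAPPED"]
# Share of the total signal that HUMAnN could not assign to any gene family.
unmapped_share = totals["UNMAPPED"] / sum(totals.values())

print(ids_to_check)
print(sorted(stratified))
print(f"unmapped share: {unmapped_share:.3f}")
```

If only a handful of families carry all of the mapped signal and every one of them is uncharacterized, that pattern fits host contamination better than a real E. coli population.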

franzosa · June 29, 2020, 1:19pm

I like @fbeghini’s suggestion a lot. Have you already run host-read depletion on these samples? Another possibility with E. coli is reagent contamination, but based on the nature of the proteins hit I think the previous suggestion is more likely.

Robin_Mjelle · June 29, 2020, 1:37pm

I am running the raw reads, without any depletion. Is depletion recommended?

Best,

franzosa · June 29, 2020, 1:56pm

Highly recommended - for host-associated metagenomes there is almost always host DNA present in the sample, and it can range from low (in environments like the gut) to extremely high (in environments like the skin) depending on the microbial biomass present. I would guess that blood would be in the latter group.

We offer a pipeline for general QC and depletion here:

https://huttenhower.sph.harvard.edu/kneaddata/

In addition, any post-QC measurements that correlate with the amount of host reads removed should be treated suspiciously (they may represent uncharacterized host elements).
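
One way to screen for such features is to rank-correlate each post-QC abundance against the per-sample fraction of reads removed as host. Below is a minimal Python sketch using only the standard library. The sample values, the feature names, and the 0.8 cutoff are all made up for illustration, and the helper functions are not part of KneadData or HUMAnN.

```python
from math import sqrt


def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return 0.0  # a constant vector has no defined correlation
    return cov / (sx * sy)


def flag_host_correlated(host_fraction, features, threshold=0.8):
    """Names of features whose abundance rises with the host-read fraction."""
    return sorted(
        name for name, abundance in features.items()
        if spearman(host_fraction, abundance) >= threshold
    )


# Hypothetical data: fraction of reads removed as host in five samples,
# and the post-QC abundance of two features in the same samples.
host_fraction = [0.10, 0.35, 0.60, 0.85, 0.95]
features = {
    "feature_A": [5.0, 20.0, 41.0, 77.0, 90.0],
    "feature_B": [12.0, 3.0, 9.0, 4.0, 10.0],
}

flagged = flag_host_correlated(host_fraction, features)
print(flagged)
```

With so few samples a high correlation can arise by chance, so a flagged feature is a prompt for a closer look rather than proof of host origin.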

Dario · April 30, 2024, 12:00am

There should not be any microbiome in blood. Please see “No Evidence for a Common Blood Microbiome Based on a Population Study of 9770 Healthy Humans” (Nature Microbiology). If you find microbes in blood, either they are false positives or the person has septicemia, which is deadly. In my experience, Kraken generates many false positives, which lead to clinically implausible results, whereas MetaPhlAn has good specificity.
