Hi,
I am running a WGS dataset through humann3, and in all 5 of the first 5 samples I detect only E. coli. I don’t expect many bacteria in these samples since they are blood samples, but I wonder why E. coli turns up in most of them. Is something wrong with my annotations? I updated the databases following the tutorial.
All of these proteins are annotated as “Uncharacterized”, and some have high similarity to macaque/human proteins. It is plausible that they come from an assembly slightly contaminated by host sequences.
I like @fbeghini’s suggestion a lot. Have you already run host-read depletion on these samples? Another possibility with E. coli is reagent contamination, but based on the nature of the proteins hit, I think the previous suggestion is more likely.
Highly recommended - for host-associated metagenomes there is almost always host DNA present in the sample, and it can range from low (in environments like the gut) to extremely high (in environments like the skin) depending on the microbial biomass present. I would guess that blood would be in the latter group.
We offer a pipeline for general QC and depletion here:
In addition, any post-QC measurement that correlates with the amount of host reads removed should be treated with suspicion (it may represent uncharacterized host elements).
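To make that check concrete, here is a minimal Python sketch, not part of any bioBakery tool. It assumes you have already collected, per sample, the number of host reads removed and the post-QC abundance of each feature. The function names, the input layout, and the 0.8 cutoff are all hypothetical choices for illustration. It uses a Spearman rank correlation and flags on the absolute value, so strong negative correlations are reported too.

```python
from math import sqrt


def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def flag_host_correlated(host_removed, abundances, threshold=0.8):
    """Return {feature: rho} for features tracking host-read counts.

    host_removed: host reads removed per sample.
    abundances: feature name -> per-sample values, same sample order.
    threshold: hypothetical cutoff on |rho|; tune it for your data.
    """
    flagged = {}
    for feature, values in abundances.items():
        rho = spearman(host_removed, values)
        if abs(rho) >= threshold:
            flagged[feature] = rho
    return flagged


# Made-up numbers, purely to show the call shape.
host_removed = [1000, 5000, 2000, 8000, 3000]
abundances = {
    "feature_A": [10, 50, 22, 79, 31],  # rises with host reads removed
    "feature_B": [5, 4, 6, 5, 7],       # no clear relationship
}
print(flag_host_correlated(host_removed, abundances))
```

With only a handful of samples a rank correlation is very noisy, so treat anything flagged as a candidate to inspect, not as proof of host origin.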
There should not be any microbiome in blood. Please see “No Evidence for a Common Blood Microbiome Based on a Population Study of 9,770 Healthy Humans” (Nature Microbiology). If you find microbes in blood, either they are false positives or the person has septicemia, which is deadly. In my experience, Kraken generates many false positives, which give clinically implausible results, whereas MetaPhlAn has good specificity.