I am using the latest release of MetaPhlAn and its associated databases as part of a HUMAnN 3 run with default options. In the output bugs list, I am finding that for many parent taxa, the relative abundances of the associated child taxa do not sum to 100. Is this expected behavior, and if so, can someone explain it to me? Thanks!
Hi @Elan, could you show us an example profile with the problem?
Hi there,
Here’s the bugs list I’ve been looking at. Note that I changed the header of the file slightly so that it could be parsed with pandas. A few things I’ve observed about the file:
- Entries are missing altogether for k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Oscillospiraceae, k__Bacteria|p__Firmicutes|c__Clostridia, and k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales, even though each of them has child taxa present.
- The relative abundances of the phyla sum to 100, but those at lower levels do not. However, the relative abundance of each parent does still seem to match the sum of the relative abundances of its children. Not sure if that’s a meaningful clue; I’m new to this and can’t tell what’s a feature and what’s a bug (no pun intended).
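For anyone who wants to run the same checks on their own profile, here is a minimal sketch using only the Python standard library instead of pandas. It assumes a MetaPhlAn-style tab-separated file where the first column is the `|`-delimited clade name and the third column is the relative abundance. The function names (`load_profile`, `missing_parents`, `rank_totals`, `parent_child_mismatches`) are hypothetical helpers written for this illustration, not part of MetaPhlAn or HUMAnN.

```python
import csv
from collections import defaultdict


def load_profile(path):
    """Read a MetaPhlAn-style profile into {clade_name: relative_abundance}.

    Assumption: column 1 is the '|'-delimited clade name and column 3 is the
    relative abundance. '#' comment lines and any non-numeric header row
    (e.g. one edited so pandas can parse it) are skipped.
    """
    profile = {}
    with open(path, newline="") as fh:
        for row in csv.reader(fh, delimiter="\t"):
            if len(row) < 3 or row[0].startswith("#"):
                continue
            try:
                profile[row[0]] = float(row[2])
            except ValueError:
                continue  # header row whose third column is not a number
    return profile


def missing_parents(profile):
    """Ancestor clades that have descendants in the profile but no row of their own."""
    missing = set()
    for clade in profile:
        parts = clade.split("|")
        for i in range(1, len(parts)):
            ancestor = "|".join(parts[:i])
            if ancestor not in profile:
                missing.add(ancestor)
    return sorted(missing)


def rank_totals(profile):
    """Total relative abundance at each depth (1 = kingdom, 2 = phylum, ...)."""
    totals = defaultdict(float)
    for clade, relab in profile.items():
        totals[clade.count("|") + 1] += relab
    return dict(totals)


def parent_child_mismatches(profile, tol=1e-3):
    """Parents whose abundance differs from the sum of their direct children by more than tol."""
    child_sums = defaultdict(float)
    for clade, relab in profile.items():
        if "|" in clade:
            child_sums[clade.rsplit("|", 1)[0]] += relab
    return {
        parent: (profile[parent], total)
        for parent, total in child_sums.items()
        if parent in profile and abs(profile[parent] - total) > tol
    }
```

Calling these on the result of `load_profile("formatted_bugs_list_3.tsv")` would show whether the per-rank totals fall short of 100 at the same levels where parent rows are missing, while `parent_child_mismatches` stays empty.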
Thanks for your help!
Best,
Elan
formatted_bugs_list_3.tsv (43.7 KB)
Hi @Elan
It seems to be a problem with some of the NCBI taxonomy we included in the latest releases: some taxonomic inconsistencies affected the MetaPhlAn taxonomic tree and, thus, the calculation of the relative abundances at higher taxonomic levels. I’m currently working on a fixed version of the database and a script to fix already-generated profiles. I’ll keep you posted.