MetaPhlAn genus-level relative abundances not summing to 100% and a possible database problem

Hi there,

I have finished running MetaPhlAn4 on my data and started analyzing the results.
I noticed two main issues which I believe require your attention:

  1. For some samples, the relative abundances at the species level sum to 100%, yet their total is lower at the genus level. From what I can tell, at least in my samples, the problem stems from the following clades:
    k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Eubacteriaceae|g__Eubacterium|s__Eubacterium_ventriosum
    k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Eubacteriaceae|g__Eubacterium|s__Eubacterium_sp_AF34_35BH
    k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Eubacteriaceae|g__Eubacterium|s__Eubacterium_sp_AF22_9
    k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Eubacteriaceae|g__Eubacterium|s__Eubacterium_SGB6276
    k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Eubacteriaceae|g__Eubacterium|s__Eubacterium_SGB4329
    k__Bacteria|p__Firmicutes|c__CFGB9301|o__OFGB9301|f__FGB9301|g__GGB53985|s__GGB53985_SGB6367
    k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Peptostreptococcaceae|g__Romboutsia|s__Romboutsia_timonensis
    k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Peptostreptococcaceae|g__Romboutsia|s__Romboutsia_hominis
    k__Bacteria|p__Proteobacteria|c__CFGB3069|o__OFGB3069|f__FGB3069|g__GGB9770|s__GGB9770_SGB57575
    k__Bacteria|p__Actinobacteria|c__Actinomycetia|o__Micrococcales|f__Micrococcaceae|g__Arthrobacter|s__Arthrobacter_sp_HMSC06H05
    k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Eubacteriaceae|g__Eubacterium|s__Eubacterium_ramulus
    k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Eubacteriaceae|g__Eubacterium|s__Eubacterium_sp_AF22_8LB
    k__Bacteria|p__Firmicutes|c__Erysipelotrichia|o__Erysipelotrichales|f__Erysipelotrichaceae|g__Catenibacterium|s__Candidatus_Catenibacterium_tridentinum
    k__Bacteria|p__Firmicutes|c__CFGB1354|o__OFGB1354|f__FGB1354|g__GGB3304|s__GGB3304_SGB4367
    k__Bacteria|p__Firmicutes|c__CFGB75916|o__OFGB75916|f__FGB75916|g__GGB2993|s__GGB2993_SGB3978
    k__Bacteria|p__Proteobacteria|c__Alphaproteobacteria|o__Hyphomicrobiales|f__Bradyrhizobiaceae|g__Bradyrhizobium|s__Bradyrhizobium_viridifuturi

  2. Is it possible that there are some errors in the clade annotations? For example, under the genus “g__GGB79996” there are the following entries:
    k__Bacteria|p__Actinobacteria|c__CFGB10299|o__OFGB10299|f__FGB10299|g__GGB79996|s__GGB79996_SGB14375
    k__Bacteria|p__Actinobacteria|c__CFGB10299|o__OFGB10299|f__FGB10299|g__GGB79996|s__GGB79996_SGB14375|t__SGB14375
    k__Bacteria|p__Firmicutes|c__CFGB10299|o__OFGB10299|f__FGB10299|g__GGB79996

A similar issue occurs with the genus “g__GGB1249”, which appears in:
k__Bacteria|p__Bacteroidetes|c__CFGB76191|o__OFGB76191|f__FGB76191|g__GGB1249|s__GGB1249_SGB1670
k__Bacteria|p__Bacteroidetes|c__CFGB76191|o__OFGB76191|f__FGB76191|g__GGB1249|s__GGB1249_SGB1670|t__SGB1670
k__Bacteria|p__Firmicutes|c__CFGB76191|o__OFGB76191|f__FGB76191|g__GGB1249

In both cases, the entries share the same genus (and the same class, order, and family) yet are annotated as belonging to different phyla. Could this be an error in the database annotation?
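To list every genus label that shows up under more than one phylum, a small check over the clade names is enough. This is a minimal sketch: `genera_with_conflicting_phyla` is a hypothetical helper (not part of MetaPhlAn), and here it is run only on the clade names quoted above.

```python
from collections import defaultdict


def genera_with_conflicting_phyla(clade_names):
    """Return {genus label: set of phyla} for genus labels seen under more than one phylum."""
    phyla_by_genus = defaultdict(set)
    for clade in clade_names:
        levels = clade.split("|")
        phylum = next((lvl for lvl in levels if lvl.startswith("p__")), None)
        genus = next((lvl for lvl in levels if lvl.startswith("g__")), None)
        if phylum and genus:
            phyla_by_genus[genus].add(phylum)
    # Keep only genus labels attached to two or more different phyla.
    return {g: p for g, p in phyla_by_genus.items() if len(p) > 1}


# Clade names copied from the examples above.
clades = [
    "k__Bacteria|p__Actinobacteria|c__CFGB10299|o__OFGB10299|f__FGB10299|g__GGB79996|s__GGB79996_SGB14375",
    "k__Bacteria|p__Firmicutes|c__CFGB10299|o__OFGB10299|f__FGB10299|g__GGB79996",
    "k__Bacteria|p__Bacteroidetes|c__CFGB76191|o__OFGB76191|f__FGB76191|g__GGB1249|s__GGB1249_SGB1670",
    "k__Bacteria|p__Firmicutes|c__CFGB76191|o__OFGB76191|f__FGB76191|g__GGB1249",
]

for genus, phyla in sorted(genera_with_conflicting_phyla(clades).items()):
    print(genus, sorted(phyla))
# g__GGB1249 ['p__Bacteroidetes', 'p__Firmicutes']
# g__GGB79996 ['p__Actinobacteria', 'p__Firmicutes']
```

Running the same function over all clade names in a full profile would flag every genus affected by this kind of inconsistency, not just the two spotted by eye.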

Thanks in advance,
Nadav

Hi @Nadav_Moriel

Thanks for reporting the bug. We are aware of it: it is indeed an inconsistency in the taxonomic assignments of uFGBs, i.e. different SGBs can share the same GGB ID but have different phyla assigned. This problem is present up to the Oct22 release and will be fixed in the upcoming releases.

Best

Michal

I am running the latest versions of MetaPhlAn and the associated databases, and I am finding that in many cases the relative abundances still do not sum to 100%.

I am using MetaPhlAn 4.1 with the marker database mpa_vJun23_CHOCOPhlAnSGB_202307.

The abundances sum to 100% at the phylum level, but only to 48% at the class and order levels and 88% at the family level. Most of the phylum Firmicutes seems to disappear: it drops from 61% at the phylum level to 11% at the class level.
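To get per-rank totals like these for a whole profile, one option is to sum the relative abundances separately for each rank prefix. This is a minimal sketch, assuming the default MetaPhlAn 4 `rel_ab` output (`#` header lines followed by tab-separated `clade_name`, `NCBI_tax_id`, `relative_abundance`, `additional_species`); `rank_totals` is a hypothetical helper, not part of MetaPhlAn. In a consistent profile one would expect every rank to give roughly the same total.

```python
import sys
from collections import defaultdict

RANK_ORDER = "kpcofgst"  # kingdom through species, then t__ (SGB)


def rank_totals(lines):
    """Sum relative abundances per rank prefix letter (k, p, c, o, f, g, s, t).

    Assumes the default MetaPhlAn 4 'rel_ab' layout: '#' header lines, then
    tab-separated clade_name, NCBI_tax_id, relative_abundance, additional_species.
    """
    totals = defaultdict(float)
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            continue
        last_level = fields[0].split("|")[-1]
        # Rows such as UNCLASSIFIED carry no 'x__' prefix and are skipped.
        if "__" not in last_level:
            continue
        rank = last_level.split("__", 1)[0]
        totals[rank] += float(fields[2])
    return dict(totals)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (file name is hypothetical): python rank_totals.py sample_profile.txt
    with open(sys.argv[1]) as handle:
        per_rank = rank_totals(handle)
    for rank in RANK_ORDER:
        if rank in per_rank:
            print(f"{rank}__\t{per_rank[rank]:.2f}")
```

If a run reports other output columns (for example with a different `-t` option), the column index used for the abundance would need adjusting.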

The large class p__Firmicutes;c__Clostridia is missing entirely (although there is a p__Bacillota;c__Clostridia entry at 0.07%). By contrast, the family level has entries such as k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Lachnospiraceae at 20%.
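A related check is to look for intermediate clades that are implied by a deeper entry but have no row of their own, which is what seems to happen with c__Clostridia here. The sketch below is illustrative only: `missing_ancestors` is a hypothetical helper, and `profile_clades` is a hand-built excerpt using clade names from this post, not actual MetaPhlAn output.

```python
def missing_ancestors(clade_names):
    """Return ancestor paths implied by deeper clades but absent as rows."""
    present = set(clade_names)
    missing = set()
    for clade in clade_names:
        levels = clade.split("|")
        # Check every proper prefix of the path: k__, then k__|p__, then k__|p__|c__, etc.
        for depth in range(1, len(levels)):
            ancestor = "|".join(levels[:depth])
            if ancestor not in present:
                missing.add(ancestor)
    return sorted(missing)


# Hand-built excerpt using clade names from this post (not real MetaPhlAn output).
profile_clades = [
    "k__Bacteria",
    "k__Bacteria|p__Firmicutes",
    "k__Bacteria|p__Bacillota",
    "k__Bacteria|p__Bacillota|c__Clostridia",
    "k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Lachnospiraceae",
]

for path in missing_ancestors(profile_clades):
    print(path)
# k__Bacteria|p__Firmicutes|c__Clostridia
# k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales
```

Applied to the clade names of a real profile, any path it prints is a parent whose abundance is not being reported, which would explain per-rank totals dropping below 100%.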

PHYLA:
k__Bacteria|p__Firmicutes|2|1239|61.08822|
k__Bacteria|p__Bacteroidota|2|976|24.78932|
k__Bacteria|p__Proteobacteria|2|1224|8.54597|
k__Bacteria|p__Actinobacteria|2|201174|5.4726|
k__Bacteria|p__Bacillota|2|1239|0.07432|
k__Bacteria|p__Bacteria_unclassified|2||0.02958|

CLASS:
k__Bacteria|p__Bacteroidota|c__Bacteroidia|2|976|200643|24.78932|
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|2|1224|1236|8.05845|
k__Bacteria|p__Actinobacteria|c__Actinomycetia|2|201174|1760|4.69221|
k__Bacteria|p__Firmicutes|c__Firmicutes_unclassified|2|1239||2.58896|
k__Bacteria|p__Firmicutes|c__Bacilli|2|1239|91061|1.90796|
k__Bacteria|p__Firmicutes|c__Negativicutes|2|1239|909932|1.65363|
k__Bacteria|p__Firmicutes|c__CFGB3054|2|1239||0.93087|
k__Bacteria|p__Actinobacteria|c__Coriobacteriia|2|201174|84998|0.78039|
k__Bacteria|p__Firmicutes|c__CFGB3057|2|1239||0.59732|
k__Bacteria|p__Firmicutes|c__CFGB1798|2|1239||0.56436|
k__Bacteria|p__Proteobacteria|c__Betaproteobacteria|2|1224|28216|0.48751|
k__Bacteria|p__Firmicutes|c__CFGB38642|2|1239||0.46762|
k__Bacteria|p__Firmicutes|c__CFGB3070|2|1239||0.3679|
k__Bacteria|p__Firmicutes|c__CFGB75721|2|1239||0.21093|
k__Bacteria|p__Firmicutes|c__CFGB3048|2|1239||0.18617|
k__Bacteria|p__Firmicutes|c__CFGB1227|2|1239||0.11286|
k__Bacteria|p__Bacillota|c__Clostridia|2|1239|186801|0.07432|
k__Bacteria|p__Firmicutes|c__CFGB1292|2|1239||0.06017|
k__Bacteria|p__Firmicutes|c__CFGB3038|2|1239||0.03362|
k__Bacteria|p__Bacteria_unclassified|c__Bacteria_unclassified|2|||0.02958|
k__Bacteria|p__Firmicutes|c__CFGB79294|2|1239||0.01827|
k__Bacteria|p__Firmicutes|c__CFGB1011|2|1239||0.01466|
k__Bacteria|p__Firmicutes|c__CFGB77202|2|1239||0.01421|
k__Bacteria|p__Firmicutes|c__CFGB79245|2|1239||0.00377|
k__Bacteria|p__Firmicutes|c__CFGB1249|2|1239||0.00201|
k__Bacteria|p__Firmicutes|c__CFGB3009|2|1239||0.0006|
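The rows above look as if the tab separators of the MetaPhlAn output were turned into `|` when pasting (this is an assumption), so the relative abundance is the last field. Under that assumption, the sketch below compares each phylum with the sum of its class-level rows, to show which phylum the missing class-level abundance belongs to. The helper names are hypothetical and only a few rows are copied in; run over the full PHYLA and CLASS lists above, essentially all of the gap falls under p__Firmicutes.

```python
import re
from collections import defaultdict

# Matches one taxonomic level such as 'k__Bacteria' or 'c__CFGB3054'.
LEVEL = re.compile(r"^[kpcofgst]__")


def parse_pasted_row(line):
    """Split 'k__Bacteria|p__Firmicutes|2|1239|61.08822|' into
    (taxonomic levels, relative abundance).

    Assumes tabs were turned into '|' when pasting, so the abundance is the
    last field once the trailing '|' is removed.
    """
    fields = line.strip().rstrip("|").split("|")
    levels = [f for f in fields if LEVEL.match(f)]
    return levels, float(fields[-1])


def class_gap_by_phylum(phylum_rows, class_rows):
    """Phylum abundance minus the summed abundance of its class-level rows."""
    class_sums = defaultdict(float)
    for row in class_rows:
        levels, value = parse_pasted_row(row)
        class_sums["|".join(levels[:2])] += value
    gaps = {}
    for row in phylum_rows:
        levels, value = parse_pasted_row(row)
        phylum = "|".join(levels[:2])
        gaps[phylum] = value - class_sums.get(phylum, 0.0)
    return gaps


# A few rows copied from the lists above (the full lists are not repeated here).
phylum_rows = [
    "k__Bacteria|p__Firmicutes|2|1239|61.08822|",
    "k__Bacteria|p__Bacteroidota|2|976|24.78932|",
    "k__Bacteria|p__Bacteria_unclassified|2||0.02958|",
]
class_rows = [
    "k__Bacteria|p__Bacteroidota|c__Bacteroidia|2|976|200643|24.78932|",
    "k__Bacteria|p__Firmicutes|c__Firmicutes_unclassified|2|1239||2.58896|",
    "k__Bacteria|p__Firmicutes|c__Bacilli|2|1239|91061|1.90796|",
    "k__Bacteria|p__Bacteria_unclassified|c__Bacteria_unclassified|2|||0.02958|",
]

for phylum, gap in class_gap_by_phylum(phylum_rows, class_rows).items():
    print(f"{phylum}\t{gap:.2f}")
```

With only this subset of class rows, the Firmicutes gap printed here is larger than it would be with the complete CLASS list.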

best,
Kristian


My results are consistent with Kristian’s. k__Bacteria|p__Firmicutes|c__Clostridia, k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales, and k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Oscillospiraceae are all completely absent, even though finer-grained taxa belonging to those groups are present.