I have finished running MetaPhlAn4 on my data and started analyzing the results.
I noticed two main issues which I believe require your attention:
For some of the samples, the relative abundances sum to 100% at the species level but to less than 100% at the genus level. From what I understand, at least in my samples, the problem stems from the following clades:
k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Eubacteriaceae|g__Eubacterium|s__Eubacterium_ventriosum
k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Eubacteriaceae|g__Eubacterium|s__Eubacterium_sp_AF34_35BH
k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Eubacteriaceae|g__Eubacterium|s__Eubacterium_sp_AF22_9
k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Eubacteriaceae|g__Eubacterium|s__Eubacterium_SGB6276
k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Eubacteriaceae|g__Eubacterium|s__Eubacterium_SGB4329
k__Bacteria|p__Firmicutes|c__CFGB9301|o__OFGB9301|f__FGB9301|g__GGB53985|s__GGB53985_SGB6367
k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Peptostreptococcaceae|g__Romboutsia|s__Romboutsia_timonensis
k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Peptostreptococcaceae|g__Romboutsia|s__Romboutsia_hominis
k__Bacteria|p__Proteobacteria|c__CFGB3069|o__OFGB3069|f__FGB3069|g__GGB9770|s__GGB9770_SGB57575
k__Bacteria|p__Actinobacteria|c__Actinomycetia|o__Micrococcales|f__Micrococcaceae|g__Arthrobacter|s__Arthrobacter_sp_HMSC06H05
k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Eubacteriaceae|g__Eubacterium|s__Eubacterium_ramulus
k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Eubacteriaceae|g__Eubacterium|s__Eubacterium_sp_AF22_8LB
k__Bacteria|p__Firmicutes|c__Erysipelotrichia|o__Erysipelotrichales|f__Erysipelotrichaceae|g__Catenibacterium|s__Candidatus_Catenibacterium_tridentinum
k__Bacteria|p__Firmicutes|c__CFGB1354|o__OFGB1354|f__FGB1354|g__GGB3304|s__GGB3304_SGB4367
k__Bacteria|p__Firmicutes|c__CFGB75916|o__OFGB75916|f__FGB75916|g__GGB2993|s__GGB2993_SGB3978
k__Bacteria|p__Proteobacteria|c__Alphaproteobacteria|o__Hyphomicrobiales|f__Bradyrhizobiaceae|g__Bradyrhizobium|s__Bradyrhizobium_viridifuturi
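For anyone who wants to check this on their own profiles, here is a minimal Python sketch of how the per-rank totals can be computed. It assumes a tab-separated MetaPhlAn profile with the clade name in the first column and the relative abundance in the third column (change abundance_col if your output is laid out differently). The function names rank_of and sum_by_rank are my own and are not part of MetaPhlAn.

```python
from collections import defaultdict


def rank_of(clade_name):
    """Return the rank prefix (k, p, c, o, f, g, s or t) of the deepest level,
    or None if the last level has no 'x__' prefix."""
    last = clade_name.split("|")[-1]
    if "__" not in last:
        return None
    return last.split("__", 1)[0]


def sum_by_rank(profile_lines, abundance_col=2):
    """Sum relative abundances per taxonomic rank.

    profile_lines: iterable of text lines (an open file works too).
    abundance_col: 0-based index of the relative abundance column
                   (assumption: third column).
    """
    totals = defaultdict(float)
    for line in profile_lines:
        line = line.rstrip("\n")
        # Skip blank lines and '#' header/comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        rank = rank_of(fields[0])
        if rank is None:
            # Rows without a rank prefix are ignored in this sketch.
            continue
        totals[rank] += float(fields[abundance_col])
    return dict(totals)
```

Passing the lines of one sample's profile to sum_by_rank returns a dict keyed by rank prefix, so a rank whose total is clearly below the species total stands out immediately.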
Is it possible that there are some errors in the clade annotation? For example, under the genus “g__GGB79996” there are the following entries:
k__Bacteria|p__Actinobacteria|c__CFGB10299|o__OFGB10299|f__FGB10299|g__GGB79996|s__GGB79996_SGB14375
k__Bacteria|p__Actinobacteria|c__CFGB10299|o__OFGB10299|f__FGB10299|g__GGB79996|s__GGB79996_SGB14375|t__SGB14375
k__Bacteria|p__Firmicutes|c__CFGB10299|o__OFGB10299|f__FGB10299|g__GGB79996
A similar issue occurs with the genus “g__GGB1249”, which appears in:
k__Bacteria|p__Bacteroidetes|c__CFGB76191|o__OFGB76191|f__FGB76191|g__GGB1249|s__GGB1249_SGB1670
k__Bacteria|p__Bacteroidetes|c__CFGB76191|o__OFGB76191|f__FGB76191|g__GGB1249|s__GGB1249_SGB1670|t__SGB1670
k__Bacteria|p__Firmicutes|c__CFGB76191|o__OFGB76191|f__FGB76191|g__GGB1249
It appears that these entries all share a common genus even though they are annotated as belonging to different phyla (the class, order, and family otherwise all match). Could this be an error in the database annotation?
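A rough way to look for more cases like these is to check whether any taxon label shows up under more than one parent lineage. The Python sketch below does that on a list of clade strings; conflicting_lineages is a name I made up for illustration, and the code only compares the strings as they appear in the profile.

```python
from collections import defaultdict


def conflicting_lineages(clade_names):
    """Return {label: sorted list of parent lineages} for every taxon label
    that appears under more than one distinct parent lineage."""
    parents = defaultdict(set)
    for name in clade_names:
        levels = name.split("|")
        for i, label in enumerate(levels):
            # The parent lineage is everything to the left of this level.
            parents[label].add("|".join(levels[:i]))
    return {label: sorted(p) for label, p in parents.items() if len(p) > 1}


clades = [
    "k__Bacteria|p__Actinobacteria|c__CFGB10299|o__OFGB10299|f__FGB10299|g__GGB79996|s__GGB79996_SGB14375",
    "k__Bacteria|p__Firmicutes|c__CFGB10299|o__OFGB10299|f__FGB10299|g__GGB79996",
]
for label, lineages in conflicting_lineages(clades).items():
    print(label, "appears under", len(lineages), "different lineages")
```

On the two g__GGB79996 entries above, the class, order, family, and genus labels are all flagged, because each of them sits under both p__Actinobacteria and p__Firmicutes.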
Thanks for reporting the bug. We are aware of it: it is indeed an inconsistency in the taxonomic assignments with regard to uFGBs, i.e., different SGBs can have the same GGB ID but be assigned to different phyla. This problem is present up to the Oct22 release and will be fixed in the subsequent releases.
I am running the latest versions of MetaPhlAn and the associated databases, and I am finding that in many cases the relative abundances still do not sum to 100%.
I am using MetaPhlAn 4.1 with the marker database mpa_vJun23_CHOCOPhlAnSGB_202307.
The abundances sum to 100% at the phylum level, but only to 48% at the class and order levels and to 88% at the family level. Most of the phylum Firmicutes seems to disappear: it drops from 61% at the phylum level to 11% at the class level.
The large class p__Firmicutes;c__Clostridia is missing entirely (although there is a p__Bacillota;c__Clostridia entry with 0.07%). Compare this with the family level, where there are entries such as k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Lachnospiraceae with 20%.
My results are consistent with Kristian’s. k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Oscillospiraceae, k__Bacteria|p__Firmicutes|c__Clostridia, and k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales are all completely absent, even though finer-grained taxa that belong to those groups are present.
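To list which higher-level rows are absent even though their descendants are reported, something like the following Python sketch can be run on the clade names of a profile. The function name missing_ancestors is hypothetical, and the check is purely string-based: it only asks whether each "|"-separated prefix of a reported clade is itself a row in the profile.

```python
def missing_ancestors(clade_names):
    """Return the sorted list of ancestor lineages that are implied by the
    reported clades but do not appear as rows themselves."""
    present = set(clade_names)
    missing = set()
    for name in present:
        levels = name.split("|")
        # Check every proper prefix of the lineage (kingdom, phylum, ...).
        for i in range(1, len(levels)):
            ancestor = "|".join(levels[:i])
            if ancestor not in present:
                missing.add(ancestor)
    return sorted(missing)


reported = [
    "k__Bacteria",
    "k__Bacteria|p__Firmicutes",
    "k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Lachnospiraceae",
]
for lineage in missing_ancestors(reported):
    print("missing:", lineage)
```

With this toy input, the class and order rows for Clostridia and Eubacteriales are reported as missing, which is the same pattern described in the posts above.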