Hi everyone,
In the MetaPhlAn 4 paper (based on the vJun23 database version), the authors mention that “the current methods do not extensively incorporate viral or eukaryotic microbial sequences, due to their unique genomic architectures and quality control requirements relative to bacterial and archaeal genomes.” From this, I assume that version contained no marker genes for fungi.
I’ve been analyzing the composition of the new vJan25 database and noticed that it now includes marker genes for around 303 fungal species (from http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/mpa_vJan25_CHOCOPhlAnSGB_202503_species.txt.bz2).
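For anyone who wants to check a count like this, something along these lines should work. It is only a rough sketch: it assumes the species file has been downloaded locally, that each line holds an SGB identifier followed by one or more `|`-separated lineages (split by tabs or commas), and that fungi can be recognised by phylum name. The phylum list and the function names are illustrative placeholders of mine, not part of MetaPhlAn, and the list is probably not exhaustive.

```python
import bz2
import re

# Assumption: an illustrative (not exhaustive) set of fungal phylum names.
FUNGAL_PHYLA = ("Ascomycota", "Basidiomycota", "Mucoromycota", "Chytridiomycota")


def fungal_species(lines, phyla=FUNGAL_PHYLA):
    """Collect distinct species names whose lineage has a fungal phylum."""
    species = set()
    for line in lines:
        # Assumption: fields are separated by tabs, alternative lineages by commas.
        for lineage in re.split(r"[\t,]", line.strip()):
            ranks = lineage.split("|")
            # Keep the lineage only if its p__ rank is one of the listed phyla.
            if any(r.startswith("p__") and r[3:] in phyla for r in ranks):
                species.update(r[3:] for r in ranks if r.startswith("s__"))
    return species


def fungal_species_in_file(path):
    """Read a bz2-compressed species file and return its fungal species names."""
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        return fungal_species(handle)
```

Calling `len(fungal_species_in_file("mpa_vJan25_CHOCOPhlAnSGB_202503_species.txt.bz2"))` on the downloaded file would then give the number of distinct fungal species under these assumptions. The result depends on which phyla are listed and on the actual file layout.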
I went back to review the database construction procedure described in the MetaPhlAn 4 paper, and from what I understand, the pipeline relies on CheckM and Prokka, which are designed for prokaryotic genomes. This would explain why previous versions contained no fungi.
Given this, I’m wondering:
- How were the fungal genomes in the vJan25 database incorporated? Were they processed with the same pipeline (CheckM + Prokka + marker selection) or through a different approach?
- More generally, do you plan to expand fungal coverage in future releases, or should users consider adding their own marker genes for missing species? I was thinking about doing it with UFCG (Universal Fungal Core Genes, steineggerlab/ufcg on GitHub). Does anyone have experience using it?
I’d really appreciate any clarification on this, as I am very interested in analyzing fungi in my samples.
Thanks a lot for your time and for all the great work on MetaPhlAn!
Best,
Alberto