Announcement: We are pleased to share that MetaPhlAn 4 is now published open-access in Nature Biotechnology (Extending and improving metagenomic taxonomic profiling with uncharacterized species using MetaPhlAn 4 | Nature Biotechnology). We are also releasing an updated database version (vOct22) with 3,580 more SGBs compared to the current and published version (vJan21).
MetaPhlAn 4 in a nutshell
The new MetaPhlAn 4 integrates information from both metagenome assemblies and microbial isolate genomes for improved and more comprehensive metagenomic taxonomic profiling (Extending and improving metagenomic taxonomic profiling with uncharacterized species using MetaPhlAn 4 | Nature Biotechnology). From a curated collection of 1.01M prokaryotic reference and metagenome-assembled genomes, we define ~5M unique marker genes for 26,970 species-level genome bins (SGBs), 4,992 of them taxonomically unidentified at the species level (uSGBs). MetaPhlAn 4, when compared to version 3, explains ~20% more reads in most international human gut microbiomes and >40% in less-characterized environments such as the rumen microbiome. MetaPhlAn 4 also proves to be more accurate than available alternatives on synthetic evaluations while also reliably quantifying organisms with no cultured isolates, a major improvement for the field. Application of the method to >24,500 metagenomes highlights previously undetected species that are strong biomarkers for host phenotypes in human and mouse microbiomes and, thanks to StrainPhlAn 4, shows that even previously uncharacterized species can be genetically profiled at the resolution of single microbial strains. MetaPhlAn 4 integrates the novelty of metagenomic assemblies with the sensitivity and fidelity of reference-based analyses, providing efficient metagenomic profiling of uncharacterized species and enabling deeper and more comprehensive microbiome analyses.
For more detailed information about MetaPhlAn 4, please visit Announcing MetaPhlAn 4.
An updated markers database is now available!
We have also updated the published MetaPhlAn 4 markers database (vJan21) with more than 200k new genomes! The new vOct22 database can profile 3,580 more SGBs than vJan21 while maintaining similar performance on synthetic evaluations.
What has changed in vOct22 in comparison to vJan21:
Genomic database:
- Inclusion of new MAGs
- ~10k MAGs from food-related sources
- ~35k MAGs from the oral cavity
- ~50k MAGs from the mouse gut
- ~30k environmental MAGs
- ~1k archaeal MAGs
- ~3k MAGs from long-read sequencing
- Additional public MAGs from diverse sources (ocean, animals, skin, etc.)
- 2,548 genomes considered reference genomes in vJan21 were relabelled as MAGs in NCBI
- 1,550 kSGBs in vJan21 are now uSGBs in vOct22 (515 of them have fewer than 5 MAGs in vOct22 and were thus discarded)
- Removed redundant reference genomes from the vJan21 genomic database using a MASH distance threshold at 0.1%
- Local reclustering to improve SGB definitions of oversized or too-close SGBs
- Improved GGB and FGB definitions by reclustering SGB centroids from scratch
- Improved phylum assignment of SGBs with no reference genomes at FGB level using MASH distances on amino acids to find the closest kSGB
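To illustrate what removing redundant reference genomes at a distance threshold can look like, here is a minimal sketch. It is not the procedure used to build vOct22: the post only states the threshold, so the greedy strategy, the function name `remove_redundant`, the `distance` callback and the reading of 0.1% as a MASH distance of 0.001 are all assumptions made for illustration.

```python
def remove_redundant(genomes, distance, threshold=0.001):
    """Greedy redundancy filter (illustrative, not the authors' code).

    A genome is kept only if its distance to every genome already kept
    is greater than `threshold` (0.001 is an assumed reading of "0.1%").
    `distance` is any callable returning a pairwise distance, for example
    a lookup into precomputed MASH distances.
    """
    kept = []
    for genome in genomes:
        if all(distance(genome, other) > threshold for other in kept):
            kept.append(genome)
    return kept


# Toy, made-up distances between three hypothetical genomes.
toy = {
    frozenset(("A", "B")): 0.0005,  # A and B are near-identical
    frozenset(("A", "C")): 0.02,
    frozenset(("B", "C")): 0.02,
}
non_redundant = remove_redundant(
    ["A", "B", "C"], lambda a, b: toy[frozenset((a, b))]
)
```

In this toy case B is dropped because it lies within the threshold of A, which was seen first. A greedy filter depends on input order, so a real pipeline would likely sort genomes by quality beforehand.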
Markers database
- The new vOct22 now includes 30,550 SGBs
- 3,580 more SGBs than in vJan21
- The new vOct22 has drastically reduced the number of SGB groups, i.e. groups of very close SGBs profiled together as a unit
- Only 90 SGB groups in comparison to the 237 in vJan21
- We introduced a new procedure to trim markers and only keep fragments that are unique and long enough
- To check for uniqueness, the core genes are first split into “windows” of length 150 nt and then mapped. Using this mapping, we now discard windows that are not unique and reduce/split markers into sufficiently long (>450 nt) chains of unique windows
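The trimming idea described above can be sketched as follows. This is only an illustration under stated assumptions, not the code used to build the database: the function name `trim_marker`, the per-window boolean input, and the handling of a shorter trailing window are hypothetical. Only the 150 nt window size and the >450 nt cutoff come from the text.

```python
from itertools import groupby


def trim_marker(marker_len, window_unique, window=150, min_len=450):
    """Return (start, end) intervals of unique-window chains longer than min_len.

    `window_unique[i]` says whether window i (positions i*window up to
    (i+1)*window, clipped to the marker length) was found to be unique.
    Intervals are half-open, in marker coordinates.
    """
    fragments = []
    index = 0
    for unique, run in groupby(window_unique):
        n_windows = len(list(run))
        start = index * window
        end = min((index + n_windows) * window, marker_len)
        # Keep only chains of unique windows strictly longer than min_len,
        # following the ">450 nt" wording in the text.
        if unique and end - start > min_len:
            fragments.append((start, end))
        index += n_windows
    return fragments


# A hypothetical 1,500 nt marker split into ten windows.
flags = [True, True, True, True, False, True, True, True, False, True]
kept = trim_marker(1500, flags)
```

Here the first four windows form a 600 nt chain that is retained, while the 450 nt and 150 nt chains are dropped because they are not longer than 450 nt.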
What is missing:
- The GTDB taxonomic assignment for the vOct22 database is not available yet (expected release: end of Feb 2023)
- The phylogenetic tree of life for the vOct22 database is not available yet (expected release: TBD).
How does the new vOct22 profiling compare to vJan21?
Evaluation of the MetaPhlAn 4 vOct22 database using the CAMI II synthetic metagenomes. We evaluated the latest vOct22 database using the 128 synthetic metagenomes representing host-associated communities from the CAMI II taxonomic profiling challenge. We compared the performance of the vOct22 with the previous vJan21 database as well as the available alternatives assessed in the recently published MetaPhlAn 4 work (Extending and improving metagenomic taxonomic profiling with uncharacterized species using MetaPhlAn 4 | Nature Biotechnology). GIT = Gastrointestinal tract, UT = Urogenital tract.
Evaluation of the MetaPhlAn 4 vOct22 in comparison with the previous vJan21 database using real metagenomes from multiple sources. We evaluated the SGB richness reported by MetaPhlAn 4 using the original vJan21 and the latest vOct22 databases on real metagenomic samples from diverse environments. The number of kSGBs per sample tended to decrease due to 1,550 kSGBs in vJan21 that were relabelled as uSGBs in vOct22. NHP = Non-human primates, W = Westernized, NW = non-Westernized, F. food = Fermented food.
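For readers who want to compute SGB richness from their own profiles, here is a minimal sketch. It assumes a tab-separated MetaPhlAn profile in which header lines start with `#`, SGB-level rows have a clade name whose last level starts with `t__`, and relative abundance sits in the third column. Check these assumptions against your own output files, since the column layout depends on the options used. The function name `sgb_richness` and the toy rows are made up for illustration.

```python
def sgb_richness(profile_lines, abundance_column=2):
    """Count distinct SGB-level clades with non-zero relative abundance.

    Assumes tab-separated rows, '#' header lines, SGB rows ending in a
    't__' level, and abundance in `abundance_column` (0-based).
    """
    sgbs = set()
    for line in profile_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        last_level = fields[0].split("|")[-1]
        if not last_level.startswith("t__"):
            continue  # skip kingdom-to-species rows
        if float(fields[abundance_column]) > 0:
            sgbs.add(last_level)
    return len(sgbs)


# Toy profile with invented clade names, taxids and abundances.
toy_profile = [
    "#clade_name\tclade_taxid\trelative_abundance\tadditional_species",
    "k__Bacteria\t2\t100.0\t",
    "k__Bacteria|p__X|s__A|t__SGB1\t2|0|1|\t60.0\t",
    "k__Bacteria|p__X|s__B|t__SGB2\t2|0|2|\t40.0\t",
    "k__Bacteria|p__X|s__C|t__SGB3\t2|0|3|\t0.0\t",
]
richness = sgb_richness(toy_profile)
```

Running the same function on profiles generated with vJan21 and vOct22 for the same sample gives a per-sample richness comparison of the kind summarized above.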
How to perform a fresh install of MetaPhlAn 4
- See here for instructions on installing MetaPhlAn 4 in a new environment:
- How to upgrade the database from the previous vJan21 version:
$ metaphlan --install --force_download