MetaPhlAn 4.2.2 release (initial long-read sequencing support and database update)

Announcement

We are pleased to announce that MetaPhlAn 4.2.2 is now available. This release adds, for the first time, taxonomic profiling of long-read metagenomes, and it ships a new version of the MetaPhlAn database (vJan25_202503) containing >21k new SGBs.

For details on MetaPhlAn 4, see the "Announcing MetaPhlAn 4" post or visit the MetaPhlAn 4 GitHub repository. For the full list of dropped features, modifications, and new features in this release, check the MetaPhlAn 4.2.2 release notes.

What is new in MetaPhlAn 4.2.2?

  • Initial support for profiling long-read metagenomic datasets (with --long_reads). This relies on the long-read mapper minimap2, newly integrated into MetaPhlAn. Alternatively, long reads can be split (with --split_reads) into shorter sequences (whose length is set by --split_readlen) and mapped with bowtie2. See the full release notes for all long-read options.
  • Now that minimap2 is available as a mapper alongside bowtie2, parameters that referred specifically to bowtie2 have been given more generic names (--bowtie2db is now --db_dir, and --bowtie2out is now --mapout).
  • --unclassified_estimation is now the default behavior: MetaPhlAn estimates the fraction of unclassified reads and rescales the relative abundances of the detected taxa accordingly. To restore the previous behavior, use --skip_unclassified_estimation, which leaves the unclassified estimate out of the relative abundance profile.
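The supported way to get the old, classified-only abundances is to rerun with --skip_unclassified_estimation. If only an existing profile is at hand, the bash sketch below approximates the same result by dropping the unclassified row and scaling the remaining taxa back up. It rests on assumptions not verified against the MetaPhlAn code: the profile is tab-separated with '#' header lines, relative abundance is in the third column, the unclassified estimate sits on a row whose first field is UNCLASSIFIED, and all other rows were scaled by a single factor (100 - U)/100. The function name renormalize_profile is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: drop the UNCLASSIFIED row from a MetaPhlAn profile and
# rescale the remaining abundances so each rank sums to 100 again.
# ASSUMPTIONS (not taken from MetaPhlAn docs): tab-separated rows, '#' header
# lines, abundance in column 3, unclassified estimate on a row whose first
# field is "UNCLASSIFIED", and a single uniform scaling of all other rows.
renormalize_profile() {
  local profile="$1"
  # The file is read twice: pass 1 records the unclassified percentage,
  # pass 2 rewrites column 3 of every classified row.
  awk -F'\t' -v OFS='\t' '
    NR == FNR { if ($1 == "UNCLASSIFIED") unc = $3 + 0; next }
    FNR == 1  { scale = (unc < 100) ? 100 / (100 - unc) : 1 }
    /^#/      { print; next }
    $1 == "UNCLASSIFIED" { next }
    { $3 = sprintf("%.5f", $3 * scale); print }
  ' "$profile" "$profile"
}
```

Usage: `renormalize_profile profile.txt > profile.classified_only.txt`. If the profile has no UNCLASSIFIED row, the scale factor stays 1 and the abundances pass through unchanged (reformatted to five decimals).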

What has changed in vJan25_202503 in comparison to vJun23_202403?

Expansion and improvement of the genomic database:

  • ~63k new reference genomes from NCBI
  • ~416k new MAGs from marine, food, and various animal-associated environments, among others
  • Improved clustering of SGBs using skANI

Expansion and improvement of the markers database:

  • vJan25 includes 58,331 SGBs, 21,509 more SGBs than in vJun23
  • The mapping between the taxonomy of SGBs and the GTDB taxonomy has been updated from GTDB r207 to GTDB r220
  • Taxonomy assignment for Viral Sequence Clusters (VSCs), performed with geNomad

How to make use of the MetaPhlAn 4.2.2 updates?

  • How to install MetaPhlAn 4.2.2 in a new environment:
  • How to update the MetaPhlAn database from the vJun23_202403 (or earlier) version:
    • $ metaphlan --install --force_download

Dear Biobakery Team,

Thanks for this release.

We ran into a small issue, and here are a few details that might help newcomers. The official MetaPhlAn GitHub README appears to be outdated: the flag for installing the database into a custom folder is no longer --bowtie2db, as stated there, but --db_dir.
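For existing pipelines that still pass the old bowtie2-specific names, a small bash wrapper can translate them before calling metaphlan. This is a sketch, not part of MetaPhlAn: the function name migrate_metaphlan_args is hypothetical, and it only handles the two renames described in the announcement (--bowtie2db to --db_dir, --bowtie2out to --mapout), in both the "--flag value" and "--flag=value" spellings. It needs bash 4+ for mapfile.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite the old bowtie2-specific MetaPhlAn flags to the
# generic names (--bowtie2db -> --db_dir, --bowtie2out -> --mapout).
# Every other argument is passed through unchanged.
migrate_metaphlan_args() {
  local arg
  local -a out=()
  for arg in "$@"; do
    case "$arg" in
      --bowtie2db)    out+=(--db_dir) ;;
      --bowtie2db=*)  out+=("--db_dir=${arg#--bowtie2db=}") ;;
      --bowtie2out)   out+=(--mapout) ;;
      --bowtie2out=*) out+=("--mapout=${arg#--bowtie2out=}") ;;
      *)              out+=("$arg") ;;
    esac
  done
  # One argument per line, so callers can read the result back with mapfile.
  if (( ${#out[@]} > 0 )); then
    printf '%s\n' "${out[@]}"
  fi
}
```

Usage, e.g. for the install command from the README:

```shell
mapfile -t args < <(migrate_metaphlan_args --install --bowtie2db /resource/metaphlan422)
metaphlan "${args[@]}"
```

Emitting one argument per line keeps paths with spaces intact; arguments that themselves contain newlines are not supported.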

When we ran the install as follows:

mkdir /resource/metaphlan422
metaphlan --install --db_dir /resource/metaphlan422

We got this error, which has also been reported elsewhere:

Downloading http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/mpa_vJan25_CHOCOPhlAnSGB_202503.nwk
Wed Jun 11 06:43:56 2025: [Error] EnvironmentError "[Errno 21] Is a directory: '/resource/metaphlan422'"
 Unable to download http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/mpa_vJan25_CHOCOPhlAnSGB_202503.nwk

This looks like a bug: --db_dir is clearly designed to take a folder, and the rest of the install worked (the folder was empty when the download started, so this is not an earlier failed download in disguise either). The bug is minor, though, as it can easily be worked around by downloading the missing file manually:

cd /resource/metaphlan422
wget http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/mpa_vJan25_CHOCOPhlAnSGB_202503.nwk
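The manual workaround can be wrapped in a small, idempotent bash function so it is safe to rerun in install scripts. This is a sketch under assumptions: the function name ensure_nwk is hypothetical, and it assumes the tree file is always named <index>.nwk under the same base URL that appears in the error log (defaulting to the vJan25_202503 file name); other database releases may differ.

```shell
#!/usr/bin/env bash
# Hypothetical workaround helper: fetch the MetaPhlAn tree (.nwk) file into the
# database folder only if it is missing or empty.
# ASSUMPTION: file name <index>.nwk under the base URL seen in the error log.
MPA_DB_URL="http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases"

ensure_nwk() {
  local db_dir="$1"
  local index="${2:-mpa_vJan25_CHOCOPhlAnSGB_202503}"
  local target="${db_dir}/${index}.nwk"
  if [ -s "$target" ]; then
    echo "already present: $target"
    return 0
  fi
  # Download to a temporary name first so a failed transfer never leaves
  # an empty or partial .nwk file behind.
  if wget -q -O "${target}.part" "${MPA_DB_URL}/${index}.nwk"; then
    mv "${target}.part" "$target"
    echo "downloaded: $target"
  else
    rm -f "${target}.part"
    echo "download failed: ${MPA_DB_URL}/${index}.nwk" >&2
    return 1
  fi
}
```

Usage: `ensure_nwk /resource/metaphlan422` after `metaphlan --install --db_dir /resource/metaphlan422`. Because it checks for a non-empty file first, rerunning it after a successful download does nothing.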

After that, everything seems fine, and MetaPhlAn can be run as documented with:

metaphlan \
  --input_type fastq input.fq.gz \
  --db_dir /resource/metaphlan422 \
  -o profile.txt