Announcing MetaPhlAn 4.1.1 release

Announcement: We are pleased to share that MetaPhlAn 4.1.1 is now available. It provides fixed and consistent taxonomic identification at higher taxonomic ranks, enabling accurate relative abundance estimation at those levels as well. This is made possible by a new version of the MetaPhlAn database (vJun23_202403) that contains the taxonomic updates. Taxonomic profiles generated with previous versions of the MetaPhlAn database (i.e., vJun23_202307 and earlier) can also be fixed by running the new script fix_relab_mpa4.py.

For details on MetaPhlAn 4, see the "Announcing MetaPhlAn 4" post or visit the MetaPhlAn 4 GitHub repository.

What is new in MetaPhlAn 4.1.1

  • New utility script (fix_relab_mpa4.py) to fix the profiles generated with previous versions of the MetaPhlAn database

  • Bug fix allowing MetaPhlAn to continue execution when the option --profile_vsc is set but no viral hits are found

  • Bug fix in the new implementation (since StrainPhlAn 4.1) of --print_clades_only

  • Implementation of the option --subsampling_paired [N_PAIRED_READS] to subsample paired-end input reads. This option allows forward and reverse reads to be passed separately through the arguments -1 and -2, so that paired-end information is used during subsampling (check the usage example). The previously described --subsampling [N_READS] option remains the choice for single-end subsampling.
    Disclaimer: if paired-end information is not used (i.e., when using --subsampling), all reads in the input data are treated as independent. This comes with a caveat when subsampling paired-end datasets that contain both deep and shallow samples: the deep samples will span a higher diversity of reads, because in deep samples both ends of the same pair will rarely be selected together, whereas this happens more often in shallow samples. As a consequence, only by using paired-end information (through --subsampling_paired) can MetaPhlAn effectively correct for varying sequencing depths in paired-end data.

  • Improved taxonomies for the previous two MetaPhlAn databases (now vJun23_202403 and vOct22_202403, see below)
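To make the subsampling disclaimer concrete, here is a toy simulation in Python. It is not MetaPhlAn's subsampling code: it models each read as an integer index where read i belongs to pair i // 2, draws the same number of reads from a hypothetical deep sample (5,000,000 pairs) and a hypothetical shallow sample (60,000 pairs), and counts how many distinct DNA fragments survive. The function name distinct_fragments and all sample sizes are illustrative assumptions.

```python
import random


def distinct_fragments(n_pairs, n_reads, paired, seed=0):
    """Count distinct fragments (read pairs) represented after drawing n_reads reads.

    Hypothetical helper for illustration only; reads are integers and read i
    belongs to pair i // 2.
    """
    rng = random.Random(seed)
    if paired:
        # Paired-aware: draw whole pairs, so each chosen pair contributes both mates.
        return len(set(rng.sample(range(n_pairs), n_reads // 2)))
    # Single-end style: all 2 * n_pairs mates are independent candidates.
    chosen = rng.sample(range(2 * n_pairs), n_reads)
    return len({i // 2 for i in chosen})


N_READS = 100_000  # same read budget for every sample (illustrative value)
for label, n_pairs in [("deep", 5_000_000), ("shallow", 60_000)]:
    single = distinct_fragments(n_pairs, N_READS, paired=False)
    paired = distinct_fragments(n_pairs, N_READS, paired=True)
    print(f"{label:8s} independent: {single:6d} | paired-aware: {paired:6d}")
```

Treating mates as independent, the deep sample keeps close to 100,000 distinct fragments while the shallow one keeps noticeably fewer, because many of its pairs contribute both mates. Drawing whole pairs yields exactly 50,000 fragments for both samples, which is the depth-independent behavior the disclaimer describes.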

What has changed in vOct22_202403 in comparison to vOct22_202212

The vOct22_202403 database spans exactly the same set of SGBs and marker genes as the previously announced vOct22_202212 database. However, the new database contains fixed and consistent NCBI-based taxonomic labels. Furthermore, the taxonomy of 2,087 SGBs in vOct22_202212 was reassigned after a bug was identified in the calculation of centroid-centroid distances in that database (this bug did not affect the vJun23 databases).

What has changed in vJun23_202403 in comparison to vJun23_202307

The vJun23_202403 database spans exactly the same set of SGBs and marker genes as the previously announced vJun23_202307 database. However, the new database contains fixed and consistent NCBI-based taxonomic labels.
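Why consistent labels matter for higher-rank abundances can be illustrated with a toy aggregation in Python. It sums SGB-level relative abundances up to a chosen rank using MetaPhlAn-style lineage strings (k__...|p__...|...). The profiles, the GenusX/GenusY names, the SGB_1..SGB_3 identifiers and the function abundance_at_rank are hypothetical, and this is not how MetaPhlAn or fix_relab_mpa4.py compute profiles; it only shows how one genus filed under two synonymous phylum labels gets its abundance split.

```python
from collections import defaultdict


def abundance_at_rank(profile, rank):
    """Sum SGB-level relative abundances into clades at a rank prefix such as "g".

    Hypothetical helper for illustration; keys are lineage strings like
    "k__Bacteria|p__...|g__...|t__...".
    """
    totals = defaultdict(float)
    for lineage, relab in profile.items():
        parts = lineage.split("|")
        for i, part in enumerate(parts):
            if part.startswith(rank + "__"):
                # Truncate the lineage right after the requested rank and add this SGB's share.
                totals["|".join(parts[: i + 1])] += relab
                break
    return dict(totals)


# Hypothetical SGB-level profiles (relative abundances in %).
inconsistent = {
    "k__Bacteria|p__Firmicutes|g__GenusX|t__SGB_1": 30.0,
    "k__Bacteria|p__Bacillota|g__GenusX|t__SGB_2": 20.0,
    "k__Bacteria|p__Bacteroidota|g__GenusY|t__SGB_3": 50.0,
}
consistent = {
    "k__Bacteria|p__Bacillota|g__GenusX|t__SGB_1": 30.0,
    "k__Bacteria|p__Bacillota|g__GenusX|t__SGB_2": 20.0,
    "k__Bacteria|p__Bacteroidota|g__GenusY|t__SGB_3": 50.0,
}

print(abundance_at_rank(inconsistent, "g"))
print(abundance_at_rank(consistent, "g"))
```

With the inconsistent labels, GenusX is reported as two genus-level clades of 30 and 20 (and its phylum is split the same way); with consistent labels it is a single clade at 50.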

How to make use of the MetaPhlAn 4.1.1 updates

Thank you for this update!

If I ran HUMAnN using the output of MetaPhlAn with the vJun23_202307 database, will I need to rerun HUMAnN after running fix_relab_mpa4.py to fix the taxonomy? Or, are the levels used by HUMAnN unaffected? Thank you!


Hi,

Thank you for the new support for profiling viruses. However, I have noticed that MetaPhlAn does not display the expected error messages and terminates improperly when the VSG FASTA file is missing, inaccessible, or compressed in .bz2 format.

To address this bug, I have submitted a pull request to the MetaPhlAn GitHub repository. My proposed fix ensures that the parentheses of the .format() calls close correctly, and it also adds support for reading .bz2-compressed VSG FASTA files as input. I believe these changes should resolve the problem. Thanks.
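The handling described in this reply can be sketched in Python as follows. This is not the code from the pull request: open_vsg_fasta and read_fasta_ids are hypothetical names, and the checks only illustrate failing with a readable message (each .format() call applied to the complete message string) and transparently opening .bz2-compressed FASTA through the standard-library bz2 module.

```python
import bz2
import os
import tempfile


def open_vsg_fasta(path):
    """Open a plain or .bz2-compressed FASTA file in text mode (hypothetical helper).

    Raises an exception with a readable message instead of terminating silently.
    """
    if not os.path.isfile(path):
        # .format() is called on the full message string, with its parentheses closed.
        raise FileNotFoundError("VSG FASTA file not found: {}".format(path))
    if not os.access(path, os.R_OK):
        raise PermissionError("VSG FASTA file is not readable: {}".format(path))
    if path.endswith(".bz2"):
        # "rt" decompresses on the fly and yields str lines, like open(path, "r").
        return bz2.open(path, "rt")
    return open(path, "r")


def read_fasta_ids(path):
    """Return sequence identifiers (header text up to the first whitespace)."""
    with open_vsg_fasta(path) as handle:
        return [line[1:].split()[0] for line in handle if line.startswith(">")]


if __name__ == "__main__":
    # Demo: write a tiny .bz2 FASTA to a temporary directory and read it back.
    with tempfile.TemporaryDirectory() as tmp:
        fasta = os.path.join(tmp, "vsg_demo.fna.bz2")
        with bz2.open(fasta, "wt") as out:
            out.write(">seq1 demo\nACGT\n>seq2\nGGCC\n")
        print(read_fasta_ids(fasta))  # ['seq1', 'seq2']
        try:
            open_vsg_fasta(os.path.join(tmp, "missing.fna"))
        except FileNotFoundError as err:
            print("Error:", err)
```

Checking existence and readability up front keeps the failure message specific, and dispatching on the .bz2 suffix means downstream parsing code sees the same text lines regardless of compression.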