MetaPhlAn 4 + HUMAnN 4 compatibility

HUMAnN users - I’m making a quick pinned post about MetaPhlAn 4 + HUMAnN 4 compatibility since we’re seeing a lot of posts raising similar issues. We will follow up with more detailed information in the short term, and longer-term we’re also improving our version checking at install and runtime to avoid some of these issues going forward.

Key point 1: The current HUMAnN 4 release (v4.0.0.alpha.1) should work with MetaPhlAn 4 releases up to v4.1.1. It does not support v4.2+, which introduced an API change we still need to adapt to.
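
For anyone who wants to check their own environment against the version ceiling above, here is a minimal sketch. The function names are hypothetical (they are not part of HUMAnN or MetaPhlAn), and the parsing assumes the `metaphlan --version` output format quoted later in this thread ("MetaPhlAn version 4.2.4 (21 Oct 2025)").

```shell
# Hypothetical helpers, not part of HUMAnN or MetaPhlAn.

# Pull the version number out of a line such as
# "MetaPhlAn version 4.2.4 (21 Oct 2025)" read from stdin.
extract_metaphlan_version() {
    awk '{ print $3; exit }'
}

# Succeed only for MetaPhlAn 4 versions at or below 4.1.1.
# Assumes a plain MAJOR.MINOR.PATCH string.
metaphlan_version_supported() {
    old_ifs=$IFS
    IFS=.
    # Split the version on dots into $1 (major), $2 (minor), $3 (patch).
    set -- $1
    IFS=$old_ifs
    major=${1:-0}
    minor=${2:-0}
    patch=${3:-0}
    [ "$major" -eq 4 ] || return 1
    [ "$minor" -lt 1 ] && return 0
    [ "$minor" -eq 1 ] && [ "$patch" -le 1 ]
}

# Usage (requires metaphlan on PATH):
#   v=$(metaphlan --version | extract_metaphlan_version)
#   metaphlan_version_supported "$v" || echo "MetaPhlAn $v is above the 4.1.1 ceiling"
```

This only compares version numbers; it says nothing about which marker database is installed.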

Key point 2: All versions of HUMAnN work with specific MetaPhlAn marker databases, since we require compatibility between MetaPhlAn’s markers + taxonomy and HUMAnN’s pangenomes + functional annotations. To use HUMAnN 4.0.0.alpha.1, you should be working with the mpa_vOct22_CHOCOPhlAnSGB_202403 MetaPhlAn marker database. If you install (or update to) a newer marker database it will break HUMAnN 4 compatibility.
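
The error message quoted later in this thread suggests that HUMAnN looks for the database version in the header lines of the MetaPhlAn profile. A rough way to pre-check an existing profile is sketched below. The function name is hypothetical, and the assumption that header lines start with `#` is mine; HUMAnN's own check may differ.

```shell
# Hypothetical helper, not part of HUMAnN: succeeds if any header line
# (assumed to start with '#') of a MetaPhlAn profile mentions the
# expected database version.
profile_has_db_version() {
    profile=$1
    expected=${2:-vOct22_CHOCOPhlAnSGB_202403}
    # Keep only header lines, then look for the version as a fixed string.
    grep '^#' "$profile" | grep -q -F -- "$expected"
}

# Usage (the file name is a placeholder):
#   profile_has_db_version SAMPLE_metaphlan_profile.tsv || echo "profile built with a different database"
```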

If you’re working with older versions of HUMAnN (e.g. v3.9) in MetaPhlAn 4 compatibility mode, please see the release notes for your specific version of HUMAnN for the correct MetaPhlAn software and marker versions.

Apologies to those that have been struggling with HUMAnN installation as a result of the constraints above, and thanks for raising awareness here.


Hello,

I followed the instructions (HERE) to download and install HUMAnN 4 and the MetaPhlAn 4 database, and I am getting the following error:

ERROR: The MetaPhlAn taxonomic profile provided does not contain the database version vOct22_CHOCOPhlAnSGB_202403 in any of its header lines.

I downloaded the Metaphlan database using: metaphlan --install --index mpa_vOct22_CHOCOPhlAnSGB_202403

Checking the download output, I noticed I got files from mpa_vOct22_CHOCOPhlAnSGB_202403 and mpa_vJan25_CHOCOPhlAnSGB_202503 versions. See them below:

mpa_latest; mpa_vJan25_CHOCOPhlAnSGB_202503.pkl mpa_vOct22_CHOCOPhlAnSGB_202403.2.bt2l mpa_vOct22_CHOCOPhlAnSGB_202403.rev.2.bt2l
mpa_vJan25_CHOCOPhlAnSGB_202503.1.bt2l mpa_vJan25_CHOCOPhlAnSGB_202503.rev.1.bt2l mpa_vOct22_CHOCOPhlAnSGB_202403.3.bt2l mpa_vOct22_CHOCOPhlAnSGB_202403_VINFO.csv
mpa_vJan25_CHOCOPhlAnSGB_202503.2.bt2l mpa_vJan25_CHOCOPhlAnSGB_202503.rev.2.bt2l mpa_vOct22_CHOCOPhlAnSGB_202403.4.bt2l mpa_vOct22_CHOCOPhlAnSGB_202403_VSG.fna
mpa_vJan25_CHOCOPhlAnSGB_202503.3.bt2l mpa_vJan25_CHOCOPhlAnSGB_202503_VINFO.csv mpa_vOct22_CHOCOPhlAnSGB_202403.pkl README.txt
mpa_vJan25_CHOCOPhlAnSGB_202503.4.bt2l mpa_vOct22_CHOCOPhlAnSGB_202403.1.bt2l mpa_vOct22_CHOCOPhlAnSGB_202403.rev.1.bt2l

Any advice?

Thanks

It’s because metaphlan is using the “latest” version of the database. I worked around the problem by changing the contents of the mpa_latest file:

cd .conda/envs/humann/lib/python3.12/site-packages/metaphlan/metaphlan_databases
mv -i mpa_latest mpa_latest.dist
echo mpa_vOct22_CHOCOPhlAnSGB_202403 > mpa_latest
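
To confirm that the workaround above took effect, a small check along these lines may help. It is only a sketch: the function name is hypothetical, and it assumes (based on the commands above) that mpa_latest is a plain text file holding a single index name.

```shell
# Hypothetical helper: reports whether the mpa_latest file in a MetaPhlAn
# database directory names the expected index.
# Returns 0 on a match, 1 on a mismatch, 2 if mpa_latest is missing.
mpa_latest_matches() {
    db_dir=$1
    expected=${2:-mpa_vOct22_CHOCOPhlAnSGB_202403}
    [ -f "$db_dir/mpa_latest" ] || return 2
    # Strip whitespace so a trailing newline does not break the comparison.
    current=$(tr -d '[:space:]' < "$db_dir/mpa_latest")
    [ "$current" = "$expected" ]
}

# Usage (the path is whatever directory holds your MetaPhlAn databases):
#   mpa_latest_matches path/to/metaphlan_databases || echo "mpa_latest points elsewhere"
```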


Hello, fellow bioBakers! :waving_hand:

Apologies if this isn’t the right thread for my question — please let me know if I should post it separately.

I’ve recently started working on metatranscriptomics with a focus on the gut microbiota, and I’m currently running HUMAnN v4.0.0.alpha.1 with MetaPhlAn v4.1.1 (11 Mar 2024).

However, I’ve noticed something odd: in my utility_mapping subdirectory, I have the files mpa_vJan21_CHOCOPhlAnSGB_202103.tsv and vOct22_SGB_mapping.tsv, even though my MetaPhlAn database is mpa_vOct22_CHOCOPhlAnSGB_202403.

Did I miss a step during installation or database setup?

Should I manually download the corresponding mpa_vOct22_CHOCOPhlAnSGB_202403.tsv file from somewhere else?

Any guidance or clarification would be greatly appreciated!

Best,
Fran :man_technologist:

Same with a fresh installation here:
mpa_vJan21_CHOCOPhlAnSGB_202103.tsv and vOct22_SGB_mapping.tsv.

Is it really compatible with MetaPhlAn v4.1.1 + mpa_vOct22_CHOCOPhlAnSGB_202403?

When installing HUMAnN following the details here, MetaPhlAn v4.2.4 is installed.

humann --version
humann v4.0.0.alpha.1

metaphlan --version
MetaPhlAn version 4.2.4 (21 Oct 2025)

Some clarification would be super helpful!


Hello Biobakers,

I am wondering if you have some updates on this?

Many thanks!

I struggled with similar issues described here when trying to get humann4 v4.0.0.alpha.1 running in my compute environment. I eventually got it working and these were the key things that helped me:

  1. setup conda channels
    conda config --add channels defaults
    conda config --add channels conda-forge
    conda config --add channels bioconda
    conda config --add channels biobakery

  2. specify versions when installing via conda
    conda create -n humann4 python=3.12
    conda activate humann4
    conda install humann=4.0.0a1
    conda install metaphlan=4.1.1

  3. Download the correct database versions to specific paths
    metaphlan --install --db_dir metaphlan_databases/vOct22 --index mpa_vOct22_CHOCOPhlAnSGB_202403
    humann_databases --download uniref uniref90_ec_filtered_diamond humann4_dbs/
    humann_databases --download chocophlan full humann4_dbs/
    humann_databases --download utility_mapping full humann4_dbs/

  4. Specify database paths for both humann and metaphlan in the command
    humann -r -i SAMPLE.fq.gz -o ./SAMPLE/ --threads 16 --protein-database humann4_dbs/uniref --nucleotide-database humann4_dbs/chocophlan --metaphlan-options "-t rel_ab_w_read_stats --bowtie2db metaphlan_databases/vOct22 --index mpa_vOct22_CHOCOPhlAnSGB_202403"

  5. For me it was critical to include -t rel_ab_w_read_stats in the --metaphlan-options string; otherwise metaphlan reverted to its default -t value, which caused humann4 to not recognize the output as a valid taxonomic profile
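
Since forgetting -t rel_ab_w_read_stats is easy, one option is to build the --metaphlan-options string through a small guard like the sketch below. The function name is hypothetical, and the matching is a simple whitespace-based string test, not a real argument parser.

```shell
# Hypothetical helper: make sure a --metaphlan-options string requests
# "-t rel_ab_w_read_stats". Prints the (possibly extended) string.
# Fails if a different -t value is already present, rather than guessing.
ensure_read_stats() {
    opts=$1
    case " $opts " in
        *" -t rel_ab_w_read_stats "*)
            printf '%s\n' "$opts" ;;
        *" -t "*)
            echo "a different -t value is already set: $opts" >&2
            return 1 ;;
        *)
            printf '%s\n' "-t rel_ab_w_read_stats $opts" ;;
    esac
}

# Usage with the command from step 4:
#   mp_opts=$(ensure_read_stats "--bowtie2db metaphlan_databases/vOct22 --index mpa_vOct22_CHOCOPhlAnSGB_202403")
#   humann -r -i SAMPLE.fq.gz -o ./SAMPLE/ --threads 16 \
#       --protein-database humann4_dbs/uniref --nucleotide-database humann4_dbs/chocophlan \
#       --metaphlan-options "$mp_opts"
```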

Hopefully this will all be outdated when a non-alpha release of Humann4 drops soon!


Thanks a lot! This works for me as well!

Hi @klomp030! :waving_hand:

Did you manage to get the corresponding mpa_vOct22_CHOCOPhlAnSGB_202403.tsv in your utility_mapping subdirectory?

I would appreciate any guidance or clarification on this front.

Best,
Fran :man_technologist:


Thanks for sharing your setup steps, @jtrachsel! :+1:

I wanted to add a note about step 3: the MetaPhlAn database installation. In my environment, the command you suggested:

metaphlan --install --db_dir metaphlan_databases/vOct22 --index mpa_vOct22_CHOCOPhlAnSGB_202403

didn’t work as expected. I had to use --bowtie2db instead of --db_dir:

metaphlan --install --bowtie2db data/databases/metaphlan/vOct22_CHOCOPhlAnSGB_202403 --index mpa_vOct22_CHOCOPhlAnSGB_202403

This might be version-specific behavior or environment-dependent, but I thought it worth mentioning for others who might encounter the same issue.

Also, the root issue I was originally referring to relates to the utility_mapping database: the file mpa_vOct22_CHOCOPhlAnSGB_202403.tsv appears to be missing from the full_mapping_v4_alpha.tar.gz archive. I’ve opened a separate thread to discuss this issue in detail, as it affects compatibility with MetaPhlAn v4.1.1 and the mpa_vOct22_CHOCOPhlAnSGB_202403 database.

Thanks again for documenting your working setup; it’s been very helpful!

Best,
Fran
:man_technologist:


Thanks for the detailed explanation. I followed all these steps (with the exception of using --bowtie2db instead of --db_dir). However, I get the following error:

CRITICAL ERROR: The directory provided for ChocoPhlAn contains files ( chocophlan.v4_alpha.tar.gz ) that are not of the expected version. Please install the latest version of the database: SGB

Has this happened to anyone else?

P.S. I had to download all the files manually (instead of using the --install and --download commands) because our server blocks downloads from the associated links. Does that matter?

Hi @Faeze_Darbaniyan! :waving_hand:

Which version of the ChocoPhlAn database did you manually download?

Hi,

Thanks for your reply @fmerinocasallo! I use chocophlan.v4_alpha.tar.gz for ChocoPhlAn and mpa_vOct22_CHOCOPhlAnSGB_202403 for index. Am I using anything wrong?

Thanks!

Faezeh

I just received a very similar error message.

I solved the issue by moving the problematic directories to a different location.

I suggest you move the chocophlan.v4_alpha.tar.gz file outside of the ChocoPhlAn directory.
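
A quick way to tell whether a ChocoPhlAn directory is in the state HUMAnN seems to expect is sketched below. The function name is hypothetical. It assumes, based on the error message and the directory listings in this thread, that the directory should hold extracted files named like SGB10000_pangenome90.fna.gz and no leftover .tar.gz archive.

```shell
# Hypothetical helper: succeeds when DIR holds extracted pangenome files
# and no leftover .tar.gz archive.
chocophlan_dir_ok() {
    dir=$1
    # Any leftover archive in the directory counts as a failure.
    for f in "$dir"/*.tar.gz; do
        [ -e "$f" ] && return 1
    done
    # At least one extracted pangenome file must be present.
    for f in "$dir"/*_pangenome90.fna.gz; do
        [ -e "$f" ] && return 0
    done
    return 1
}

# Usage:
#   chocophlan_dir_ok chocophlan || echo "extract the archive and move the .tar.gz elsewhere"
```

If the check fails because only the archive is present, the archive needs to be extracted first (for example with tar -xzf) before it is moved out of the directory.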

Thanks for your help @fmerinocasallo! Would you please specify what I need to include in the ChocoPhlAn directory, then? Here is what I have in each of the directories:

$ ls chocophlan/

chocophlan.v4_alpha.tar.gz

$ ls vOct22/

mpa_vOct22_CHOCOPhlAnSGB_202403_marker_info.txt.bz2 mpa_vOct22_CHOCOPhlAnSGB_202403_species.txt.bz2

mpa_vOct22_CHOCOPhlAnSGB_202403.md5 mpa_vOct22_CHOCOPhlAnSGB_202403.tar

$ ls uniref/

uniref90_annotated_v201901b_full.tar.gz

$ ls utility_mapping/

full_mapping_v4_alpha.tar.gz

And this is the command I was trying to run:

$ humann -r -i ../examples/demo.fastq.gz -o ../SAMPLE/ --threads 16 --protein-database uniref --nucleotide-database chocophlan --metaphlan-options "-t rel_ab_w_read_stats --bowtie2db vOct22 --index mpa_vOct22_CHOCOPhlAnSGB_202403"

Thanks!

Faezeh

This is how the file structure looks on my system:

$ tree --filelimit=10 data/databases/humann/
data/databases/humann/
├── chocophlan [30094 entries exceeds filelimit, not opening dir]
├── uniref
│   └── humann4_protein_database_filtered_v2019_06.dmnd
└── utility_mapping [21 entries exceeds filelimit, not opening dir]

$ ls data/databases/humann/chocophlan/ | head
SGB10000_pangenome90.fna.gz
SGB100018_pangenome90.fna.gz
SGB10001_pangenome90.fna.gz
SGB10002_pangenome90.fna.gz
SGB10003_pangenome90.fna.gz
SGB10004_pangenome90.fna.gz
SGB10005_pangenome90.fna.gz
SGB10006_pangenome90.fna.gz
SGB100078_pangenome90.fna.gz
SGB100087_pangenome90.fna.gz

$ ls data/databases/humann/utility_mapping/
map_eggnog_name.txt.gz      map_ko_uniref90.txt.gz          map_pfam_name.txt.gz                                   metacyc_reactions_level4ec_only.uniref.bz2   vOct22_SGB_mapping.tsv
map_eggnog_uniref90.txt.gz  map_level4ec_name.txt.gz        map_pfam_uniref90.txt.gz                               mpa_vJan21_CHOCOPhlAnSGB_202103.tsv
map_go_name.txt.gz          map_level4ec_uniclust90.txt.gz  map_uniclust50_uniclust90.txt.gz                       mpa_vOct22_CHOCOPhlAnSGB_202403_species.txt
map_go_uniref90.txt.gz      map_metacyc-pwy_name.txt.gz     map_uniref90_name.txt.bz2                              unipathway_pathways
map_ko_name.txt.gz          map_metacyc-rxn_name.txt.gz     metacyc_pathways_structured_filtered_v24_subreactions  unipathway_uniprots.uniref.bz2

In my utility_mapping subdirectory, I see the mpa_vJan21_CHOCOPhlAnSGB_202103.tsv file instead of mpa_vOct22_CHOCOPhlAnSGB_202403.tsv, which I’d expected from the vOct22 CHOCOPhlAn SGB database. I’ve raised that potential issue in a separate thread with the maintainers, but there’s been no confirmation or fix so far.

Do you also have this mpa_vJan21_CHOCOPhlAnSGB_202103.tsv file in your utility_mapping subdirectory, or does your setup match the vOct22 naming instead?

Thank you so much @fmerinocasallo for detailed and prompt response! Can’t express my appreciation enough! With your help, I could solve the error I was receiving. Now I get the following error:

It seems that you do not have Internet access.\nERROR: Cannot find a local database. Please run MetaPhlAn using option “-x <database_name>”

Is it because our server is blocking the bioBakery website? Do we need internet access to run humann even if we have already downloaded all the databases?

Again, I am using this command:

humann -r -i ../examples/demo.fastq.gz -o ../SAMPLE/ --threads 16 --protein-database uniref --nucleotide-database chocophlan --metaphlan-options "-t rel_ab_w_read_stats --bowtie2db vOct22 --index mpa_vOct22_CHOCOPhlAnSGB_202403"
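
One possible cause, not confirmed in this thread: the vOct22/ listing above shows only the downloaded .tar, .md5 and .bz2 files, whereas the installed database directory shown earlier in the thread contains .pkl and .bt2l files. The sketch below checks for those extracted files. The function name is hypothetical, and treating a .pkl plus at least one .bt2l file as "ready" is my assumption, not MetaPhlAn's documented rule.

```shell
# Hypothetical helper: succeeds when DB_DIR appears to hold an extracted
# MetaPhlAn database for INDEX (a .pkl file plus at least one .bt2l file).
metaphlan_db_extracted() {
    db_dir=$1
    index=$2
    [ -f "$db_dir/$index.pkl" ] || return 1
    for f in "$db_dir/$index".*.bt2l; do
        [ -e "$f" ] && return 0
    done
    return 1
}

# Usage:
#   metaphlan_db_extracted vOct22 mpa_vOct22_CHOCOPhlAnSGB_202403 || echo "database files look incomplete"
```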

Regarding your issue, @Faeze_Darbaniyan, I’d suggest reviewing the Database section in the HUMAnN repo.

What do you get when running humann_databases --available?

More importantly, based on the Configuration section in the HUMAnN repo, what output do you get from humann_config --print?

The entries for database_folders : nucleotide and database_folders : protein should point to directories that look similar to the following:

$ tree --filelimit=10 data/databases/humann/
data/databases/humann/
├── chocophlan [30094 entries exceeds filelimit, not opening dir]
├── uniref
│   └── humann4_protein_database_filtered_v2019_06.dmnd

$ ls data/databases/humann/chocophlan/ | head
SGB10000_pangenome90.fna.gz
SGB100018_pangenome90.fna.gz
SGB10001_pangenome90.fna.gz
SGB10002_pangenome90.fna.gz
SGB10003_pangenome90.fna.gz
SGB10004_pangenome90.fna.gz
SGB10005_pangenome90.fna.gz
SGB10006_pangenome90.fna.gz
SGB100078_pangenome90.fna.gz
SGB100087_pangenome90.fna.gz

You can update configuration values using: humann_config --update $SECTION $NAME $VALUE. For example:

$ humann_config --update database_folders nucleotide $NEW_PATH_TO_CHOCOPHLAN_DB

Alternatively, you can specify database paths directly when running humann, using the --nucleotide-database and --protein-database options. These options let you manually provide the locations of the ChocoPhlAn and UniRef databases.


Note: I’m not a heavy HUMAnN user, so please treat this advice with a grain of salt. There might be inaccuracies or things I’m overlooking. Still, I hope it helps point you in the right direction!


Admin note: This discussion may be diverging from the original topic.
@sagunmaharjann @franzosa : could you please consider moving these posts into a new thread to keep things organized per the forum guidelines? Thank you!

Thanks and apologies if it is not related to this topic. Still I believe it relates to databases…

I had updated the databases, and humann_config --print points to the right directory, which contains the files you guided me to. Here is the output of the other command you asked about:

$ humann_databases --available

HUMAnN Databases ( database : build = location )

chocophlan : full = http://huttenhower.sph.harvard.edu/humann_data/chocophlan/chocophlan.v4_alpha.tar.gz

chocophlan : ec_filtered = http://huttenhower.sph.harvard.edu/humann_data/chocophlan/chocophlan_EC_FILTERED.v4_alpha.tar.gz

uniref : uniref90_ec_filtered_diamond = http://huttenhower.sph.harvard.edu/humann_data/uniprot/uniref_ec_filtered/uniref90_annotated_v4_alpha_ec_filtered.tar.gz

utility_mapping : full = http://huttenhower.sph.harvard.edu/humann_data/full_mapping_v4_alpha.tar.gz
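
For anyone who has to fetch the archives manually, the listing above can be filtered for a single download URL. This is a minimal sketch with a hypothetical function name; it assumes the "database : build = location" line format shown in the output above.

```shell
# Hypothetical helper: read `humann_databases --available` output on stdin
# and print the URL for one database/build pair.
# Exits non-zero if no matching line is found.
humann_db_url() {
    awk -v db="$1" -v build="$2" '
        # Lines look like: "chocophlan : full = http://..."
        $1 == db && $2 == ":" && $3 == build && $4 == "=" { print $5; found = 1 }
        END { exit !found }
    '
}

# Usage (requires humann_databases on PATH):
#   humann_databases --available | humann_db_url chocophlan full
```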

Is this what you get when you run this command?

Thanks again!

Faezeh