Missing `mpa_vOct22_CHOCOPhlAnSGB_202403.tsv` in HUMAnN 4 utility_mapping database

Hello, fellow bioBakers! :waving_hand:

I’m currently working with HUMAnN v4.0.0.alpha.1 and MetaPhlAn v4.1.1 (11 Mar 2024) for metatranscriptomics analysis, and I’ve encountered an issue with the utility mapping files.

Problem Description

I attempted to download the utility_mapping database using:

humann_databases --download utility_mapping full data/databases/humann/

However, the automated download failed due to our institutional network blocking Globus domains (see this thread for details). I had to manually download the full_mapping_v4_alpha.tar.gz file from an external server.

After extracting the manually downloaded archive, I can see the following files in the utility_mapping directory:

  • mpa_vJan21_CHOCOPhlAnSGB_202103.tsv
  • vOct22_SGB_mapping.tsv

However, I’m using MetaPhlAn v4.1.1 with the database mpa_vOct22_CHOCOPhlAnSGB_202403, and I notice that the corresponding mapping file mpa_vOct22_CHOCOPhlAnSGB_202403.tsv is missing from the utility_mapping folder.

I’m not the only one encountering this issue: another user (@fconstancias) reported the same problem in this thread. They also see only mpa_vJan21_CHOCOPhlAnSGB_202103.tsv and vOct22_SGB_mapping.tsv, not the mpa_vOct22_CHOCOPhlAnSGB_202403.tsv file needed for MetaPhlAn v4.1.1 compatibility.

What I’ve Tried

  1. Verified the download was complete by checking the contents of full_mapping_v4_alpha.tar.gz (downloaded manually due to network restrictions - see note below)
  2. Confirmed that the file is not present in the extracted utility_mapping directory
  3. Reviewed the humann_databases.py script to ensure there’s no post-processing step that should generate this file
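
For anyone who wants to repeat checks 1 and 2, here is a minimal Python sketch using only the standard library. The helper names (archive_has_file, dir_has_file) and the local paths are my own placeholders, not part of HUMAnN, so adjust them to your setup.

```python
import tarfile
from pathlib import Path

# File name taken from the MetaPhlAn database name plus ".tsv".
EXPECTED = "mpa_vOct22_CHOCOPhlAnSGB_202403.tsv"


def archive_has_file(archive_path, filename):
    """Return True if any member of the tar archive has this base name."""
    with tarfile.open(archive_path, "r:*") as tar:
        return any(Path(name).name == filename for name in tar.getnames())


def dir_has_file(directory, filename):
    """Return True if the file exists anywhere under the directory."""
    return any(p.is_file() for p in Path(directory).rglob(filename))


# Assumed local paths (hypothetical; change to where you saved/extracted the data).
archive = Path("full_mapping_v4_alpha.tar.gz")
mapping_dir = Path("data/databases/humann/utility_mapping")

if archive.exists():
    print("in archive:", archive_has_file(archive, EXPECTED))
if mapping_dir.exists():
    print("in directory:", dir_has_file(mapping_dir, EXPECTED))
```

In my case both checks report that the file is absent, which matches what I see when browsing the folder by hand.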

Note: The manual download was only needed because our institutional network blocks Globus domains (see this thread for details). It does confirm that the file is genuinely missing from the package contents, so this is a separate issue from the download accessibility problem.

Question / Feature Request

Could the maintainers clarify whether the mpa_vOct22_CHOCOPhlAnSGB_202403.tsv file is:

  • intentionally excluded from the full_mapping_v4_alpha.tar.gz package,
  • available for download from a different location, or
  • generated by HUMAnN at runtime from other mapping files?

If this file is missing from the archive, I would like to request that it be added to future releases of the utility_mapping database to ensure compatibility with MetaPhlAn v4.1.1 and the mpa_vOct22_CHOCOPhlAnSGB_202403 database. Alternatively, if there’s a way to generate or obtain this file, guidance would be appreciated.

Environment Details

  • HUMAnN version: v4.0.0.alpha.1
  • MetaPhlAn version: v4.1.1 (11 Mar 2024)
  • MetaPhlAn database: mpa_vOct22_CHOCOPhlAnSGB_202403
  • Database source: Downloaded from http://huttenhower.sph.harvard.edu/humann_data/full_mapping_v4_alpha.tar.gz

Any guidance or clarification would be greatly appreciated!

Best regards,
Fran :man_technologist:
