Custom MetaPhlAn subspecies taxonomy + corresponding HUMAnN ChocoPhlAn pangenome design

iqra_saleh · April 28, 2026, 7:26am

Hi bioBakery team,

I am working on a project where I want to achieve subspecies-level resolution in HUMAnN, specifically within Bifidobacterium longum (subsp. infantis and subsp. longum).

I have already modified the MetaPhlAn marker database to distinguish these subspecies successfully at the taxonomic profiling level.

I am running HUMAnN with a custom MetaPhlAn database and passing MetaPhlAn options through HUMAnN. However, I am encountering an error related to missing abundance/coverage fields in the MetaPhlAn taxonomic profile.

The command I ran:

humann -i sample1_interleaved.fastq.gz -o output/ --memory-use maximum --remove-temp-output --metaphlan-options "-x mpa_vOct22_CHOCOPhlAnSGB_202403 --bowtie2db customdb/ -t rel_ab_w_read_stats"

The error I received:

ERROR: The relative abundance and coverage were not found in the MetaPhlAn taxonomic profile.
Please run MetaPhlAn with the option(s): -t rel_ab_w_read_stats.

Beyond this error, I would like to understand the correct and supported way to make HUMAnN functional profiling consistent with this custom taxonomy.

My questions are:

1. HUMAnN documentation states that ChocoPhlAn is a pangenome database used for nucleotide-level mapping. If I introduce new taxa (e.g., subspecies) in MetaPhlAn, what is the correct way to extend or rebuild ChocoPhlAn accordingly?
2. Is there a recommended pipeline to:
   - define subspecies-level pangenomes
   - generate corresponding gene families
   - ensure compatibility with HUMAnN naming conventions?
3. Does HUMAnN strictly require ChocoPhlAn entries to match MetaPhlAn taxonomic labels one-to-one, or can multiple pangenomes map to a single species-level entry?
4. Are there any internal tools or workflows used by the bioBakery team to regenerate ChocoPhlAn when taxonomy is modified?

iqra_saleh · April 29, 2026, 11:36am

@franzosa ? Kindly answer the query

franzosa · May 28, 2026, 4:45pm

You would have to build a pangenome in HUMAnN’s format that would match each of your recognized subspecies. You could probably do this by simply subdividing the species’ total pangenome into overlapping subsets representing the different subspecies, but this isn’t something we have any official support for.
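As a rough illustration of the "subdividing" idea only, here is a minimal Python sketch that splits one species' pangenome FASTA into overlapping per-subspecies subsets. Everything about the inputs is an assumption: the `membership` table (gene header to subspecies labels) is hypothetical and would have to come from your own analysis, and the helper names `read_fasta` and `subdivide` are made up for this example. The sketch does not deal with the file naming or header conventions HUMAnN expects; those would need to be checked against an existing ChocoPhlAn file.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def subdivide(records, membership):
    """Group records by subspecies label.

    membership maps a FASTA header to the set of subspecies labels that
    carry the gene (hypothetical input). A gene listed under several
    labels is copied into each subset, which is what makes the subsets
    overlap. Genes missing from membership are dropped.
    """
    subsets = {}
    for header, seq in records:
        for label in sorted(membership.get(header, ())):
            subsets.setdefault(label, []).append((header, seq))
    return subsets


# Toy data: geneA is shared (core), geneB and geneC are subspecies-specific.
toy_fasta = [">geneA", "ACGT", "ACGT", ">geneB", "GGCC", ">geneC", "TTAA"]
toy_membership = {
    "geneA": {"infantis", "longum"},
    "geneB": {"infantis"},
    "geneC": {"longum"},
}
toy_subsets = subdivide(read_fasta(toy_fasta), toy_membership)
for label, recs in sorted(toy_subsets.items()):
    print(label, [h for h, _ in recs])
```

Copying shared genes into every subset keeps the core genome available to each subspecies pangenome, at the cost of identical sequences appearing in more than one file.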

If it were me I’d simply profile against the full species pangenome normally and then look at the subsets that light up in relation to the subspecies you believe are present. Usually when there is strong subspecies-level structure you can see it very clearly in the presence/absence patterns of genes within the species pangenome across samples (a big universal gene block corresponding to the species’ core genome + separate blocks that are found in some subspecies but not others).
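A minimal sketch of that presence/absence check, assuming a stratified gene families table joined across samples (tab-separated, first column of the form `UniRef90_X|g__Genus.s__Species`). The function names, the `min_abundance` cutoff, and the toy table are illustrative assumptions, not part of HUMAnN. Calling a gene "core" here simply means it was detected in every sample in the table, which only makes sense if the species itself is present in all of them.

```python
import csv
import io


def species_presence(table_text, species, min_abundance=0.0):
    """Return (samples, {gene: [present? per sample]}) for one species.

    Only stratified rows whose species label equals `species` are kept;
    community totals, UNMAPPED, and other strata are skipped.
    """
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    header = next(reader)
    samples = header[1:]
    presence = {}
    for row in reader:
        if not row or "|" not in row[0]:
            continue  # unstratified total or UNMAPPED
        gene, stratum = row[0].split("|", 1)
        if stratum.split(".")[-1] != species:
            continue
        presence[gene] = [float(v) > min_abundance for v in row[1:]]
    return samples, presence


def split_core_variable(presence):
    """Split genes into those seen in all samples and those seen in only some."""
    core = sorted(g for g, p in presence.items() if all(p))
    variable = sorted(g for g, p in presence.items() if any(p) and not all(p))
    return core, variable


# Toy table with three samples (values are made up).
toy_table = (
    "# Gene Family\tS1\tS2\tS3\n"
    "UniRef90_A\t5\t5\t5\n"
    "UniRef90_A|g__Bifidobacterium.s__Bifidobacterium_longum\t5\t4\t3\n"
    "UniRef90_B|g__Bifidobacterium.s__Bifidobacterium_longum\t2\t0\t0\n"
    "UniRef90_C|g__Bifidobacterium.s__Bifidobacterium_longum\t0\t1\t3\n"
    "UniRef90_A|unclassified\t0\t1\t2\n"
)
samples, presence = species_presence(toy_table, "s__Bifidobacterium_longum")
core, variable = split_core_variable(presence)
print(samples, core, variable)
```

Clustering the samples on the variable genes (for example with a heatmap) is then one way to look for the subspecies-specific blocks described above.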
