Custom MetaPhlAn subspecies taxonomy + corresponding HUMAnN ChocoPhlAn pangenome design

Hi bioBakery team,

I am working on a project where I want to achieve subspecies-level resolution in HUMAnN, specifically within Bifidobacterium longum (subsp. infantis and subsp. longum).

I have already modified the MetaPhlAn marker database, and it successfully distinguishes these subspecies at the taxonomic profiling level.

I am running HUMAnN with a custom MetaPhlAn database and passing MetaPhlAn options through HUMAnN. However, I am encountering an error related to missing abundance/coverage fields in the MetaPhlAn taxonomic profile.

The command I ran:

humann -i sample1_interleaved.fastq.gz -o output/ --memory-use maximum --remove-temp-output --metaphlan-options "-x mpa_vOct22_CHOCOPhlAnSGB_202403 --bowtie2db customdb/ -t rel_ab_w_read_stats"

The error received:

ERROR: The relative abundance and coverage were not found in the MetaPhlAn taxonomic profile.
Please run MetaPhlAn with the option(s): -t rel_ab_w_read_stats.
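
To narrow this down, here is a minimal Python sketch that checks whether a MetaPhlAn profile contains the fields named in the error message. It assumes the profile has a header line beginning with `#clade_name` and that the relevant columns are called `relative_abundance` and `coverage`; both are assumptions about the output format, not something confirmed against my custom database. The function names are hypothetical helpers, not part of HUMAnN or MetaPhlAn.

```python
import io

# Assumed column names for the fields the HUMAnN error refers to.
REQUIRED = {"relative_abundance", "coverage"}


def profile_columns(handle):
    """Return column names from the '#clade_name' header line, or [] if absent."""
    for line in handle:
        if line.startswith("#clade_name"):
            # Drop the leading '#' and split the tab-separated header.
            return line[1:].rstrip("\r\n").split("\t")
        if not line.startswith("#"):
            # Reached data rows without seeing a column header.
            break
    return []


def missing_columns(handle):
    """Return the sorted list of required columns missing from the profile."""
    return sorted(REQUIRED - set(profile_columns(handle)))


# Illustrative in-memory profile (made-up rows, not real output).
example = io.StringIO(
    "#mpa_vOct22_CHOCOPhlAnSGB_202403\n"
    "#clade_name\tclade_taxid\trelative_abundance\n"
    "k__Bacteria\t2\t100.0\n"
)
print(missing_columns(example))
```

For a real file, `missing_columns(open(path))` can be called on the taxonomic profile; an empty list would suggest the header itself is fine and the problem lies elsewhere.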

Beyond this specific error, I would like to understand the correct and supported way to keep HUMAnN functional profiling consistent with this custom taxonomy.

My questions are:

  1. The HUMAnN documentation states that ChocoPhlAn is a pangenome database used for nucleotide-level mapping.
    If I introduce new taxa (e.g., subspecies) in MetaPhlAn, what is the correct way to extend or rebuild ChocoPhlAn accordingly?

  2. Is there a recommended pipeline to:

    • define subspecies-level pangenomes

    • generate corresponding gene families

    • ensure compatibility with HUMAnN naming conventions?

  3. Does HUMAnN strictly require ChocoPhlAn entries to match MetaPhlAn taxonomic labels one-to-one, or can multiple pangenomes map to a single species-level entry?

  4. Are there any internal tools or workflows used by the bioBakery team to regenerate ChocoPhlAn when taxonomy is modified?

@franzosa, could you please advise on this?