Hi bioBakery team,
I am working on a project where I want to achieve subspecies-level resolution in HUMAnN, specifically within Bifidobacterium longum (subsp. infantis and subsp. longum).
I have already modified the MetaPhlAn marker database to distinguish these subspecies successfully at the taxonomic profiling level.
I am running HUMAnN with this custom MetaPhlAn database and passing the MetaPhlAn options through HUMAnN, but I get an error about missing relative abundance and coverage fields in the MetaPhlAn taxonomic profile.
The command I used:
humann -i sample1_interleaved.fastq.gz -o output/ --memory-use maximum --remove-temp-output --metaphlan-options "-x mpa_vOct22_CHOCOPhlAnSGB_202403 --bowtie2db customdb/ -t rel_ab_w_read_stats"
The error I received:
ERROR: The relative abundance and coverage were not found in the MetaPhlAn taxonomic profile.
Please run MetaPhlAn with the option(s): -t rel_ab_w_read_stats.
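To narrow this down, I could check whether the profile HUMAnN generated actually contains these columns. The sketch below is my own (the function name `check_profile_header` is hypothetical). It assumes that a `-t rel_ab_w_read_stats` profile has a `#clade_name` header line listing `relative_abundance` and `coverage`, while the default profile header does not list `coverage`. I also assume that, when run without `--remove-temp-output`, HUMAnN keeps the generated profile in the `*_humann_temp` folder as `*_metaphlan_bugs_list.tsv`.

```shell
#!/bin/sh
# Hypothetical helper: check_profile_header PROFILE
# Succeeds (exit status 0) only if the first "#clade_name" header line of a
# MetaPhlAn profile mentions both "relative_abundance" and "coverage".
# Assumption: these are the column names written with -t rel_ab_w_read_stats.
check_profile_header() {
  # Grab the column-header line; fail if the file is missing or has none.
  header=$(grep -m1 '^#clade_name' "$1") || return 1
  # Both column names must appear somewhere in the header.
  case $header in *relative_abundance*) ;; *) return 1 ;; esac
  case $header in *coverage*) ;; *) return 1 ;; esac
  return 0
}

# Example use (path pattern is an assumption; rerun HUMAnN without
# --remove-temp-output so the temporary profile is kept):
# check_profile_header output/*_humann_temp/*_metaphlan_bugs_list.tsv \
#   && echo "read stats present" || echo "read stats missing"
```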
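This happens even though `-t rel_ab_w_read_stats` is already among the options I pass. Beyond fixing this error, I would like to understand the correct, supported way to make HUMAnN's functional profiling consistent with this custom taxonomy.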
My questions are:
- HUMAnN documentation states that ChocoPhlAn is a pangenome database used for nucleotide-level mapping. If I introduce new taxa (e.g., subspecies) in MetaPhlAn, what is the correct way to extend or rebuild ChocoPhlAn accordingly?
- Is there a recommended pipeline to:
  - define subspecies-level pangenomes,
  - generate the corresponding gene families, and
  - ensure compatibility with HUMAnN naming conventions?
- Does HUMAnN strictly require ChocoPhlAn entries to match MetaPhlAn taxonomic labels one-to-one, or can multiple pangenomes map to a single species-level entry?
- Are there any internal tools or workflows that the bioBakery team uses to regenerate ChocoPhlAn when the taxonomy is modified?