What’s needed to build a custom HUMAnN database?

jolespin · August 30, 2023, 10:14pm

Finally circling around to this.

I’m taking a look here: GitHub - biobakery/humann: HUMAnN is the next generation of HUMAnN 1.0 (HMP Unified Metabolic Analysis Network).

The MetaPhlAn marker database is a subset of the genes that get included in the ChocoPhlAn (pangenome) database. Specifically, the markers are genes within a pangenome that are core to the genomes in that pangenome (i.e. found in all of them) and unique to the genomes in that pangenome (i.e. not found in other pangenomes). In practice you might not get enough markers that are 100% core and 100% unique, so the goal is to have a few hundred that are as core and as unique as possible and then do a robust average over them.

This is going to be tricky. I’ve clustered all of the proteins within each species cluster, but I guess I’ll also need to cluster the resulting representatives across species clusters to see whether each one is unique to its own cluster.
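To make the "as core and as unique as possible" idea concrete, here is a minimal sketch in Python. It is not the procedure used to build the official MetaPhlAn or ChocoPhlAn databases; it only illustrates the scoring described above. It assumes you already have, for each protein family from the cross-species clustering, the set of genomes it was found in. The function names, the toy data, and the thresholds (`min_core`, `min_unique`, `max_markers`) are all hypothetical.

```python
from collections import defaultdict


def score_markers(pangenomes, family_hits):
    """Score each gene family per pangenome (hypothetical helper).

    pangenomes:  {pangenome_id: set of genome ids in that species cluster}
    family_hits: {family_id: set of genome ids where the family was detected}
    Returns {pangenome_id: [(family_id, coreness, uniqueness), ...]}, best first.
    """
    all_genomes = set().union(*pangenomes.values())
    scores = defaultdict(list)
    for family, genomes in family_hits.items():
        for pg, members in pangenomes.items():
            inside = len(genomes & members)
            if inside == 0:
                continue  # family absent from this pangenome
            others = all_genomes - members
            outside = len(genomes & others)
            # coreness: fraction of this pangenome's genomes carrying the family
            coreness = inside / len(members)
            # uniqueness: 1 minus the fraction of outside genomes carrying it
            uniqueness = 1.0 - (outside / len(others) if others else 0.0)
            scores[pg].append((family, coreness, uniqueness))
    for pg in scores:
        scores[pg].sort(key=lambda t: t[1] * t[2], reverse=True)
    return dict(scores)


def pick_markers(scored, min_core=0.9, min_unique=0.99, max_markers=200):
    """Keep the best families passing both (assumed) thresholds."""
    passing = [f for f, c, u in scored if c >= min_core and u >= min_unique]
    return passing[:max_markers]


# Toy input: two species clusters and four protein families.
pangenomes = {"sp_A": {"A1", "A2", "A3"}, "sp_B": {"B1", "B2"}}
family_hits = {
    "fam1": {"A1", "A2", "A3"},              # core and unique to sp_A
    "fam2": {"A1", "A2", "B1"},              # neither core nor unique
    "fam3": {"B1", "B2"},                    # core and unique to sp_B
    "fam4": {"A1", "A2", "A3", "B1", "B2"},  # core everywhere, not unique
}
scores = score_markers(pangenomes, family_hits)
markers = {pg: pick_markers(s) for pg, s in scores.items()}
```

With real data the thresholds would likely need relaxing until each species cluster has enough markers, which is the trade-off the quoted explanation describes.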

I have a few follow-up questions:

Is the code used to create the default HUMAnN, MetaPhlAn, and ChocoPhlAn databases available anywhere? I’ve seen this post but there weren’t any responses: Chocophlan source code
Is it preferred to have both a MetaPhlAn and a ChocoPhlAn database when running HUMAnN, or can you get comparable results using just the proteins?
Should we expect a 1-to-1 relationship between the protein and nucleotide sequences?

I’d like to get started on this, but I’m unsure where exactly to start and which resources to follow to generate fully operational custom HUMAnN and MetaPhlAn databases.
