Different chocophlan databases?

BioinformaticsLad · April 23, 2020, 8:12am

I’d like to know what the difference is between the chocophlan database hosted here: https://bitbucket.org/biobakery/metaphlan2/downloads/ (this is 366 MB)

And the one that is downloaded via the command line:

humann2_databases --download chocophlan full $INSTALL_LOCATION

(this one is 5.5 GB)

I mean, other than the obvious difference in size, what is each one used for? They’re both called chocophlan, so I’m guessing they’re both used for running metaphlan.

fbeghini · April 23, 2020, 9:10am

Hi,
ChocoPhlAn is the underlying pipeline that builds the species pangenomes. From those pangenomes it identifies the MetaPhlAn markers (the 366 MB file in the MetaPhlAn repository) as well as the HUMAnN centroids and functional annotations (the file retrieved with humann2_databases --download chocophlan).

BioinformaticsLad · April 24, 2020, 12:57am

Thanks! Although I’m still not understanding.

I thought the pipeline was called metaphlan and chocophlan was the database (containing species-specific markers), or am I mistaken?

I thought the HUMAnN centroids and functional annotations were stored in the UniRef database.

franzosa · April 24, 2020, 2:06am

As @fbeghini said, ChocoPhlAn is the pipeline that builds the pangenomes, and we often refer to the resulting pangenomes as the “ChocoPhlAn database.” The marker genes are a unique, conserved subset of each species’ pangenome, so in total they are a subset of ChocoPhlAn.
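As a rough illustration of that subset relationship, here is a toy Python sketch. It is not how the ChocoPhlAn pipeline actually selects markers: the species names, gene IDs, and the simple rules used here ("conserved" means present in every strain, "unique" means absent from every other species' pangenome) are all assumptions made for the example.

```python
# Toy data (hypothetical): species -> list of strain genomes,
# each genome represented as a set of gene-family IDs.
strains = {
    "species_A": [{"g1", "g2", "g3"}, {"g1", "g2", "g4"}],
    "species_B": [{"g2", "g5"}, {"g2", "g5", "g6"}],
}


def pangenome(genomes):
    """Union of all genes seen in any strain of a species."""
    return set().union(*genomes)


def core(genomes):
    """Genes present in every strain (a simple stand-in for 'conserved')."""
    return set.intersection(*genomes)


def markers(strains):
    """Conserved genes of each species that appear in no other species."""
    result = {}
    for species, genomes in strains.items():
        others = set()
        for other, other_genomes in strains.items():
            if other != species:
                others |= pangenome(other_genomes)
        result[species] = core(genomes) - others
    return result


pangenomes = {s: pangenome(g) for s, g in strains.items()}
marker_sets = markers(strains)

for species in strains:
    print(species, "pangenome:", sorted(pangenomes[species]),
          "markers:", sorted(marker_sets[species]))
```

In this toy setup each species' marker set is always contained in its pangenome, which mirrors the point that the marker file is a small subset of the full pangenome database.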

UniRef is a clustering of the protein universe maintained and updated by UniProt. The ChocoPhlAn pangenomes were historically mapped against UniRef to identify broader gene families and known functional annotations (the modern ChocoPhlAn pipeline actually uses the UniRef clustering to aid in pangenome construction). Hope this clarifies the remaining confusion!

Shi_Huang · October 18, 2021, 7:14am

I’d like to know if any tools can be used for constructing a custom chocophlan database.
I have a set of 170,000 microbial reference genomes including bacteria, archaea, and fungi. Is there any pipeline (such as the chocophlan pipeline) available to general users that can extract markers from each genome and construct a custom marker-gene database? Thank you so much!

jolespin · July 2, 2023, 8:58pm

I’m also interested in this. Have you made any progress? Do you have any suggestions?

franzosa · July 7, 2023, 6:42pm

I believe the third-party systems Struo and Struo2 can help users build custom databases for MetaPhlAn and HUMAnN. They were not developed by us, however, so we would not be able to provide official support for them.
