Offer Pre-built Databases Organised By Biome

Dario · August 23, 2022, 6:00am

I saw that the “Sequences (fasta files; due to large size they are split into 5 parts)” download at Species-level genome bins (SGBs) from the human microbiome is split into five parts. Could the files be split by biome instead? For example, I am interested in analysing whole-genome sequencing data from oral cavity cancer (the subset of short Illumina reads that do not map to the human reference genome), so the database I would use does not need to contain, for instance, marine bacteria. Ideally, I could just download a Human Oral database and work with that for this project. Would that be feasible to offer?

Also, are any details public yet about how mpa_vJan21_CHOCOPhlAnSGB_202103 was constructed? It would be great if users could reproduce the construction of CHOCOPhlAn from scratch.

aitor.blancomiguez · February 7, 2023, 8:54am

Hi @Dario
Unfortunately, due to the size of the data, it is not feasible for us to re-upload the data split by biome. However, you should be able to filter the MAGs assembled from oral samples out of the full dataset using supplementary table 1 of the original publication: Extensive Unexplored Human Microbiome Diversity Revealed by Over 150,000 Genomes from Metagenomes Spanning Age, Geography, and Lifestyle (Cell).
The MAG identifiers should follow the structure: StudyID__SampleID__binID
You can find the details of how the MetaPhlAn 4 database vJan21 was built in the following preprint: https://www.biorxiv.org/content/10.1101/2022.08.22.504593v1
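
As a rough illustration of the filtering step described above, here is a minimal Python sketch that splits MAG identifiers of the form StudyID__SampleID__binID and keeps only those from a chosen set of samples. The example identifiers and the oral_samples set are hypothetical. In practice the (study, sample) pairs would come from the rows of supplementary table 1 that are annotated as oral, and nothing is assumed here about that table's column names or about the FASTA header format.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class MagId(NamedTuple):
    """The three parts of a MAG identifier: StudyID__SampleID__binID."""
    study: str
    sample: str
    bin_id: str


def parse_mag_id(mag_id: str) -> MagId:
    """Split a MAG identifier on the double-underscore separator."""
    parts = mag_id.split("__")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"unexpected MAG identifier: {mag_id!r}")
    return MagId(*parts)


def select_mags(mag_ids: Iterable[str],
                wanted: Set[Tuple[str, str]]) -> List[str]:
    """Keep the MAG identifiers whose (study, sample) pair is in `wanted`."""
    kept = []
    for mag_id in mag_ids:
        parsed = parse_mag_id(mag_id)
        if (parsed.study, parsed.sample) in wanted:
            kept.append(mag_id)
    return kept


# Hypothetical identifiers, for illustration only.
all_mags = ["StudyA__S1__bin.3", "StudyA__S2__bin.1", "StudyB__S9__bin.7"]
# Hypothetical (study, sample) pairs taken from the oral rows of the table.
oral_samples = {("StudyA", "S1"), ("StudyB", "S9")}

oral_mags = select_mags(all_mags, oral_samples)
```

The resulting list of identifiers could then be used to pull the matching sequences out of the five FASTA parts with whichever FASTA tool you prefer.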

Dario · February 10, 2023, 8:00am

Actually, that preprint contains zero instances of the word CHOCOPhlAn. Is the construction described somewhere else?

aitor.blancomiguez · February 21, 2023, 1:18pm

Hi @Dario
While it is not called CHOCOPhlAn anymore, the Methods section of the preprint describes the whole procedure from genomes to marker genes.
