Hi,
I am also looking to download the SGB representative genomes for the vOct22_202403 SGB release, as I’m using MetaPhlAn 4.1 and HUMAnN 4.0.0.alpha. I’ve read in another post (MAG sequences used in MetaPhlAn 4 data base - Microbial community profiling / MetaPhlAn - The bioBakery help forum) that there are plans to share the genomes in future CHOCOPhlAn (SGB) releases, but it looks like this is still in progress.
In the meantime, I’ve been trying to work backwards by mapping the SGB_IDs to NCBI taxonomy IDs to help identify which genomes to download. I found the file http://cmprod1.cibio.unitn.it/databases/PhyloPhlAn/SGB.Oct22.tar, which contains SGB.Oct22.txt.bz2, a table of SGB_IDs with their assigned NCBI taxonomy IDs. However, I’ve noticed some inconsistencies.
For example, in SGB.Oct22.txt.bz2, SGB_ID 15286 is assigned to taxID 2086273:
- k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Oscillospiraceae|g__Subdoligranulum|s__Subdoligranulum_sp_APC924_74|t__SGB1528
but in mpa_vOct22_CHOCOPhlAnSGB_202403_species.txt.bz2, the same SGB 15286 is assigned to:
- k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Ruminococcaceae|g__Candidatus_Cibionibacter|s__Candidatus_Cibionibacter_quicibialis
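In case it helps anyone reproduce this comparison across all SGBs, here is a rough Python sketch. It makes assumptions I haven’t confirmed: that the mpa species file has one `<SGB id><TAB><clade string>` line per SGB with no header, and that SGB.Oct22.txt (extracted from the tar) is tab-separated with a header row. The column names passed to `load_sgb_table` / `compare_files` are placeholders you’d need to replace with the real ones from the file header, and all function names here are my own, not part of MetaPhlAn.

```python
import bz2
import csv
from typing import Dict, Iterable, List, Optional, Tuple


def normalize_sgb(sgb_id: str) -> str:
    """Reduce '15286', 'SGB15286' or 't__SGB15286' to the bare number '15286'."""
    sgb_id = sgb_id.strip()
    if sgb_id.startswith("t__"):
        sgb_id = sgb_id[len("t__"):]
    if sgb_id.startswith("SGB"):
        sgb_id = sgb_id[len("SGB"):]
    return sgb_id


def parse_clade(clade: str) -> Dict[str, str]:
    """Split 'k__Bacteria|...|s__Name' into {'k': 'Bacteria', ..., 's': 'Name'}."""
    ranks = {}
    for part in clade.strip().split("|"):
        prefix, sep, name = part.partition("__")
        if sep:
            ranks[prefix] = name
    return ranks


def load_mpa_species(lines: Iterable[str]) -> Dict[str, str]:
    """ASSUMED layout: one '<SGB id><TAB><clade string>' line per SGB, no header."""
    mapping = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) >= 2:
            mapping[normalize_sgb(fields[0])] = fields[1]
    return mapping


def load_sgb_table(lines: Iterable[str], id_col: str, taxonomy_col: str,
                   taxid_col: str) -> Dict[str, Tuple[str, str]]:
    """ASSUMED layout: tab-separated with a header row; column names are placeholders."""
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return {normalize_sgb(row[id_col]): (row[taxonomy_col], row[taxid_col])
            for row in reader}


def species_mismatches(
    sgb_table: Dict[str, Tuple[str, str]], mpa_table: Dict[str, str]
) -> List[Tuple[str, str, Optional[str], Optional[str]]]:
    """List (SGB, taxID, s__ in SGB table, s__ in mpa file) where the species differs.

    SGBs present in only one of the two tables are skipped.
    """
    out = []
    for sgb in sorted(sgb_table.keys() & mpa_table.keys()):
        clade, taxid = sgb_table[sgb]
        old = parse_clade(clade).get("s")
        new = parse_clade(mpa_table[sgb]).get("s")
        if old != new:
            out.append((sgb, taxid, old, new))
    return out


def compare_files(sgb_path: str, mpa_path: str, id_col: str,
                  taxonomy_col: str, taxid_col: str):
    """Hypothetical wrapper: bz2.open(..., 'rt') decompresses as text on the fly."""
    with bz2.open(sgb_path, "rt") as f:
        sgb_table = load_sgb_table(f, id_col, taxonomy_col, taxid_col)
    with bz2.open(mpa_path, "rt") as f:
        mpa_table = load_mpa_species(f)
    return species_mismatches(sgb_table, mpa_table)
```

Only the `s__` level is compared here, so genus or family renames like Eubacteriales/Oscillospiraceae vs Clostridiales/Ruminococcaceae would need a separate check on the other rank keys.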
I understand that taxonomies evolve between releases, but since the mpa_vOct22 file doesn’t include taxIDs, I can’t confirm what has changed. I searched NCBI Taxonomy for “Candidatus Cibionibacter quicibialis” but got no results; the closest match was Candidatus Cibiobacter qucibialis (taxID: 2500537).
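To look for near-miss names like this in bulk rather than one search at a time, a sketch along these lines could fuzzy-match MetaPhlAn species labels against the scientific names in NCBI’s taxdump `names.dmp` (fields separated by `\t|\t`: tax_id, name, unique name, name class). The helper names and the 0.8 similarity cutoff are my own arbitrary choices, and a string-similarity hit is only a lead to check manually, not evidence that two names refer to the same taxon.

```python
import difflib
from typing import Dict, Iterable, List, Tuple


def mpa_label_to_name(label: str) -> str:
    """'s__Candidatus_Cibionibacter_quicibialis' -> 'Candidatus Cibionibacter quicibialis'."""
    _, sep, name = label.partition("__")
    return (name if sep else label).replace("_", " ")


def load_scientific_names(lines: Iterable[str]) -> Dict[str, str]:
    """Map scientific name -> taxID, reading names.dmp lines and skipping synonyms etc."""
    names = {}
    for line in lines:
        # Each line ends with a trailing field separator, so strip it before splitting.
        fields = line.rstrip("\r\n").rstrip("\t|").split("\t|\t")
        if len(fields) == 4 and fields[3] == "scientific name":
            names[fields[1]] = fields[0]
    return names


def closest_taxa(label: str, names: Dict[str, str], n: int = 3,
                 cutoff: float = 0.8) -> List[Tuple[str, str]]:
    """Return up to n (name, taxID) pairs most similar to the MetaPhlAn label."""
    query = mpa_label_to_name(label)
    hits = difflib.get_close_matches(query, list(names), n=n, cutoff=cutoff)
    return [(hit, names[hit]) for hit in hits]
```

Usage would be something like `with open("names.dmp", encoding="utf-8") as f: names = load_scientific_names(f)` followed by `closest_taxa("s__Candidatus_Cibionibacter_quicibialis", names)`. Note that labels such as `Subdoligranulum_sp_APC924_74` lose punctuation compared with NCBI-style strain names, which is part of why exact lookups fail and a similarity search is used here.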
My questions
- Does the PhyloPhlAn/SGB.Oct22.tar file contain the same SGB data that was used as input to build the MetaPhlAn species marker database and the CHOCOPhlAn pan-genome database used by HUMAnN 4?
- The modification date of PhyloPhlAn/SGB.Oct22. is 2024-01-17, so I assumed it corresponds to the updated mpa_vOct22_202403 release. Is that wrong?
- Does “202403” in the mpa_v* filenames refer to the processing date? I’m confused by the two dates in the filenames (vOct22 and 202403).
- Could you let me know which NCBI Taxonomy release was used for the mpa_vOct22_202403 database files?
Much appreciated!!