Discrepancy in taxonomy between metaphlan and chocophlan

Hi,

I recently ran humann 3.8 and noticed discrepancies between the community profile computed by metaphlan and the functional profile computed by chocophlan.
It seems to me that metaphlan is using a more up-to-date taxonomy, e.g. Bacteroides vulgatus is now called Phocaeicola vulgatus. In the functional profile it is still called Bacteroides vulgatus (as are the corresponding pangenomes in the chocophlan database).

I tried to update the chocophlan database with:
DBDIR=/lisc/scratch/mirror/humann/3.8

humann_config --update database_folders nucleotide $DBDIR/chocophlan
HUMAnN configuration file updated: database_folders : nucleotide = /lisc/scratch/mirror/humann/3.8/chocophlan

humann_databases --download chocophlan full $DBDIR --update-config yes
Download URL: http://huttenhower.sph.harvard.edu/humann_data/chocophlan/full_chocophlan.v201901_v31.tar.gz

which again downloads an old database.

Is there any way to get an updated version of the chocophlan pangenomes?

Thank you for your help!

Best,
Franziska

And here is also my output for humann_databases --available:

chocophlan : full = http://huttenhower.sph.harvard.edu/humann_data/chocophlan/full_chocophlan.v201901_v31.tar.gz
chocophlan : DEMO = http://huttenhower.sph.harvard.edu/humann_data/chocophlan/DEMO_chocophlan.v201901_v31.tar.gz
uniref : uniref50_diamond = http://huttenhower.sph.harvard.edu/humann_data/uniprot/uniref_annotated/uniref50_annotated_v201901b_full.tar.gz
uniref : uniref90_diamond = http://huttenhower.sph.harvard.edu/humann_data/uniprot/uniref_annotated/uniref90_annotated_v201901b_full.tar.gz
uniref : uniref50_ec_filtered_diamond = http://huttenhower.sph.harvard.edu/humann_data/uniprot/uniref_ec_filtered/uniref50_ec_filtered_201901b_subset.tar.gz
uniref : uniref90_ec_filtered_diamond = http://huttenhower.sph.harvard.edu/humann_data/uniprot/uniref_ec_filtered/uniref90_ec_filtered_201901b_subset.tar.gz
uniref : DEMO_diamond = http://huttenhower.sph.harvard.edu/humann_data/uniprot/uniref_annotated/uniref90_DEMO_diamond_v201901b.tar.gz
utility_mapping : full = http://huttenhower.sph.harvard.edu/humann_data/full_mapping_v201901b.tar.gz

Indeed, if you’re using MetaPhlAn 4 with HUMAnN 3.8, then MetaPhlAn 4 is using a more up-to-date taxonomy. We use a mapping file to relate MetaPhlAn 4 taxa names to HUMAnN 3.8 (pangenome) taxa names. This mapping file is located under humann/humann/data/misc in your install, but the relevant mappings are also quoted in each sample’s log file.

You could, if desired, update the HUMAnN output to match the MetaPhlAn 4 taxonomy based on these mappings, but keep in mind that they are not always 1-to-1: sometimes a MetaPhlAn 4 taxon (SGB) represents a merging of 2+ HUMAnN pangenomes, while other times a HUMAnN pangenome is now known to be a merging of 2+ SGBs.
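
As a rough illustration of that caveat, here is a minimal Python sketch that renames the taxon part of stratified HUMAnN feature IDs only where the mapping is strictly 1-to-1 and leaves merged or split taxa untouched. It assumes the mappings have already been read into (MetaPhlAn 4 name, HUMAnN name) pairs; the function names and the example pairs are hypothetical, and the actual layout of the mapping file under humann/humann/data/misc should be checked before writing a parser for it.

```python
from collections import defaultdict


def build_one_to_one(pairs):
    """Return {humann_name: metaphlan4_name} for strictly 1-to-1 mappings.

    `pairs` is an iterable of (metaphlan4_name, humann_name) tuples.
    This input shape is an assumption, not the real mapping file format.
    """
    mp4_to_humann = defaultdict(set)
    humann_to_mp4 = defaultdict(set)
    for mp4_name, humann_name in pairs:
        mp4_to_humann[mp4_name].add(humann_name)
        humann_to_mp4[humann_name].add(mp4_name)

    rename = {}
    for humann_name, mp4_names in humann_to_mp4.items():
        if len(mp4_names) != 1:
            continue  # one pangenome split across 2+ SGBs: skip
        (mp4_name,) = mp4_names
        if len(mp4_to_humann[mp4_name]) != 1:
            continue  # one SGB merging 2+ pangenomes: skip
        rename[humann_name] = mp4_name
    return rename


def rename_feature(feature, rename):
    """Rename the taxon after '|' in a stratified feature ID, if mapped."""
    if "|" not in feature:
        return feature  # unstratified (community total) row
    gene, taxon = feature.split("|", 1)
    return gene + "|" + rename.get(taxon, taxon)


if __name__ == "__main__":
    # Illustrative pairs only; the second taxon is made up to show a merge.
    pairs = [
        ("g__Phocaeicola.s__Phocaeicola_vulgatus",
         "g__Bacteroides.s__Bacteroides_vulgatus"),
        ("g__Example.s__Merged_SGB", "g__Example.s__Species_a"),
        ("g__Example.s__Merged_SGB", "g__Example.s__Species_b"),
    ]
    rename = build_one_to_one(pairs)
    print(rename_feature(
        "UniRef90_X|g__Bacteroides.s__Bacteroides_vulgatus", rename))
```

Taxa that are skipped keep their HUMAnN 3.8 names, so the renamed table will mix old and new names; whether that is acceptable depends on the downstream analysis.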

Thank you for clarifying!

I think for my purposes it’s then better to go back to metaphlan 3 to have compatible outputs.
(metaphlan 3.1.0 with the corresponding mpa_latest)

Or would you recommend using metaphlan 4.1.1 with the older database from metaphlan 3.1.0 (and the flag --mp3, I presume)?

Thank you very much for your help!