MetaPhlAn: sgb_to_gtdb_profile.py fails with KeyError

MetaPhlAn version 4.1.1 (11 Mar 2024)
(installed from bioconda)

I profiled my metagenome and successfully created the metaphlan profile file, but got this warning message in the process:

WARNING: The metagenome profile contains clades that represent multiple species merged into a single representant.

I then tried to convert the profile to GTDB taxonomy (for a comparison) but received this error message:

Mon Jun 2 15:36:05 2025: Start execution
Traceback (most recent call last):
  File "/home/ubuntu/miniforge3/envs/tax_class/bin/sgb_to_gtdb_profile.py", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/ubuntu/miniforge3/envs/tax_class/lib/python3.12/site-packages/metaphlan/utils/sgb_to_gtdb_profile.py", line 97, in main
    get_gtdb_profile(args.input, args.output, database_controller.get_database_name())
  File "/home/ubuntu/miniforge3/envs/tax_class/lib/python3.12/site-packages/metaphlan/utils/sgb_to_gtdb_profile.py", line 72, in get_gtdb_profile
    gtdb_tax = sgb2gtdb[line[0].split('|')[-1][3:]]
               ~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'SGB10139_group'

I tried re-installing the database and I think I have the most recent version:
mpa_vJan25_CHOCOPhlAnSGB_202503…

How can I convert the taxonomy to GTDB?
Thanks.
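
For readers following the traceback, the failing lookup can be reproduced in isolation. The sketch below mirrors the expression `line[0].split('|')[-1][3:]` from the traceback: it takes the last taxonomic level of a clade name and drops its three-character prefix. The clade names and the mapping entry are made up for illustration, and whether the real script filters rows before this lookup is not shown in the traceback.

```python
# Reproduce the key lookup from the traceback on made-up profile rows.
# The clade names and mapping entries are illustrative, not real data.

def sgb_key(clade_name: str) -> str:
    """Mirror `line[0].split('|')[-1][3:]`: last level minus its 3-char prefix."""
    return clade_name.split('|')[-1][3:]


def missing_keys(clade_names, sgb2gtdb):
    """Return the lookup keys that would raise KeyError, in input order."""
    return [sgb_key(c) for c in clade_names if sgb_key(c) not in sgb2gtdb]


# Hypothetical mapping keyed by SGB identifier
sgb2gtdb = {"SGB1234": "d__Bacteria;p__Example"}
clades = [
    "k__Bacteria|s__Example_species|t__SGB1234",
    "k__Bacteria|s__Other_species|t__SGB10139_group",
]
print(missing_keys(clades, sgb2gtdb))  # ['SGB10139_group']
```

A key that is absent from the mapping table, as here, is what produces the KeyError.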

Hi @jtrachsel
Can you check that the database specified in sgb_to_gtdb_profile.py corresponds to the database you used to run MetaPhlAn?
For sgb_to_gtdb_profile.py, you will find this information at the beginning of the file (defined as the variable GTDB_ASSIGNMENT_FILE), while for MetaPhlAn profiles it is in the header of the profile.
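
This check can also be scripted. The following is a minimal sketch, not part of MetaPhlAn: it assumes the profile's header contains a line of the form `#mpa_...` holding the database name, and that the mapping file is named `<database>_SGB2GTDB.tsv`. The function names are hypothetical.

```python
from pathlib import Path

SUFFIX = "_SGB2GTDB.tsv"  # assumed naming pattern of the mapping file


def profile_database(profile_path):
    """Return the database name from the first '#mpa_' header line, or None."""
    with open(profile_path) as handle:
        for line in handle:
            if not line.startswith("#"):
                break  # header is over, no database line found
            if line.startswith("#mpa_"):
                return line[1:].strip()
    return None


def mapping_database(mapping_path):
    """Return the database name encoded in the mapping file name, or None."""
    name = Path(mapping_path).name
    return name[: -len(SUFFIX)] if name.endswith(SUFFIX) else None


def databases_match(profile_path, mapping_path):
    """True only if both names could be read and are identical."""
    db = profile_database(profile_path)
    return db is not None and db == mapping_database(mapping_path)
```

Calling `databases_match("profile.txt", path_from_GTDB_ASSIGNMENT_FILE)` then tells you whether the two point at the same database release.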

Hi @Claudia_Mengoni and biobakers,

I got this error when running:

sgb_to_gtdb_profile.py -i S_10_profiled_metagenome.txt -o S_10_profiled_metagenome_gtdb.txt

Output:

Tue Oct 21 20:41:17 2025: Start execution
Traceback (most recent call last):
  File "/work/FAC/FBM/DBC/slehtine/stool_sampling/tools/mpa4.2.2/bin/sgb_to_gtdb_profile.py", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/work/FAC/FBM/DBC/slehtine/stool_sampling/tools/mpa4.2.2/lib/python3.12/site-packages/metaphlan/utils/sgb_to_gtdb_profile.py", line 94, in main
    get_gtdb_profile(args.input, args.output)
  File "/work/FAC/FBM/DBC/slehtine/stool_sampling/tools/mpa4.2.2/lib/python3.12/site-packages/metaphlan/utils/sgb_to_gtdb_profile.py", line 71, in get_gtdb_profile
    gtdb_tax = sgb2gtdb[line[0].split('|')[-1][3:]]
               ~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'SGB8007_group'

Output file preview:

I am using the latest MetaPhlAn 4 database that is compatible with HUMAnN 4…

(/work/FAC/FBM/DBC/slehtine/stool_sampling/tools/mpa4.2.2)
head S_10_profiled_metagenome_gtdb.txt
#mpa_vOct22_CHOCOPhlAnSGB_202403
#clade_name     relative_abundance
UNCLASSIFIED    15.33515

MetaPhlAn version:

metaphlan --version
MetaPhlAn version 4.2.2 (4 Jun 2025)

Environment variable:

echo $GTDB_ASSIGNMENT_FILE

Thanks for the support!

Hi @fconstancias

Since you’re using v4.2.2, you will see in sgb_to_gtdb_profile.py that the database version hardcoded in the script is vJan25 (see the first lines of the file /work/FAC/FBM/DBC/slehtine/stool_sampling/tools/mpa4.2.2/lib/python3.12/site-packages/metaphlan/utils/sgb_to_gtdb_profile.py). We should modify the script so that it detects the version from the profile. Meanwhile, a quick fix is to modify the variable in the script yourself, specifically with:

GTDB_ASSIGNMENT_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)), "mpa_vOct22_CHOCOPhlAnSGB_202212_SGB2GTDB.tsv")
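
Before editing the variable, it may help to see which mapping files are actually present. The sketch below is an illustration rather than MetaPhlAn code: it assumes the mapping files sit in the same directory as the script and follow the `<database>_SGB2GTDB.tsv` naming pattern. The function names are hypothetical.

```python
from pathlib import Path

SUFFIX = "_SGB2GTDB.tsv"  # assumed naming pattern of the mapping files


def available_mappings(utils_dir):
    """Map database name -> mapping file path for each *_SGB2GTDB.tsv found."""
    files = sorted(Path(utils_dir).glob(f"*{SUFFIX}"))
    return {p.name[: -len(SUFFIX)]: p for p in files}


def pick_mapping(database_name, utils_dir):
    """Return the mapping file for `database_name`, or fail with a clear message."""
    mappings = available_mappings(utils_dir)
    if database_name not in mappings:
        known = ", ".join(mappings) or "none"
        raise FileNotFoundError(
            f"No mapping for {database_name!r} in {utils_dir}; available: {known}"
        )
    return mappings[database_name]
```

Passing the directory that contains sgb_to_gtdb_profile.py as `utils_dir` lists the database names you can choose from when setting GTDB_ASSIGNMENT_FILE.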