merge_metaphlan_tables.py doesn't work after converting to GTDB taxonomy

The merge_metaphlan_tables.py script doesn't work when the MetaPhlAn output has been converted using the sgb_to_gtdb_profile.py utility script. I suspect this is because the converted profile has fewer header lines and columns.

Here is an example of my converted MetaPhlAn output:

#mpa_vJan21_CHOCOPhlAnSGB_202103
#clade_name     relative_abundance
d__Bacteria     100.00001
d__Bacteria;p__Actinobacteriota 25.292869999999997
d__Bacteria;p__Proteobacteria   12.73789
d__Bacteria;p__Firmicutes_A     38.49073
d__Bacteria;p__Bacteroidota     18.188679999999998
d__Bacteria;p__Firmicutes       4.95885
d__Bacteria;p__Firmicutes_C     0.28645
d__Bacteria;p__Firmicutes_B     0.04454

The error message in the merged output is:
merge_metaphlan_tables: wrong header format for “13-6929606_metaphlan_gtdb.txt”, please check your profiles.

Any ideas on how to fix this?
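
One way to check the column suspicion is to look at the `#clade_name` header line directly. The sketch below is only an illustration: `header_columns` is a hypothetical helper, not part of MetaPhlAn, and the four-column header used for comparison is an assumption about what an unconverted MetaPhlAn 4 profile looks like.

```python
def header_columns(profile_text):
    """Return the column names from the '#clade_name' header line of a profile."""
    for line in profile_text.splitlines():
        if line.startswith("#clade_name"):
            # Drop the leading '#' and split on any run of whitespace.
            return line.lstrip("#").split()
    raise ValueError("no '#clade_name' header line found")


# First lines of the converted profile shown above.
gtdb_profile = (
    "#mpa_vJan21_CHOCOPhlAnSGB_202103\n"
    "#clade_name\trelative_abundance\n"
    "d__Bacteria\t100.00001\n"
)

# Assumed header of an unconverted MetaPhlAn 4 profile, for comparison only.
standard_header = "#clade_name\tNCBI_tax_id\trelative_abundance\tadditional_species\n"

gtdb_cols = header_columns(gtdb_profile)
standard_cols = header_columns(standard_header)

print(len(gtdb_cols), gtdb_cols)
print(len(standard_cols), standard_cols)
```

If the converted file reports two columns where the merge script expects more, that would be consistent with the "wrong header format" message.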

Hi @sxh1136
Yes, currently merge_metaphlan_tables.py does not work with the profiles produced by sgb_to_gtdb_profile.py. However, in the coming days we are going to push version 4.0.2, which will include new utilities, among them the merging of GTDB-transformed profiles.

Thanks for the quick reply. Good to hear that a fix is coming soon. Would you recommend that I even convert SGB taxonomy to GTDB if I’m just using the metaphlan data to plot relative abundance stacked bar charts?

Hi @sxh1136
We just pushed version 4.0.2 fixing this problem (now, for GTDB profiles you will need to use the --gtdb_profiles parameter)

Hi, what version of GTDB database will this script use to convert the profiling results? Is it GTDB r202?

Hi @Jason
We are using r207

Thanks for replying.
Is there any way I can convert the taxonomy results to a GTDB r202-based version?
(Maybe the script in a specific MetaPhlAn version can achieve this?)

Hi @Jason
Unfortunately, I don't think there is an easy way to convert the taxonomy to r202.

Hi Aitor,

Is there a way to convert a merged metaphlan taxonomy file to GTDB? I tried running sgb_to_gtdb_profile.py on the merged table (created by running biobakery_workflows wmgx pipeline), but the result is only a single column, not every sample.

Thanks a ton for your help answering all the questions here, and for the great tools!

Hi @cjharbort
Currently this is not possible. You will have to convert the individual profiles to GTDB first and then run the merge_metaphlan_tables.py script.
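
To make the "convert each sample, then merge" order concrete, here is a minimal sketch of what merging already-converted two-column profiles amounts to. It is a plain-Python stand-in, not the implementation of merge_metaphlan_tables.py; the helpers `read_gtdb_profile` and `merge_profiles`, the sample names, and the abundance values are all made up for illustration, and tab-separated input is assumed.

```python
import io


def read_gtdb_profile(handle):
    """Parse a two-column GTDB profile into {clade_name: relative_abundance}."""
    abundances = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#' header lines
        clade, value = line.split("\t")[:2]
        abundances[clade] = float(value)
    return abundances


def merge_profiles(profiles):
    """Merge {sample: {clade: abundance}} into rows of [clade, one value per sample]."""
    samples = list(profiles)
    clades = sorted({clade for profile in profiles.values() for clade in profile})
    rows = [["clade_name"] + samples]
    for clade in clades:
        # A clade missing from a sample is reported as 0.0 for that sample.
        rows.append([clade] + [profiles[s].get(clade, 0.0) for s in samples])
    return rows


# Two made-up per-sample profiles, each already converted to GTDB.
sample_a = io.StringIO(
    "#clade_name\trelative_abundance\n"
    "d__Bacteria\t100.0\n"
    "d__Bacteria;p__Firmicutes\t60.0\n"
)
sample_b = io.StringIO(
    "#clade_name\trelative_abundance\n"
    "d__Bacteria\t100.0\n"
    "d__Bacteria;p__Bacteroidota\t40.0\n"
)

profiles = {
    "sample_a": read_gtdb_profile(sample_a),
    "sample_b": read_gtdb_profile(sample_b),
}
merged = merge_profiles(profiles)

for row in merged:
    print("\t".join(str(cell) for cell in row))
```

For real data, the --gtdb_profiles parameter mentioned above is the supported route; this only shows the shape of the result, with one column per sample.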

Hi, I am using MetaPhlAn version 4.1.0 (23 Aug 2023) and I converted my SGB profiles to GTDB taxonomy.

I am getting an error when merging the tables: pandas.errors.ParserError: Defining usecols with out-of-bounds indices is not allowed. [2] are out of bounds.

I looked through each individual GTDB file, and two of my samples have a relative abundance of 100.00000000000001.

Some of them are 99.999999 which I assume is not a problem, and several are exactly 100.0

What can I do to get my tables to merge properly?

Here are the commands I’m using:
sgb_to_gtdb_profile.py -i profiled_SAMPLE_metagenome.txt -o metaphlan_output_SAMPLE_gtdb.txt

merge_metaphlan_tables.py metaphlan_output_*_gtdb.txt > merged_abundance_table.txt

Thank you for your help!
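
On the 100.00000000000001 values: totals like this typically come from floating-point rounding when abundances are summed, and the pandas message complains about column index 2 rather than about any value, so the totals seem an unlikely cause. The sketch below illustrates the rounding point using the phylum-level values from the example profile earlier in this thread; the tolerance of 1e-4 is an arbitrary choice for illustration. Since the reply above says GTDB profiles need the --gtdb_profiles parameter from version 4.0.2 onward, that may be worth trying first.

```python
import math

# Phylum-level values copied from the example profile earlier in this thread.
phyla = [
    25.292869999999997,
    12.73789,
    38.49073,
    18.188679999999998,
    4.95885,
    0.28645,
    0.04454,
]

# The sum lands very near 100 but not exactly on it.
total = sum(phyla)
print(total, math.isclose(total, 100.0, abs_tol=1e-4))

# The totals reported in the post, compared against 100 with the same tolerance.
checks = {
    value: math.isclose(value, 100.0, abs_tol=1e-4)
    for value in (100.00000000000001, 99.999999, 100.0)
}
for value, is_close in checks.items():
    print(value, is_close)
```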

Hi, I've done more reading about this. Others have had this problem in the past, but it looks like sgb_to_gtdb_profile.py has been updated?

I went to GitHub and downloaded the newest version (edited 3 weeks ago). I then reran the command sgb_to_gtdb_profile.py -i profiled_SAMPLE_metagenome.txt -o metaphlan_output_SAMPLE_gtdb.txt, giving the path to the new file from GitHub, but I could not get it to work.

Any other thoughts?

Here is a link to the earlier thread that described others with this problem: sgb_to_gtdb_profile.py -i profiled_SAMPLE_metagenome.txt -o metaphlan_output_SAMPLE_gtdb.txt