Including metadata in merge_metaphlan_tables.py - can it be done?

Hello,

I apologise if this is a silly question! I was hoping to pass the output of merge_metaphlan_tables.py to hclust2.py, including metadata in the process. However, including the metadata seems to confuse the script, and I think this may be due to the way headers are parsed. Is there a way to resolve this?

I have 4 microbiome cohorts and previously (successfully) generated 4 merged reports using merge_metaphlan_tables.py. Each of these output files had only a single header row, and they work with hclust2.py to create heatmaps.

However, I would now like to merge the 4 tables and include metadata in the process, for plotting in hclust2. Edited to clarify: I am trying to use merge_metaphlan_tables.py on files that I created with merge_metaphlan_tables.py and then modified, i.e. I am merging merged files. I added metadata to each of the 4 reports in the format shown on the hclust2.py GitHub page.

In my case, that looks like this:

Individual      DM21_2  DM23_1  DM24_1  DM25_1  DM26_1  DM26_2  DM26_3  DM27_1  DM31_1  DM32_1  DM32_3  DM32_4  DM34_1  DM35_1  DM35_2  DM35_3  DM37_1  DM38_1  DM38_2  DM39_1  DN20_1  YM11_1  YM1_1   YM13_1  YM15_1   YM16_1  YM17_1  YM19_1  YM21_1  YM2_1   YM22_1  YM23_1  YM24_1  YM3_1   YM4_1   YM6_1   YM7_1   YM7_2   YM8_1   YN14_1  YN24_2  YN8_1
Site    Meatus  Meatus  Meatus  Meatus  Meatus  Meatus  Meatus  Meatus  Meatus  Meatus  Meatus  Meatus  Meatus  Meatus  Meatus  Meatus  Meatus  Meatus  Meatus  Meatus  Nares   Meatus  Meatus  Meatus  Meatus  Meatus   Meatus  Meatus  Meatus  Meatus  Meatus  Meatus  Meatus  Meatus  Meatus  Meatus  Meatus  Meatus  Meatus  Nares   Nares   Nares
Age group       Dairy   Dairy   Dairy   Dairy   Dairy   Dairy   Dairy   Dairy   Dairy   Dairy   Dairy   Dairy   Dairy   Dairy   Dairy   Dairy   Dairy   Dairy   Dairy   Dairy   Dairy   Youngstock      Youngstock       Youngstock      Youngstock      Youngstock      Youngstock      Youngstock      Youngstock      Youngstock      Youngstock      Youngstock      Youngstock      Youngstock      Youngstock      Youngstock       Youngstock      Youngstock      Youngstock      Youngstock      Youngstock      Youngstock
Xanthomonas_euvesicatoria       37.4288 51.8028 59.8592 59.8055 40.2174 42.1588 10.4006 5.59846 64.4851 67.8074 60.3916 23.2693 34.1781 49.4456 68.9458 64.8838 20.8    59.6793 17.2165 66.4789 44.1335 48.5576 12.4053  61.595  8.50568 57.5442 31.6299 0.765697        0       45.1309 46.528  53.0928 64.1773 2.88021 0       11.4263 9.3268  18.5632 24.5612 1.15072 58.7875 3.56786
Xanthomonas_citri       0.0335233       0.0330797       0       0       0.047259        0       0       0       0       0.00339512      0       0.00839137      0       0.00908843      0       0       0       0.0176201        0       0.0201207       0       0       0       0.0165289       0       0.0242189       0       0       0       0       0       0       0       0.00353401      0.0791139       0       0.0102155       0       0.00659892      0       0       0.0099996
Xanthomonas_translucens 0.0502849       0       0       0       0       0       0.0143062       0       0.010226        0.0101854       0       0.00839137      0       0.00908843      0.0224542       0       0       0       0       0       0       0       0.0932727       0       0       0       0       0       0       0       0       0       0       0.00353401      0       0.00634618      0       0       0       0.000818438      0       0.0139994

for file 1


Individual      M10     M11     M12     M13     M13R    M14     M15     M16     M17     M19     M1      M2      M3      M4      M5      M6      M7      M8      M9      N13     N18     N2      N6      N7
Site    Meatus  Meatus  Meatus  Meatus  Meatus  Meatus  Meatus  Meatus  Meatus  Meatus  Meatus  Meatus  Meatus  Meatus  Meatus  Meatus  Meatus  Meatus  Meatus  Nares   Nares   Nares   Nares   Nares
Age group       Youngstock      Youngstock      Youngstock      Youngstock      Youngstock      Youngstock      Youngstock      Youngstock      Youngstock      Youngstock      Youngstock      Youngstock      Youngstock       Youngstock      Youngstock      Youngstock      Youngstock      Youngstock      Youngstock      Youngstock      Youngstock      Youngstock      Youngstock      Youngstock
Xanthomonas_euvesicatoria       30.3162 4.38532 15.8741 27.9875 24.6415 8.3874  6.59714 12.7262 9.22088 27.9032 21.44   20.626  7.29517 21.9067 5.074   17.8708 13.4624 20.5465 25.1368 2.09418 9.28736 6.21498 6.52759  6.76382
Xanthomonas_oryzae      0.0408942       0       0       0.0552384       0.0372509       0.0154464       0.00635869      0.0397693       0.0443312       0.00509554      0.0356443       0       0.0224467       0.0361795        0       0       0.0225124       0.0257798       0.0156323       0.00760138      0       0.00394602      0       0.00549904

for file 2

et cetera. The species in the 4 files are obviously not identical; I assume this is not an issue and that the script either keeps only the shared species or fills in NA values for the species that are not shared.
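The two behaviours guessed at above correspond to an outer join (union of species, gaps as NA) and an inner join (shared species only). A minimal pandas sketch on toy tables built from a few values in the two excerpts makes the difference concrete; it only illustrates join semantics and is not how merge_metaphlan_tables.py is implemented.

```python
import pandas as pd

# Toy per-cohort tables (rows = species, columns = samples), using a few
# values copied from the two excerpts above.
cohort1 = pd.DataFrame(
    {"DM21_2": [37.4288, 0.0335233], "DM23_1": [51.8028, 0.0330797]},
    index=["Xanthomonas_euvesicatoria", "Xanthomonas_citri"],
)
cohort2 = pd.DataFrame(
    {"M10": [30.3162, 0.0408942]},
    index=["Xanthomonas_euvesicatoria", "Xanthomonas_oryzae"],
)

# Outer join: keeps the union of species; a species missing from a cohort
# gets NaN in that cohort's sample columns.
outer = pd.concat([cohort1, cohort2], axis=1, join="outer")

# Inner join: keeps only species present in every cohort.
inner = pd.concat([cohort1, cohort2], axis=1, join="inner")

# One common choice for relative abundances: treat "absent" as 0, not NA.
outer_zero = outer.fillna(0)

print(outer)
print(inner)
```

Whether missing species should become 0 or NA is a modelling choice; for abundance heatmaps, filling with 0 keeps every sample column numeric.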

The problem: when I try to pass these 4 files to merge_metaphlan_tables.py, I get the error

Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/metaphlan/bin/merge_metaphlan_tables.py", line 8, in <module>
    sys.exit(main())
  File "/home/ubuntu/anaconda3/envs/metaphlan/lib/python3.7/site-packages/metaphlan/utils/merge_metaphlan_tables.py", line 73, in main
    merge(args.aistms, open(args.o, 'w') if args.o else sys.stdout, args.gtdb_profiles)
  File "/home/ubuntu/anaconda3/envs/metaphlan/lib/python3.7/site-packages/metaphlan/utils/merge_metaphlan_tables.py", line 27, in merge
    listmpaVersion.add(headers[0])
IndexError: list index out of range
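The traceback shows that `headers` is an empty list when `headers[0]` is evaluated. Judging only from that variable name (an assumption; the script's source is not quoted here), the merge step appears to collect `#`-prefixed header lines, which single-sample MetaPhlAn profiles carry at the top and the edited tables, which start with `Individual`, do not. The sketch below reproduces that kind of check on two in-memory files; `header_lines` and both file contents are hypothetical stand-ins, not MetaPhlAn code or output.

```python
import io


def header_lines(handle):
    """Return the '#'-prefixed lines of a text file, then rewind it.

    Hypothetical helper: mimics, by assumption, how the merge script seems
    to gather header lines before reading the table.
    """
    headers = [line.strip() for line in handle if line.startswith("#")]
    handle.seek(0)
    return headers


# Hypothetical stand-in for a single-sample MetaPhlAn profile
# (the header content is a placeholder, not real MetaPhlAn output).
profile = io.StringIO(
    "#placeholder version line\n"
    "#clade_name\tNCBI_tax_id\trelative_abundance\tadditional_species\n"
    "k__Bacteria\t2\t100.0\t\n"
)
# The edited table from the post: the first line is a plain 'Individual' row.
edited = io.StringIO(
    "Individual\tDM21_2\tDM23_1\n"
    "Site\tMeatus\tMeatus\n"
    "Xanthomonas_euvesicatoria\t37.4288\t51.8028\n"
)

for name, handle in [("profile", profile), ("edited", edited)]:
    headers = header_lines(handle)
    if headers:
        print(f"{name}: first header line is {headers[0]!r}")
    else:
        # With no '#' lines, indexing headers[0] raises IndexError.
        print(f"{name}: no '#' header lines found")
```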

From what I can understand of the script, I suspect this is a header issue: merge_metaphlan_tables.py is not expecting to see the metadata. However, the hclust2.py GitHub page implies that it should be possible to include metadata for visualisation. Is this possible? I couldn’t find it in the tutorials, but I apologise if I missed it. Thank you for any help!

P.S. I am using MetaPhlAn version 4.0.6.

Hi @mortuseon,
So, if I understand correctly, you are adding the metadata to each cohort file and then trying to merge those files with merge_metaphlan_tables.py? That is not possible. You should first merge all the original MetaPhlAn profiles (from all 4 cohorts) and then add the metadata.
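The suggested order (merge the original profiles first, then add metadata) can be sketched with pandas. This is a minimal sketch under stated assumptions: the merged table is tab-separated with taxa in the first column and one column per sample, any `#` lines are comments, and the metadata lives in a separate table indexed by sample name. `add_metadata_rows` and the example inputs are hypothetical; the metadata rows end up directly below the sample header, matching the layout shown in the question.

```python
import io

import pandas as pd


def add_metadata_rows(merged_tsv, metadata, fields):
    """Return merged_tsv with one row per metadata field under the header.

    merged_tsv: text of a merged table (taxa in the first column, one
        column per sample); lines starting with '#' are treated as comments.
    metadata: DataFrame indexed by sample name, one column per field.
    fields: metadata columns to add, in the order they should appear.
    """
    # Drop '#' comment lines (assumed to hold version info, not data).
    body = "\n".join(
        line for line in merged_tsv.splitlines() if not line.startswith("#")
    )
    # Read everything as strings so abundances are written back unchanged.
    table = pd.read_csv(io.StringIO(body), sep="\t", index_col=0, dtype=str)

    missing = [s for s in table.columns if s not in metadata.index]
    if missing:
        raise KeyError(f"no metadata for samples: {missing}")

    # Rows = metadata fields, columns = samples in the table's order.
    meta_rows = metadata.loc[list(table.columns), fields].T.astype(str)
    meta_rows.index.name = table.index.name

    # Metadata rows first, then the abundance rows.
    return pd.concat([meta_rows, table]).to_csv(sep="\t")


# Hypothetical example using two samples from the excerpts above.
merged = (
    "#placeholder version line\n"
    "clade_name\tDM21_2\tM10\n"
    "Xanthomonas_euvesicatoria\t37.4288\t30.3162\n"
)
metadata = pd.DataFrame(
    {"Site": ["Meatus", "Meatus"], "Age group": ["Dairy", "Youngstock"]},
    index=["DM21_2", "M10"],
)
print(add_metadata_rows(merged, metadata, ["Site", "Age group"]))
```

The header cell keeps whatever label the merged table used (here `clade_name`) and can be renamed to `Individual` if preferred. hclust2.py still has to be told which rows hold metadata; its README describes the relevant option.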