Hello,
I apologise if this is a silly question! I was hoping to pass the output of merge_metaphlan_tables.py to hclust2.py with metadata included. However, the metadata rows seem to confuse merge_metaphlan_tables.py, and I think this may be due to the way it parses headers. Is there a way to resolve this?
I have 4 microbiome cohorts, and previously (successfully) generated 4 merged reports using merge_metaphlan_tables.py. Each of the output files only had a single header. These work with hclust2.py to create heatmaps.
However, I would now like to merge the 4 tables and include metadata for plotting in hclust2.py. Edited to clarify: I am trying to run merge_metaphlan_tables.py on files that I originally created with merge_metaphlan_tables.py and then modified, i.e. I am merging already-merged files. I added metadata to each of the 4 reports in the format shown on the hclust2.py GitHub page.
In my case that looks like this (snippet):
Individual DM21_2 DM23_1 DM24_1 DM25_1 DM26_1 DM26_2 DM26_3 DM27_1 DM31_1 DM32_1 DM32_3 DM32_4 DM34_1 DM35_1 DM35_2 DM35_3 DM37_1 DM38_1 DM38_2 DM39_1 DN20_1 YM11_1 YM1_1 YM13_1 YM15_1 YM16_1 YM17_1 YM19_1 YM21_1 YM2_1 YM22_1 YM23_1 YM24_1 YM3_1 YM4_1 YM6_1 YM7_1 YM7_2 YM8_1 YN14_1 YN24_2 YN8_1
Site Meatus Meatus Meatus Meatus Meatus Meatus Meatus Meatus Meatus Meatus Meatus Meatus Meatus Meatus Meatus Meatus Meatus Meatus Meatus Meatus Nares Meatus Meatus Meatus Meatus Meatus Meatus Meatus Meatus Meatus Meatus Meatus Meatus Meatus Meatus Meatus Meatus Meatus Meatus Nares Nares Nares
Age group Dairy Dairy Dairy Dairy Dairy Dairy Dairy Dairy Dairy Dairy Dairy Dairy Dairy Dairy Dairy Dairy Dairy Dairy Dairy Dairy Dairy Youngstock Youngstock Youngstock Youngstock Youngstock Youngstock Youngstock Youngstock Youngstock Youngstock Youngstock Youngstock Youngstock Youngstock Youngstock Youngstock Youngstock Youngstock Youngstock Youngstock Youngstock
Xanthomonas_euvesicatoria 37.4288 51.8028 59.8592 59.8055 40.2174 42.1588 10.4006 5.59846 64.4851 67.8074 60.3916 23.2693 34.1781 49.4456 68.9458 64.8838 20.8 59.6793 17.2165 66.4789 44.1335 48.5576 12.4053 61.595 8.50568 57.5442 31.6299 0.765697 0 45.1309 46.528 53.0928 64.1773 2.88021 0 11.4263 9.3268 18.5632 24.5612 1.15072 58.7875 3.56786
Xanthomonas_citri 0.0335233 0.0330797 0 0 0.047259 0 0 0 0 0.00339512 0 0.00839137 0 0.00908843 0 0 0 0.0176201 0 0.0201207 0 0 0 0.0165289 0 0.0242189 0 0 0 0 0 0 0 0.00353401 0.0791139 0 0.0102155 0 0.00659892 0 0 0.0099996
Xanthomonas_translucens 0.0502849 0 0 0 0 0 0.0143062 0 0.010226 0.0101854 0 0.00839137 0 0.00908843 0.0224542 0 0 0 0 0 0 0.0932727 0 0 0 0 0 0 0 0 0 0 0.00353401 0 0.00634618 0 0 0 0.000818438 0 0.0139994
for file 1
Individual M10 M11 M12 M13 M13R M14 M15 M16 M17 M19 M1 M2 M3 M4 M5 M6 M7 M8 M9 N13 N18 N2 N6 N7
Site Meatus Meatus Meatus Meatus Meatus Meatus Meatus Meatus Meatus Meatus Meatus Meatus Meatus Meatus Meatus Meatus Meatus Meatus Meatus Nares Nares Nares Nares Nares
Age group Youngstock Youngstock Youngstock Youngstock Youngstock Youngstock Youngstock Youngstock Youngstock Youngstock Youngstock Youngstock Youngstock Youngstock Youngstock Youngstock Youngstock Youngstock Youngstock Youngstock Youngstock Youngstock Youngstock Youngstock
Xanthomonas_euvesicatoria 30.3162 4.38532 15.8741 27.9875 24.6415 8.3874 6.59714 12.7262 9.22088 27.9032 21.44 20.626 7.29517 21.9067 5.074 17.8708 13.4624 20.5465 25.1368 2.09418 9.28736 6.21498 6.52759 6.76382
Xanthomonas_oryzae 0.0408942 0 0 0.0552384 0.0372509 0.0154464 0.00635869 0.0397693 0.0443312 0.00509554 0.0356443 0 0.0224467 0.0361795 0 0 0.0225124 0.0257798 0.0156323 0.00760138 0 0.00394602 0 0.00549904
for file 2
and so on for the other files. The species in the 4 files are not identical; I assume this is not an issue and that the merge either keeps only the shared species or reports NA values for the non-shared ones.
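A minimal sketch of the kind of merge described above, written with the Python standard library only. It is not how merge_metaphlan_tables.py or hclust2.py work internally. It assumes the files are tab-separated, that the metadata row labels are exactly the three shown in the snippets, and that a species missing from a file should be filled with 0 (an assumption, not confirmed behaviour). The function names and the tiny example tables are hypothetical.

```python
import csv
import io

# Metadata row labels taken from the example files above, in output order.
METADATA_LABELS = ("Individual", "Site", "Age group")


def read_annotated_table(handle):
    """Split one tab-separated table into metadata rows and abundance rows."""
    metadata, abundances = {}, {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        label, values = row[0], row[1:]
        target = metadata if label in METADATA_LABELS else abundances
        target[label] = values
    return metadata, abundances


def merge_annotated_tables(handles, fill="0"):
    """Concatenate samples across tables, keeping the union of species."""
    tables = [read_annotated_table(h) for h in handles]
    # Number of samples per table, used to pad species that a table lacks.
    widths = [len(meta["Individual"]) for meta, _ in tables]
    rows = []
    for label in METADATA_LABELS:
        merged = [v for meta, _ in tables for v in meta[label]]
        rows.append([label] + merged)
    # Union of species names across all tables (assumed fill value: "0").
    species = sorted(set().union(*(ab for _, ab in tables)))
    for name in species:
        merged = []
        for (_, ab), width in zip(tables, widths):
            merged.extend(ab.get(name, [fill] * width))
        rows.append([name] + merged)
    return rows


# Tiny made-up tables, only to show the shape of the result.
file_1 = (
    "Individual\tS1\tS2\n"
    "Site\tMeatus\tNares\n"
    "Age group\tDairy\tDairy\n"
    "Xanthomonas_citri\t0.5\t0\n"
)
file_2 = (
    "Individual\tS3\n"
    "Site\tNares\n"
    "Age group\tYoungstock\n"
    "Xanthomonas_oryzae\t0.25\n"
)
merged_rows = merge_annotated_tables([io.StringIO(file_1), io.StringIO(file_2)])
for row in merged_rows:
    print("\t".join(row))
```

The values are kept as strings so nothing is reformatted on the way through. Whether hclust2.py accepts the resulting table unchanged is not something this sketch checks.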
The problem: when I try to pass these 4 files to merge_metaphlan_tables.py, I get the following error:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/metaphlan/bin/merge_metaphlan_tables.py", line 8, in <module>
    sys.exit(main())
  File "/home/ubuntu/anaconda3/envs/metaphlan/lib/python3.7/site-packages/metaphlan/utils/merge_metaphlan_tables.py", line 73, in main
    merge(args.aistms, open(args.o, 'w') if args.o else sys.stdout, args.gtdb_profiles)
  File "/home/ubuntu/anaconda3/envs/metaphlan/lib/python3.7/site-packages/metaphlan/utils/merge_metaphlan_tables.py", line 27, in merge
    listmpaVersion.add(headers[0])
IndexError: list index out of range
From what I can understand of the script, I suspect this is a header issue: merge_metaphlan_tables.py is not expecting to see the metadata rows. However, the hclust2.py GitHub page implies that it should be possible to include metadata for visualisation. Is this possible? I couldn’t find it in the tutorials, but I apologise if I missed it. Thank you for any help!
P.S. I am using version 4.0.6.
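The header suspicion can be illustrated with a small sketch. It assumes, without having confirmed it against the 4.0.6 source, that `headers` in the traceback is the list of leading lines that start with `#`, so that a file whose first line is a metadata row yields an empty list and `headers[0]` fails. The helper names, the placeholder header text, and the example rows are hypothetical.

```python
from itertools import takewhile


def leading_comment_lines(lines):
    """Return the run of lines at the top of a table that start with '#'."""
    return [line.strip() for line in takewhile(lambda l: l.startswith("#"), lines)]


def first_header(lines):
    """Return the first '#' header line, with a clearer error than IndexError."""
    headers = leading_comment_lines(lines)
    if not headers:
        raise ValueError("no leading '#' header lines found")
    return headers[0]


# A table that starts with a '#' line (placeholder text, not a real version string).
commented = ["#placeholder_version_line\n", "clade_name\tS1\tS2\n"]

# A table that starts directly with metadata rows, as in the annotated files.
annotated = ["Individual\tS1\tS2\n", "Site\tMeatus\tNares\n", "Xanthomonas_citri\t0.5\t0\n"]

print(leading_comment_lines(commented))  # one header line collected
print(leading_comment_lines(annotated))  # empty list, so indexing [0] would fail

try:
    first_header(annotated)
except ValueError as err:
    print("annotated table rejected:", err)
```

If the assumption holds, the metadata rows are not the direct problem; the missing `#` line at the top of each edited file would be.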