MetaPhlAn file formats across versions

Hi all,
I posted a related question recently (Human and metaphlan file formats), but @franzosa suggested that I check with MetaPhlAn folks under this topic.

I’ve been working on a QIIME 2 plugin to facilitate using HUMAnN 3 and MetaPhlAn 3 output with downstream tools in QIIME 2 (taxonomy/functional category plots, ordination, etc.). The MetaPhlAn format that I’m specifically focused on at the moment is the MetaPhlAn merged abundance table, as illustrated here:

#mpa_v30_CHOCOPhlAn_201901
clade_name	NCBI_tax_id	sample1	sample_2
k__Archaea	2157	9.75907	0.02352
k__Archaea|p__Euryarchaeota	2157|28890	9.75907	0.02352
k__Archaea|p__Euryarchaeota|c__Methanobacteria	2157|28890|183925	9.75907	0.02352
k__Archaea|p__Euryarchaeota|c__Methanobacteria|o__Methanobacteriales	2157|28890|183925|2158	9.75907	0.02352
k__Archaea|p__Euryarchaeota|c__Methanobacteria|o__Methanobacteriales|f__Methanobacteriaceae	2157|28890|183925|2158|2159	9.75907	0.02352
k__Archaea|p__Euryarchaeota|c__Methanobacteria|o__Methanobacteriales|f__Methanobacteriaceae|g__Methanobrevibacter	2157|28890|183925|2158|2159|2172	9.75907	0.02352
k__Archaea|p__Euryarchaeota|c__Methanobacteria|o__Methanobacteriales|f__Methanobacteriaceae|g__Methanobrevibacter|s__Methanobrevibacter_smithii	2157|28890|183925|2158|2159|2172|2173	9.75907	0.02352
k__Bacteria	2	90.24093	99.97648
k__Bacteria|p__Actinobacteria	2|201174	90.24093	99.97648
k__Bacteria|p__Actinobacteria|c__Actinobacteria	2|201174|1760	90.24093	99.97648
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales	2|201174|1760|2037	90.24093	99.97648
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae	2|201174|1760|2037|2049	90.24093	99.97648
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae|g__Actinobaculum	2|201174|1760|2037|2049|76833	45.0	10.97648
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae|g__Actinobaculum|s__Actinobaculum_sp_oral_taxon_183	2|201174|1760|2037|2049|76833|712888	45.0	10.97648
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae|g__Actinomyces	2|201174|1760|2037|2049|1654	45.24093	89
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae|g__Actinomyces|s__Actinomyces_graevenitzii	2|201174|1760|2037|2049|1654|55565	5.0	0.0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae|g__Actinomyces|s__Actinomyces_naeslundii	2|201174|1760|2037|2049|1654|1655	1.24093	0.0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae|g__Actinomyces|s__Actinomyces_odontolyticus	2|201174|1760|2037|2049|1654|1660	25.0	0.0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae|g__Actinomyces|s__Actinomyces_oris	2|201174|1760|2037|2049|1654|544580	10.0	9.0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae|g__Actinomyces|s__Actinomyces_sp_HMSC035G02	2|201174|1760|2037|2049|1654|1739406	4.0	80.0

I have a couple of questions related to this work.

First, it sounds like there may be differences in that file format between MetaPhlAn 3 and 4 - is that correct, and if so is there a way to programmatically determine which version of the file I’m working with to ensure accurate parsing?

Second, are there example output files for different versions of the file format around anywhere that I could use in my testing? I’ve pulled a few from my own work and from the docs, but I was wondering if there is a canonical set that you use for testing on your end that might facilitate testing of 3rd party tools.

Thanks for the input!

Hi @gregcaporaso
Answering your questions:

  • MetaPhlAn 3 and MetaPhlAn 4 profiles can be differentiated by the database used in the analysis, which is named in the first row of the profile file. The current MetaPhlAn 4 database is named mpa_vJan21_CHOCOPhlAnSGB_202103.
  • Unfortunately, while we have some profiles available in the MetaPhlAn 3 tutorial that we usually reuse for that purpose (metaphlan3 · biobakery/biobakery Wiki · GitHub), we are still working on an equivalent set for version 4.
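
For anyone parsing these files, the first-row check described above could be sketched roughly as follows in Python. This is only an illustration: the function names are made up for this example, and the rule "a database name containing CHOCOPhlAnSGB means MetaPhlAn 4" is an assumption inferred from the two database names that appear in this thread (mpa_v30_CHOCOPhlAn_201901 and mpa_vJan21_CHOCOPhlAnSGB_202103), not a documented guarantee. Other or future database names may need their own handling.

```python
import io


def read_database_name(handle):
    """Return the database name from the first row of a MetaPhlAn profile.

    The first row is expected to look like '#mpa_v30_CHOCOPhlAn_201901'.
    """
    first_line = handle.readline().strip()
    if not first_line.startswith("#"):
        raise ValueError(
            "expected a '#'-prefixed database row, got: %r" % first_line
        )
    return first_line.lstrip("#").strip()


def guess_metaphlan_major_version(database_name):
    """Guess the MetaPhlAn major version from the database name.

    ASSUMPTION: this heuristic is based only on the two database names
    seen in this thread. It returns None for anything it does not
    recognize rather than guessing.
    """
    # Check the more specific substring first, because
    # 'CHOCOPhlAn' is a prefix of 'CHOCOPhlAnSGB'.
    if "CHOCOPhlAnSGB" in database_name:
        return 4
    if "CHOCOPhlAn" in database_name:
        return 3
    return None


if __name__ == "__main__":
    # In-memory stand-in for a merged abundance table.
    example = io.StringIO(
        "#mpa_v30_CHOCOPhlAn_201901\n"
        "clade_name\tNCBI_tax_id\tsample1\tsample_2\n"
        "k__Archaea\t2157\t9.75907\t0.02352\n"
    )
    name = read_database_name(example)
    print(name, guess_metaphlan_major_version(name))
```

Returning None for unrecognized names, instead of defaulting to a version, lets a calling tool fail loudly or ask the user rather than silently mis-parse a profile.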

Ok, thank you @aitor.blancomiguez!