Hi all,
I posted a related question recently (Human and metaphlan file formats), but @franzosa suggested that I check with the MetaPhlAn folks under this topic.
I've been working on a QIIME 2 plugin to facilitate using HUMAnN 3 and MetaPhlAn 3 output with downstream tools in QIIME 2 (taxonomy/functional category plots, ordination, etc.). The MetaPhlAn format that I'm specifically focused on at the moment is the MetaPhlAn merged abundance table, as illustrated here:
#mpa_v30_CHOCOPhlAn_201901
clade_name NCBI_tax_id sample1 sample_2
k__Archaea 2157 9.75907 0.02352
k__Archaea|p__Euryarchaeota 2157|28890 9.75907 0.02352
k__Archaea|p__Euryarchaeota|c__Methanobacteria 2157|28890|183925 9.75907 0.02352
k__Archaea|p__Euryarchaeota|c__Methanobacteria|o__Methanobacteriales 2157|28890|183925|2158 9.75907 0.02352
k__Archaea|p__Euryarchaeota|c__Methanobacteria|o__Methanobacteriales|f__Methanobacteriaceae 2157|28890|183925|2158|2159 9.75907 0.02352
k__Archaea|p__Euryarchaeota|c__Methanobacteria|o__Methanobacteriales|f__Methanobacteriaceae|g__Methanobrevibacter 2157|28890|183925|2158|2159|2172 9.75907 0.02352
k__Archaea|p__Euryarchaeota|c__Methanobacteria|o__Methanobacteriales|f__Methanobacteriaceae|g__Methanobrevibacter|s__Methanobrevibacter_smithii 2157|28890|183925|2158|2159|2172|2173 9.75907 0.02352
k__Bacteria 2 90.24093 99.97648
k__Bacteria|p__Actinobacteria 2|201174 90.24093 99.97648
k__Bacteria|p__Actinobacteria|c__Actinobacteria 2|201174|1760 90.24093 99.97648
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales 2|201174|1760|2037 90.24093 99.97648
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae 2|201174|1760|2037|2049 90.24093 99.97648
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae|g__Actinobaculum 2|201174|1760|2037|2049|76833 45.0 10.97648
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae|g__Actinobaculum|s__Actinobaculum_sp_oral_taxon_183 2|201174|1760|2037|2049|76833|712888 45.0 10.97648
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae|g__Actinomyces 2|201174|1760|2037|2049|1654 45.24093 89
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae|g__Actinomyces|s__Actinomyces_graevenitzii 2|201174|1760|2037|2049|1654|55565 5.0 0.0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae|g__Actinomyces|s__Actinomyces_naeslundii 2|201174|1760|2037|2049|1654|1655 1.24093 0.0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae|g__Actinomyces|s__Actinomyces_odontolyticus 2|201174|1760|2037|2049|1654|1660 25.0 0.0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae|g__Actinomyces|s__Actinomyces_oris 2|201174|1760|2037|2049|1654|544580 10.0 9.0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae|g__Actinomyces|s__Actinomyces_sp_HMSC035G02 2|201174|1760|2037|2049|1654|1739406 4.0 80.0
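A minimal sketch of how a table like the one above could be read, assuming the columns are tab-separated in the real file (the separators may not have survived pasting here) and that the sample columns begin after `clade_name` and, when present, `NCBI_tax_id`. The function names `parse_merged_table` and `rows_at_rank` are hypothetical and are not part of MetaPhlAn or QIIME 2.

```python
def parse_merged_table(text):
    """Parse merged-abundance text into (sample_ids, {clade_name: [abundances]}).

    ASSUMPTIONS: tab-separated columns; lines starting with '#' are comments;
    the header row starts with 'clade_name', optionally followed by 'NCBI_tax_id'.
    """
    samples = []
    table = {}
    first_sample_col = 1  # updated once the header row has been seen
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the '#mpa_...' comment line
        fields = line.split("\t")
        if fields[0] == "clade_name":
            has_tax_id = len(fields) > 1 and fields[1] == "NCBI_tax_id"
            first_sample_col = 2 if has_tax_id else 1
            samples = fields[first_sample_col:]
            continue
        table[fields[0]] = [float(x) for x in fields[first_sample_col:]]
    return samples, table


def rows_at_rank(table, rank):
    """Keep only clades whose deepest level has the given prefix, e.g. 's' for species."""
    prefix = rank + "__"
    return {
        clade: values
        for clade, values in table.items()
        if clade.split("|")[-1].startswith(prefix)
    }
```

Filtering to a single rank (for example `rows_at_rank(table, "s")`) avoids counting the same abundance once per taxonomic level, since every parent clade repeats the total of its children.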
I have a couple of questions related to this work.
First, it sounds like there may be differences in that file format between MetaPhlAn 3 and 4. Is that correct, and if so, is there a way to programmatically determine which version of the file I'm working with to ensure accurate parsing?
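One possible approach, sketched here under assumptions that would need confirming: read the leading `#mpa_...` comment line (as in the example above) and infer the version from the database name. The mapping below is a guess, not documented behavior: it assumes that MetaPhlAn 4 database names contain `CHOCOPhlAnSGB` and that MetaPhlAn 3 names start with `mpa_v3`. The function names are hypothetical.

```python
import re


def database_name(lines):
    """Return the database name from a leading '#mpa_...' comment line, or None."""
    for line in lines:
        if not line.startswith("#"):
            break  # past the comment header; stop looking
        match = re.match(r"#\s*(mpa_\S+)", line)
        if match:
            return match.group(1)
    return None


def guess_major_version(db_name):
    """Guess the MetaPhlAn major version from a database name.

    ASSUMPTION (unconfirmed): names containing 'CHOCOPhlAnSGB' come from
    MetaPhlAn 4, and names starting with 'mpa_v3' come from MetaPhlAn 3.
    Returns None when neither pattern matches.
    """
    if db_name is None:
        return None
    if "CHOCOPhlAnSGB" in db_name:
        return 4
    if re.match(r"mpa_v3\d", db_name):
        return 3
    return None
```

This only works if the comment line is reliably present in merged tables, which is part of what the question is asking.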
Second, are there example output files for the different versions of the file format available anywhere that I could use in my testing? I've pulled a few from my own work and from the docs, but I was wondering if there is a canonical set that you use for testing on your end that might facilitate testing of third-party tools.
Thanks for the input!