The bioBakery help forum

Taxonomy for markers (-t marker_counts)


I have run MetaPhlAn 3 with -t marker_counts as I wanted to output read counts for each marker to do further statistical analysis such as phyloseq and deseq2.

This has given me an output of what I assume is each marker and the read count. Is there anyway to get the taxonomy of each marker?

I am also getting an error when trying to merge the tables together. Does only work with the rel_ab output?

 Traceback (most recent call last):
  File "/pub38/cchong/miniconda2/envs/metaphlan3/bin/", line 10, in <module>
  File "/pub38/cchong/miniconda2/envs/metaphlan3/lib/python3.7/site-packages/metaphlan/utils/", line 78, in main
    merge(args.aistms, sys.stdout)
  File "/pub38/cchong/miniconda2/envs/metaphlan3/lib/python3.7/site-packages/metaphlan/utils/", line 46, in merge
    names = names,
UnboundLocalError: local variable 'names' referenced before assignment

Thanks in advance!

Directly no, but you can use the mpa_v30_CHOCOPhlAn_201901_marker_info.txt.bz2 file in order to get the 'taxonomy' field associated to each marker.

Yes, only with rel_ab and rel_ab_w_read_stats

Hello Charlotte and Francesco,
I came across a similar problem: trying to translate markers when running metaphlan with the -t marker_counts option.
As suggested by Francesco I used the mpa_v30_CHOCOPhlAn_201901_marker_info.txt.bz2 for this. I created a small script and managed to obtain the format I wanted, these are the first lines (the second field is the read count (adding all the read count markers that contributed to a bug)):

k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Lachnospiraceae|g__Lachnospiraceae_unclassified|s__Eubacterium_rectale	58749
k__Bacteria|p__Proteobacteria|c__Betaproteobacteria|o__Burkholderiales|f__Burkholderiaceae|g__Cupriavidus|s__Cupriavidus_metallidurans	1
k__Bacteria|p__Synergistetes|c__Synergistia|o__Synergistales|f__Synergistaceae|g__Pyramidobacter|s__Pyramidobacter_piscolens	78
k__Bacteria|p__Verrucomicrobia|c__Verrucomicrobiae|o__Verrucomicrobiales|f__Akkermansiaceae|g__Akkermansia|s__Akkermansia_muciniphila	2712
k__Bacteria|p__Fusobacteria|c__Fusobacteriia|o__Fusobacteriales|f__Fusobacteriaceae|g__Fusobacterium|s__Fusobacterium_equinum	4
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Xanthomonadales|f__Xanthomonadaceae|g__Lysobacter|s__Lysobacter_enzymogenes	1
k__Bacteria|p__Fusobacteria|c__Fusobacteriia|o__Fusobacteriales|f__Leptotrichiaceae|g__Leptotrichia|s__Leptotrichia_sp_oral_taxon_225	1
k__Bacteria|p__Proteobacteria|c__Deltaproteobacteria|o__Desulfovibrionales|f__Desulfovibrionaceae|g__Bilophila|s__Bilophila_wadsworthia	1111
k__Bacteria|p__Proteobacteria|c__Deltaproteobacteria|o__Desulfovibrionales|f__Desulfovibrionaceae|g__Desulfovibrionaceae_unclassified|s__Desulfovibrionaceae_bacterium	3

Same thing as what Charlotte mentions, I want to use this table for further analysis, I’m thinking DESeq2 to perform a differential abundance analysis. However, I’m concerned I’m not taking into consideration thresholds such as percentage of markers with zero counts to discard an specie, among others.
Therefore, my question is: Is it valid to use the read counts table with the taxonomy info for differential abundance with DESeq2? or I’m beeing too naive?

Yes! After transformation you should consider the different read depth by adding an offset term in your model.

There’s also an example of the count conversion in the curatedMetagenomicData vignette here

Hello David,

Can you share your script?


But this is assuming that the read depth per sample is known. In many cases, this is not the case. There are several threads where people are asking about how to output counts rather than the normalised relative abundances, but none provide a clear response. Relative abundances are suboptimal as information is lost in the normalisation, and it is also not suitable for common analyses such as Bray-Curtis.