Combining Metaphlan3 and Metaphlan4 outputs for Virus and Bacteria


I’m interested in using the updated Metaphlan4 analysis for bacteria profiling and want to supplement it with virus abundance estimates from Metaphlan3 while I’m waiting for the 4.1 release.

I’m tempted to add the virus relative abundances from Metaphlan3 to the Metaphlan4 output, but I’m unsure whether this is appropriate. Would estimated clade counts be more appropriate to combine? I’ve read the discussion on relative abundance and understand there are subtleties in how relative abundance is calculated from clade counts. I’m just wondering how best to approach this.

I’m currently using Metaphlan4 to obtain both outputs (v3 and v4), using the mpa_v31_CHOCOPhlAn_201901 and mpa_vJan21_CHOCOPhlAnSGB_202103 databases with the --unclassified_estimation, --add_viruses, --mpa3 and --index options.


Hi @alexvcrr
As the relative abundance of a clade depends on the absolute abundance of every other clade in the metagenome, directly adding the relative abundances from the mpa3 profile to the mpa4 profile would not be appropriate and might lead to misleading results.
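
To illustrate the point with made-up numbers (these are not real MetaPhlAn output, and the clade names and the `relative_abundance` helper are hypothetical), here is a minimal Python sketch. It assumes each profile is normalized to 100% over the clades that profile detects, so the same bacteria get different percentages depending on whether a virus is part of the total:

```python
def relative_abundance(abundances):
    """Normalize per-clade abundances so the profile sums to 100 percent."""
    total = sum(abundances.values())
    return {clade: 100.0 * value / total for clade, value in abundances.items()}

# Hypothetical absolute abundances for one sample (illustrative values only).
with_virus = {"Bacteria_A": 54, "Bacteria_B": 36, "Virus_X": 10}
bacteria_only = {"Bacteria_A": 54, "Bacteria_B": 36}

# Stand-in for a profile that includes the virus in its normalization.
profile_with_virus = relative_abundance(with_virus)
# Stand-in for a profile normalized over bacteria only.
profile_bacteria_only = relative_abundance(bacteria_only)

# Naive merge: copy the virus row from one profile into the other.
naive_merge = dict(profile_bacteria_only)
naive_merge["Virus_X"] = profile_with_virus["Virus_X"]

print(profile_with_virus)
print(profile_bacteria_only)
print(sum(naive_merge.values()))  # no longer 100
```

In this toy case the merged profile sums to more than 100%, and the bacterial percentages are no longer on the same scale as the virus percentage, because the two values came from different denominators.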