Taxon relative abundances from gene families file?


Is it possible to construct a taxon abundance table from the gene families output of HUMAnN? That is, if we sum the species-assigned gene family abundances for each sample and then renormalize, do we get the same relative abundances that we would get from MetaPhlAn?

My apologies if this question has been asked before - I spent some time searching, but may not have used the right keywords.

Thanks for your time!
(PS – I’m using HUMAnN 3.)

If you were to sum each species’ stratified gene family abundances and then sum-normalize those totals, you would get something similar to MetaPhlAn’s abundance profile. However, that approach would not account for differences in genome size, so you’d tend to overestimate the copy number of species contributing more genes to the community (that is, species with larger genomes).
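
For anyone who wants to try the sum-and-renormalize approach anyway, here is a minimal sketch in Python. It assumes the usual stratified gene families layout (tab-separated, a header line starting with `#`, and stratified rows of the form `GeneFamily|taxon`); the function name `species_fractions` and the toy rows are made up for illustration and are not part of HUMAnN. It skips `UNMAPPED`, the unstratified community totals, and `unclassified` strata, and it does nothing to correct for genome size.

```python
from collections import defaultdict


def species_fractions(lines):
    """Sum stratified gene family abundances per taxon, then sum-normalize.

    Illustrative helper only: assumes rows look like
    'UniRef90_X|g__Genus.s__Species<TAB>value'.
    """
    totals = defaultdict(float)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header
        feature, value = line.split("\t")[:2]
        if "|" not in feature:
            continue  # UNMAPPED and unstratified community totals
        taxon = feature.split("|", 1)[1]
        if taxon == "unclassified":
            continue  # no species assignment for this stratum
        totals[taxon] += float(value)

    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {taxon: total / grand_total for taxon, total in totals.items()}


# Toy input with invented numbers, just to show the shape of the data.
example = [
    "# Gene Family\tsample_Abundance-RPKs",
    "UNMAPPED\t100.0",
    "UniRef90_A\t40.0",
    "UniRef90_A|g__Bacteroides.s__Bacteroides_vulgatus\t30.0",
    "UniRef90_A|unclassified\t10.0",
    "UniRef90_B\t10.0",
    "UniRef90_B|g__Escherichia.s__Escherichia_coli\t10.0",
]
fractions = species_fractions(example)
for taxon, fraction in sorted(fractions.items()):
    print(f"{taxon}\t{fraction:.3f}")
```

The resulting fractions describe each species' share of the species-assigned gene abundance, which is why they drift toward larger genomes rather than matching a marker-based profile.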

Is there a reason you’re thinking about this approach as opposed to just using the MetaPhlAn abundances for the sample (which you can find under the sample’s temp output directory)?


Thank you very much for your quick and helpful reply! I am trying to help a colleague with an analysis. The temp output MetaPhlAn file that you described is exactly what we want, so I will see if we still have those files.

Hi @franzosa

I have a follow-up question. I was checking the manual of MetaPhlAn (Home · biobakery/MetaPhlAn Wiki · GitHub). It mentions that “The relative abundance profile is scaled according to the percentage of reads mapping to a known clade”.
Does this mean that we get microbial abundances at the clade level, or abundances of OTUs that were classified to the clade level?
Looking forward to your response!