Differential pathway abundance analysis with MaAsLin 3

Hi,

I have pathway abundance outputs from HUMAnN 3.8 and can normalize the data to CPM (copies per million) or relative abundance. I have three variables in my metadata and want to compare pathway abundance between the groups they define. The samples are also repeated measures from the same participants. I think MaAsLin 3 can handle multiple fixed effects (including interaction terms) and random effects well, so I want to run the differential pathway abundance analysis in MaAsLin 3. My questions are:

  1. If I normalize the data to relative abundance, MaAsLin 3 can handle this compositional data. What about CPM? Can I run CPM data through MaAsLin 3? If so, do I need to set any additional normalization or transformation for the CPM data?
  2. Does anyone know which data format is better for differential pathway abundance analysis: CPM or relative abundance of pathways?

Thank you!

CPM is relative abundance multiplied by 1 million, so you can set the normalization to either NONE or TSS (the default) and you should get the same results. The LOG transformation (also the default) should be applied either way.
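
To illustrate why the two settings should agree, here is a minimal sketch in plain Python. It is not MaAsLin 3 code: the `tss` helper and the abundance values are made up for illustration, and it assumes the CPM values in a sample still sum to 1 million (that is, no pathways were filtered out after normalizing). It checks that total-sum scaling of CPM gives back the relative abundances, and that without renormalizing, the log of CPM differs from the log of relative abundance only by a constant offset.

```python
import math

def tss(sample):
    """Total-sum scaling: divide every value in one sample by the sample total."""
    total = sum(sample)
    return [value / total for value in sample]

# Hypothetical pathway abundances for a single sample (made-up numbers).
raw = [10.0, 30.0, 60.0]

relab = tss(raw)                 # relative abundance, sums to 1
cpm = [p * 1e6 for p in relab]   # CPM, sums to 1,000,000

# TSS applied to CPM recovers the relative abundances.
tss_matches = all(math.isclose(a, b) for a, b in zip(tss(cpm), relab))

# Without renormalizing, log2(CPM) is log2(relative abundance) plus the
# same constant, log2(1e6), for every pathway.
shifts = [math.log2(c) - math.log2(p) for c, p in zip(cpm, relab)]
constant_shift = all(math.isclose(s, math.log2(1e6)) for s in shifts)

print(tss_matches, constant_shift)
```

A constant offset on the log scale would only move the intercept of a linear model, not the coefficients for your metadata variables, which is why either normalization choice should lead to the same conclusions.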

CPM (genes) or pathway abundances would both be valid inputs. There might be many genes though (especially if you are using them stratified by species), so the pathways might help you improve the runtime and interpretability.

Will