Dear bioBakery friends,
Assuming the pathways were constructed using common shotgun metagenomic procedures, do pathway abundances generated by HUMAnN2 retain the compositional nature of the data? By "compositional," I mean that the abundance of any one pathway is only interpretable relative to another, and that there is a sum constraint. Gene abundances are compositional, but I am not sure about the pathways.

Knowing whether the pathway abundances are compositional will determine which transformation/normalization and statistical methods I can use. If the pathways have the one-to-many problem (one gene/read maps to multiple pathways in a community), then the compositional nature of the data is violated and compositional data analysis methods cannot be applied.

Furthermore, if there is a one-to-many problem, I don't see how it's valid to normalize the pathway abundances using a method that takes the pathway abundances themselves as the reference (e.g., TSS normalization). Instead, I would think the reference should be something that is compositional in nature: for example, with TSS, the total sum of the gene abundances or, perhaps even better, the total number of organisms. The normalization I'm talking about is the one that removes technical bias related to different library sizes. From reading the HUMAnN2 paper, however, it seems that the one-to-many problem is solved.

It might be important to note that I plan to assess pathways at the level of the community as a whole. I am not a statistician and my thought process could be completely wrong. Please advise.
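
P.S. To make the concern concrete, here is a minimal Python sketch on invented toy numbers. Everything in it is an assumption for illustration: the abundances, the gene-to-pathway membership matrix, the `tss` helper, and especially the naive "sum the member genes" pathway calculation, which is not meant to represent how HUMAnN2 actually quantifies pathways. It only shows what I mean by a shared gene being counted in two pathways, and by using the pathway total versus the gene total as the TSS reference.

```python
import numpy as np

# Hypothetical toy data: 2 samples x 3 gene families (values are invented).
# Sample 2 is sample 1 sequenced at twice the depth.
gene_abund = np.array([[10.0, 30.0, 60.0],
                       [20.0, 60.0, 120.0]])

# Hypothetical membership matrix (rows: genes, columns: pathways).
# The middle gene belongs to both pathways: the "one-to-many" case.
membership = np.array([[1, 0],
                       [1, 1],
                       [0, 1]])

def tss(x, reference=None):
    """Total-sum scaling: divide each sample (row) by its own total,
    or by an external per-sample reference total if one is given."""
    totals = x.sum(axis=1) if reference is None else np.asarray(reference)
    return x / totals[:, None]

# Naive pathway abundance = sum of member genes (illustration only,
# NOT HUMAnN2's method). The shared gene is counted twice.
pathway_abund = gene_abund @ membership

gene_totals = gene_abund.sum(axis=1)
pathway_totals = pathway_abund.sum(axis=1)

# Reference = pathway total: each sample sums to 1.
tss_by_pathways = tss(pathway_abund)

# Reference = gene total: depth is still removed, but rows no longer sum to 1.
tss_by_genes = tss(pathway_abund, reference=gene_totals)

print("pathway totals:", pathway_totals)
print("gene totals:   ", gene_totals)
print("TSS by pathway total:\n", tss_by_pathways)
print("TSS by gene total:\n", tss_by_genes)
```

In this toy case the pathway totals exceed the gene totals because the shared gene is double-counted. Both references remove the twofold depth difference between the two samples, but only the pathway-total version sums to 1 per sample. Which of these (if either) is appropriate for HUMAnN2 output is exactly what I am unsure about.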