I am trying to build a model to determine whether categorical variables of interest (e.g., whether patients had surgery, disease assessment by physicians, etc.) are significant predictors of the abundances of specific KOs in our HUMAnN3 output.
I know the recommended approach in MaAsLin3 is to use total-sum scaled (TSS) abundances, log-transform them, and fit an OLS regression. For some other analyses, I had transformed the KOs' TSS abundances using the arcsine square root transformation. However, now that I am building a GLM, I am a bit stuck.
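For reference, here is a minimal numpy sketch of the two transformations being compared, applied to a made-up count table. The pseudocount rule in `log_transform` (half the smallest nonzero value) is a hypothetical choice for illustration, not necessarily what MaAsLin3 does internally.

```python
import numpy as np

def tss(counts):
    """Total-sum scaling: divide each sample (row) by its total so each row sums to 1."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)

def log_transform(rel_abund, pseudocount=None):
    """log2 of relative abundances after adding a pseudocount so zeros stay finite.
    The default pseudocount (half the smallest nonzero value) is a hypothetical choice."""
    rel_abund = np.asarray(rel_abund, dtype=float)
    if pseudocount is None:
        pseudocount = rel_abund[rel_abund > 0].min() / 2
    return np.log2(rel_abund + pseudocount)

def arcsine_sqrt(rel_abund):
    """Arcsine square root transform; maps proportions in [0, 1] to [0, pi/2]."""
    return np.arcsin(np.sqrt(np.asarray(rel_abund, dtype=float)))

# Toy KO counts: 2 samples (rows) x 3 KOs (columns), made-up values
counts = np.array([[10, 30, 60],
                   [0, 50, 50]])
rel = tss(counts)                 # rows: [0.1, 0.3, 0.6] and [0.0, 0.5, 0.5]
log_rel = log_transform(rel)      # pseudocount 0.05; every value is negative here
asin_rel = arcsine_sqrt(rel)      # the zero abundance maps to exactly 0
```

Because relative abundances are below 1, their logs are negative, and a zero abundance stays at 0 after the arcsine square root transform. Both points matter for the Gamma options below, since a Gamma response must be strictly positive.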
I am currently deciding between four options and would appreciate some insight:
- log-transformed TSS abundances as input to an OLS regression with the categorical variables
- arcsine square root-transformed TSS abundances as input to an OLS regression with the categorical variables
- arcsine square root-transformed TSS abundances as input to a GLM with the Gamma family (e.g., with a log link)
- log-transformed TSS abundances as input to a GLM with the Gamma family (e.g., with a log link)
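The first two options are the same OLS model and differ only in how the response is transformed. A minimal numpy sketch, assuming one categorical predictor with treatment (dummy) coding; `design_matrix`, the group labels, and the response values are hypothetical, and the sketch returns coefficients only (no standard errors or p-values), unlike `lm()` in R or statsmodels' `ols`.

```python
import numpy as np

def design_matrix(groups):
    """Treatment (dummy) coding: an intercept column plus one 0/1 indicator per
    non-reference level. The first level in sorted order is the reference."""
    levels = sorted(set(groups))
    X = np.ones((len(groups), len(levels)))
    for j, level in enumerate(levels[1:], start=1):
        X[:, j] = [1.0 if g == level else 0.0 for g in groups]
    return X, levels

def ols(X, y):
    """Ordinary least squares coefficients from a least-squares solve."""
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return beta

# Hypothetical example: one KO's transformed abundance vs. surgery status
surgery = ["no", "no", "yes", "yes"]
y = [-3.0, -2.0, -1.0, 0.0]          # e.g. log2 TSS abundances (made-up values)
X, levels = design_matrix(surgery)
beta = ols(X, y)
# In this one-factor design, beta[0] is the mean of the reference group ("no")
# and beta[1] is the difference in means ("yes" minus "no").
```

Swapping `y` for arcsine square root-transformed values gives the second option with no other change to the model.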
After the arcsine square root transformation, some of my KOs of interest are not normally distributed, which is why I am unsure about my methods here. The Gamma distribution seems like the most appropriate choice because it handles strictly positive, continuous (non-integer) data.
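To make the Gamma options concrete, here is a minimal numpy sketch of what a Gamma GLM with a log link estimates, fitted by iteratively reweighted least squares (IRLS). It is an illustration under stated assumptions, not a replacement for a library fit: in practice one would use statsmodels' `GLM` with `families.Gamma(link=families.links.Log())` or R's `glm(..., family = Gamma(link = "log"))`. The design matrix and response values are made up. The positivity check shows why zeros or negative values (such as log-transformed proportions) cannot be used as the response.

```python
import numpy as np

def gamma_glm_log(X, y, n_iter=100, tol=1e-10):
    """Fit a Gamma GLM with a log link, E[y] = exp(X @ beta), by IRLS.
    For Gamma + log link the IRLS weights are all 1, so each step is an OLS fit
    of the working response z on X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("Gamma GLM requires a strictly positive response")
    beta, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)  # start from OLS on log(y)
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu            # working response for the log link
        new_beta, *_ = np.linalg.lstsq(X, z, rcond=None)
        if np.max(np.abs(new_beta - beta)) < tol:
            return new_beta
        beta = new_beta
    return beta

# Hypothetical example: strictly positive abundances vs. surgery status (no=0, yes=1)
X = np.column_stack([np.ones(4), [0.0, 0.0, 1.0, 1.0]])
y = [0.1, 0.3, 0.2, 0.6]
beta = gamma_glm_log(X, y)
# exp(beta[0]) is the fitted mean for "no"; exp(beta[1]) is the yes/no ratio of means
```

Unlike OLS on log-transformed values, which models the mean of log(y), the log link models the log of the mean of y, so `exp(beta[1])` here is a ratio of group means (0.4 / 0.2 = 2 in this toy example).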
If anyone has suggestions or references to share, that would be extremely helpful. Thank you so much!!