Species as Covariates and Clinical as Dependent Variable

In microbiome linear models, whether MaAsLin or its competitors, why do we never see models like clinicalVariable ~ species1 + species2 + species3 + species4 + ... + numberOfSpecies being built? If all of the species were fitted in a single model, it would also be feasible to avoid multiple testing p-value adjustment.

As always, it's really a question of what you are after. You can of course fit species as covariates in a linear model. But if what you want is to “tag”, from a large number of features (species in this case), those that associate with a clinical variable, fitting them all in a single model is nowhere near feasible, or even theoretically sound. First off, there are way too many, even if you narrow them down somehow to representative species - ideally you want your model to have as few covariates as possible. Secondly, species are highly intercorrelated and interdependent - both for biological reasons (competition, symbiosis, similar phenotypes for phylogenetically related species, etc.) and technical ones (for example, the data are usually compositional).
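To make the two problems above concrete, here is a minimal sketch in Python with NumPy. It is only an illustration on simulated data: the sample and species counts, the Poisson counts, and the helper name `design_rank` are all assumptions of mine, not anything from MaAsLin. It builds the design matrix for a model of the form clinicalVariable ~ species1 + species2 + ... and checks its rank. When the rank is lower than the number of columns, least squares has no unique solution for the coefficients.

```python
import numpy as np


def design_rank(n_samples, n_species, rng):
    """Return (number of columns, rank) of an intercept + all-species design matrix.

    Hypothetical helper for illustration only; the data are simulated.
    """
    # Simulated counts, converted to relative abundances (each row sums to 1).
    counts = rng.poisson(lam=20, size=(n_samples, n_species)) + 1
    rel_abund = counts / counts.sum(axis=1, keepdims=True)

    # Design matrix for clinicalVariable ~ species1 + ... + speciesN, with intercept.
    design = np.column_stack([np.ones(n_samples), rel_abund])
    return design.shape[1], np.linalg.matrix_rank(design)


rng = np.random.default_rng(0)

# Case 1: more species than samples (typical for microbiome data).
cols_wide, rank_wide = design_rank(n_samples=30, n_species=50, rng=rng)

# Case 2: plenty of samples, but relative abundances still sum to 1,
# so the species columns add up to the intercept column.
cols_tall, rank_tall = design_rank(n_samples=200, n_species=10, rng=rng)

print(f"wide: {cols_wide} columns, rank {rank_wide}")
print(f"tall: {cols_tall} columns, rank {rank_tall}")
```

In both cases the rank should come out below the number of columns: in the first because there are more covariates than samples, in the second because of compositionality alone. Real data add biological correlation between species on top of this.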


Hi everyone,

As @leahfa already pointed out, there are some good reasons why species are usually not the covariates.

In essence, there is no reason why you couldn’t run a model the other way around (if you just had a single metadata variable and a single species), and you would get fairly similar results. The coefficients may end up somewhat different, though, because the least-squares fit then minimizes the error in predicting the metadata rather than the species.
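As a rough illustration of this point, here is a small Python sketch on simulated data. The variable names, the effect size, and the helper `simple_ols` are hypothetical, and this is plain least squares, not what MaAsLin does internally. It fits the one-species, one-metadata model in both directions. For this simple case the slopes differ, but the t statistic of the slope is the same either way, and the product of the two slopes equals the squared Pearson correlation.

```python
import numpy as np


def simple_ols(x, y):
    """Fit y ~ x by ordinary least squares; return (slope, t statistic of the slope)."""
    n = len(x)
    x_c = x - x.mean()
    y_c = y - y.mean()
    slope = (x_c @ y_c) / (x_c @ x_c)
    resid = y_c - slope * x_c
    sigma2 = (resid @ resid) / (n - 2)       # residual variance
    se = np.sqrt(sigma2 / (x_c @ x_c))       # standard error of the slope
    return slope, slope / se


rng = np.random.default_rng(1)
species = rng.normal(size=100)                    # hypothetical transformed abundance
metadata = 0.5 * species + rng.normal(size=100)   # hypothetical clinical variable

slope_sm, t_sm = simple_ols(metadata, species)    # species ~ metadata
slope_ms, t_ms = simple_ols(species, metadata)    # metadata ~ species
r = np.corrcoef(species, metadata)[0, 1]

print(f"species ~ metadata: slope={slope_sm:.3f}, t={t_sm:.3f}")
print(f"metadata ~ species: slope={slope_ms:.3f}, t={t_ms:.3f}")
print(f"slope product={slope_sm * slope_ms:.3f}, r^2={r**2:.3f}")
```

This symmetry only holds for the single-covariate case. Once other covariates, random effects, or different transformations enter the model, the two directions are no longer guaranteed to agree.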

Thanks,
Jacob Nearing