Linear model assumption validation

Shuqi · March 18, 2025, 11:49am

Hi,

Thank you so much for developing this great tool! I have a question about the CPLM model, which I recently used on a clinical microbiome dataset. I received feedback asking me to validate the linear model assumption, and I wonder whether this is something that can be done in MaAsLin. Could I get some advice, please? I used MaAsLin2 v1.15.1.

Best,
S

WillNickols · March 18, 2025, 1:08pm

Hi,

Do you have any more information on what the reviewer meant by “validate the linear model assumption”? Without more information, it could mean a few things:

1. The mean of your feature abundance given the covariates is related to your (continuous) variable linearly: You can check for this by (a) plotting the residuals vs. fitted values to look for a trend (there should be none) or (b) plotting your (log-transformed) feature abundance vs. the covariate of interest and checking for a linear relationship.
2. The errors/residuals actually follow the CPLM distributional assumptions: This is pretty difficult to do properly, but you could just fit with different distributional assumptions (e.g., the recommended default: log normal) and see if your conclusions hold up.
3. Your observations are independent (or you’ve controlled for non-independence): Check that any natural grouping (e.g., per-subject) is controlled for with a random intercept.
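As a rough illustration of the residuals vs. fitted check in the first point, here is a minimal Python sketch. It is not MaAsLin code: it assumes a plain least-squares fit of a (log-transformed) abundance on a single continuous covariate, the data are simulated, and every name in it (`fit_line`, `residuals_vs_fitted`, `binned_residual_means`) is hypothetical. Instead of drawing a plot, it averages the residuals within bins of the fitted values; means near zero in every bin suggest no trend, while a U-shaped pattern suggests a nonlinear relationship.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals_vs_fitted(x, y):
    """Return (fitted, residuals) for the least-squares line through (x, y)."""
    intercept, slope = fit_line(x, y)
    fitted = [intercept + slope * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    return fitted, residuals


def binned_residual_means(fitted, residuals, n_bins=3):
    """Mean residual within bins of roughly equal size, ordered by fitted value."""
    pairs = sorted(zip(fitted, residuals))
    size = len(pairs) // n_bins
    means = []
    for i in range(n_bins):
        end = (i + 1) * size if i < n_bins - 1 else len(pairs)
        chunk = pairs[i * size:end]
        means.append(sum(r for _, r in chunk) / len(chunk))
    return means


# Simulated data (assumption): one covariate and two hypothetical features.
random.seed(0)
x = [i / 10 for i in range(100)]
linear_y = [1.0 + 0.5 * xi + random.gauss(0, 0.1) for xi in x]       # truly linear
curved_y = [1.0 + 0.05 * xi ** 2 + random.gauss(0, 0.1) for xi in x]  # quadratic

linear_means = binned_residual_means(*residuals_vs_fitted(x, linear_y))
curved_means = binned_residual_means(*residuals_vs_fitted(x, curved_y))
print("linear feature:", [round(m, 3) for m in linear_means])
print("curved feature:", [round(m, 3) for m in curved_means])
```

For the curved feature, the binned means come out positive at both ends and negative in the middle, which is the kind of trend a residuals vs. fitted plot would show. This only covers the ordinary linear model case; it says nothing about whether the CPLM distributional assumptions hold.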

If I had to guess, the second interpretation (the distributional assumptions) seems the most likely, since compound Poisson isn’t a very common distribution for microbiome analysis. If it isn’t too difficult to change, I’d suggest using the default TSS normalization, log transformation, and base linear model, but hopefully you’ll get similar results either way.

Will

Shuqi · March 18, 2025, 2:17pm

Hi Will,

Thank you so much for your advice!

I actually fit 4 models, including log normal and CPLM, and I just went back to compare the results of those two models for one of my regressions. CPLM gave more results than log normal and also showed much more meaningful q-values. Specifically, I filtered all.results.tsv by pval < 0.05 and looked at the qval column: the q-values from CPLM were all below 0.25, while the q-values from the log normal model were above 0.5. There is some overlap between the two sets of results, but CPLM really allowed for more findings in disease vs. health.

Besides, as far as I can recall, the reviewer cared more about how MaAsLin adjusts p-values into q-values and only briefly mentioned checking the linear model assumption. I would think that a plot of residuals vs. fitted values could meet the need. But I’m open to any further feedback from the reviewer, and I can keep you posted.
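On the p-value to q-value question: MaAsLin2's documented default for its `correction` option is "BH" (Benjamini-Hochberg). The following is a minimal Python sketch of that procedure, written from the textbook definition and not taken from MaAsLin's source; the function name `benjamini_hochberg` and the example p-values are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is the minimum of
    # p * m / rank over all ranks at or above the current one.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical p-values from five tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.60]
qvals = benjamini_hochberg(pvals)
print([round(q, 5) for q in qvals])  # [0.005, 0.02, 0.05125, 0.05125, 0.6]
```

Each q-value depends on how many tests were run and on where its p-value ranks among them, so the same p-value can map to very different q-values in two different sets of results.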

Thanks for your thorough input again.

Best,
S

WillNickols · March 18, 2025, 4:17pm

Based on that information, I would caution that, as reported in the MaAsLin 2 paper, CPLM tends to produce more false positives than the log normal model, and this is likely amplified further if your model is much more complicated than treatment vs. controls. If you’re seeing more significant q-values with CPLM than log normal (especially if they’re all significant in CPLM vs. none in log normal), it would be worth plotting a few of the significant relationships to make sure they seem right. A p-value just tells you how unlikely your data were under the null, and if your null (e.g., under a compound Poisson rather than log normal) is very inconsistent with the data, you can still get a significant p-value, regardless of whether anything important is happening biologically.

Will
