How to choose among results from different models and transformation/normalization methods?

Yang_Peng · December 28, 2022, 3:30am

Hello!

Background:
I used MaAsLin2 (version 1.10.0) to identify microbes significantly associated with a categorical variable (clinically effective or no_effective).
I don’t know much about the mathematical principles behind the different statistical/regression models, so I tried every model (LM, CPLM, NEGBIN, ZINB) with each transformation and normalization method.
The results varied substantially between models (and, within a single model, between transformation/normalization methods). For example, the LM model found no associations with any transformation or normalization method, while the CPLM, NEGBIN and ZINB models found about a dozen microbes associated with clinical effect.

My question is:
Which model and transformation/normalization method is more reliable? How should we choose among the different microbes identified by the different models and transformation/normalization methods: for example, by keeping those found consistently by several methods, or by weighing their biological relevance to the research topic?

Sorry, my question is a little long.
Thanks for your help!
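
For readers who want to tabulate the "found consistently by several methods" idea from the question, here is a minimal sketch. The taxon names and hit lists are hypothetical placeholders, not results from this dataset, and `count_support` and `consistent_taxa` are illustrative helpers, not part of MaAsLin2. Counting overlap only summarizes agreement; it does not show that any of the models suits the data.

```python
from collections import Counter

def count_support(hits_by_method):
    """Return {taxon: number of methods that flagged it as significant}."""
    counts = Counter()
    for taxa in hits_by_method.values():
        # set() guards against a taxon being listed twice for one method.
        counts.update(set(taxa))
    return dict(counts)

def consistent_taxa(hits_by_method, min_methods):
    """Return the sorted taxa flagged by at least `min_methods` methods."""
    support = count_support(hits_by_method)
    return sorted(t for t, n in support.items() if n >= min_methods)

# Hypothetical placeholder hits, one list per analysis method.
hits = {
    "LM": [],
    "CPLM": ["taxon_A", "taxon_B", "taxon_C"],
    "NEGBIN": ["taxon_A", "taxon_B"],
    "ZINB": ["taxon_A", "taxon_D"],
}

if __name__ == "__main__":
    print(count_support(hits))
    print(consistent_taxa(hits, min_methods=3))
```

The `min_methods` threshold is itself an analysis choice, so it would need to be fixed before looking at the results.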

andrewGhazi · January 3, 2023, 9:31pm

Without additional information about the characteristics of your dataset, it’s hard to suggest anything other than the defaults. There’s ultimately no way to get around examining the properties of your data, learning about the statistical assumptions of the various methods, and ensuring that your analysis plan is appropriate. If you can’t discuss your experiment publicly, you may be able to get a consultation on your analysis with a statistician at your institution. Trying every parameter combination then reporting whatever gives the lowest significance value will almost certainly lead to a misleading result.
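
To make the last point concrete, here is a rough sketch of how keeping the best of several analyses inflates the false positive rate for a taxon with no true association. It assumes the k analyses are independent, which is a simplification: models fitted to the same data are correlated, so the real inflation is smaller, but it goes in the same direction. The values of k and alpha and both function names are hypothetical, chosen only for illustration.

```python
import random

def chance_of_false_hit(alpha, k):
    """Probability that at least one of k independent null tests has p < alpha."""
    return 1.0 - (1.0 - alpha) ** k

def simulate_best_of_k(alpha, k, trials=20000, seed=0):
    """Fraction of null trials in which the smallest of k p-values is below alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Under the null hypothesis each p-value is uniform on [0, 1).
        best_p = min(rng.random() for _ in range(k))
        if best_p < alpha:
            hits += 1
    return hits / trials

if __name__ == "__main__":
    alpha = 0.05
    for k in (1, 4, 16):  # hypothetical numbers of model/transform combinations
        exact = chance_of_false_hit(alpha, k)
        simulated = simulate_best_of_k(alpha, k)
        print(f"k={k:2d}  exact={exact:.3f}  simulated={simulated:.3f}")
```

Under these assumptions, a nominal 5% threshold applied to the best of 16 analyses produces a false hit more than half the time.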

Yang_Peng · June 10, 2023, 9:27am

Hi, andrewGhazi! Thanks for your answer, and sorry for the late reply.

Additional information about my dataset:

The data I want to input into MaAsLin2 is a metagenomic abundance table (human stool samples) produced by KneadData and MetaPhlAn processing. There are 120 samples in two groups (responders to treatment and non-responders) and ~900 taxa at the species level in total.

Research objective:
To find associations between taxon abundance and response to treatment, and between taxon abundance and the clinical data (numeric type).

Thanks for your patience and guidance!
