How to choose among the results found by different models and transformation/normalization methods?

Hello!

Background:
I used MaAsLin2 (version 1.10.0) to identify microbes significantly associated with the categorical variable (clinically effective, or no_effective).
I don’t know much about the mathematical principles behind the different statistical/regression models, so I tried every model (LM, CPLM, NEGBIN, ZINB) with each transformation and normalization method.
The results varied considerably between models (and, within the same model, between transformation/normalization methods). For example, the LM model found no associations with any transformation or normalization method, while the CPLM, NEGBIN and ZINB models found about a dozen microbes associated with clinical effect.

My question is:
Which model and transformation/normalization method is most reliable? And how should we choose among the different microbes identified by the different models and transformation/normalization methods: for example, should we keep those found consistently by several methods, or weigh them by their biological significance for the research topic?

Sorry, my question is a little long.
Thanks for your help!

Without additional information about the characteristics of your dataset, it’s hard to suggest anything other than the defaults. There’s ultimately no way to get around examining the properties of your data, learning about the statistical assumptions of the various methods, and ensuring that your analysis plan is appropriate. If you can’t discuss your experiment publicly, you may be able to get a consultation on your analysis with a statistician at your institution. Trying every parameter combination and then reporting whichever gives the smallest p-values will almost certainly lead to a misleading result.
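
To illustrate that last point, here is a minimal Python sketch (not MaAsLin2 code). It assumes there is no true association and, as a simplification, that each model/transformation/normalization combination gives an independent p-value. Results computed on the same data are correlated, so the real inflation would typically be smaller than this estimate. The function names and the figure of 20 combinations are hypothetical.

```python
import random


def familywise_false_positive_rate(n_configs, alpha=0.05):
    """Chance that at least one of n_configs independent null tests has p < alpha."""
    return 1.0 - (1.0 - alpha) ** n_configs


def simulate_min_p_selection(n_configs, alpha=0.05, n_trials=20000, seed=0):
    """Fraction of null experiments in which the smallest of n_configs p-values is < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # Under the null hypothesis each p-value is uniform on [0, 1].
        min_p = min(rng.random() for _ in range(n_configs))
        if min_p < alpha:
            hits += 1
    return hits / n_trials


if __name__ == "__main__":
    # 20 is a hypothetical number of analysis configurations, not a MaAsLin2 constant.
    for k in (1, 4, 20):
        analytic = familywise_false_positive_rate(k)
        simulated = simulate_min_p_selection(k)
        print(f"{k:>2} configs: analytic={analytic:.3f} simulated={simulated:.3f}")
```

Under these assumptions, a single pre-specified analysis has a 5% false positive rate for a given taxon, while picking the best of 20 independent analyses gives 1 - 0.95^20, or roughly 64%.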

Hi, andrewGhazi! Thanks for your answer, and sorry for the late reply.

Additional information about my dataset:

The data I want to input into MaAsLin2 is a metagenomic abundance table (human stool samples), produced by KneadData and MetaPhlAn processing. There are 120 samples in two groups (responders to treatment and non-responders) and ~900 taxa in total at the species level.
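
One property of such a table that can be checked before picking a model is how sparse each taxon is. The following is a minimal Python sketch, assuming the table has been loaded as a mapping from taxon name to per-sample abundances. The function name, the dictionary layout and the toy values are hypothetical and are not taken from the dataset above.

```python
def sparsity_summary(table):
    """Summarize per-taxon prevalence and mean non-zero abundance.

    table: dict mapping taxon name -> list of per-sample abundances.
    """
    summary = {}
    for taxon, values in table.items():
        nonzero = [v for v in values if v > 0]
        summary[taxon] = {
            # Fraction of samples in which the taxon was detected at all.
            "prevalence": len(nonzero) / len(values),
            # Mean abundance over the samples where it was detected.
            "mean_nonzero": sum(nonzero) / len(nonzero) if nonzero else 0.0,
        }
    return summary


# Hypothetical toy table: 2 taxa across 4 samples.
toy_table = {
    "taxon_A": [0.0, 0.0, 0.0, 1.2],
    "taxon_B": [10.0, 12.0, 8.0, 11.0],
}
summary = sparsity_summary(toy_table)
for taxon, stats in summary.items():
    print(taxon, stats)
```

Taxa detected in only a small fraction of samples carry little information for any of the models, and the overall share of zeros is one of the data characteristics relevant to choosing between them.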

Research objective:
To test the associations between taxon abundance and the response to treatment, and between taxon abundance and the numeric clinical data.

Thanks for your patience and guidance!