I have a question regarding the effect of changing the reference group in a categorical variable.
I am analyzing three groups: right, left, and rectum, using Maaslin3 with:
- fixed effect of interest: tissue location (`right`, `left`, `rectum`)
- covariates: `sex`, `BMI`, and `age`
- model: abundance
- significance threshold: `qval_individual < 0.1`
As I understand it, Maaslin3 uses the first factor level as the reference group.
When I set right as the reference level, Maaslin3 compares:
- left vs right
- rectum vs right
When I set rectum as the reference level, Maaslin3 compares:
- right vs rectum
- left vs rectum
However, the number of significant differential features detected for the right vs rectum comparison is different between these two runs, even though biologically this should represent the same pairwise comparison.
For example:
- using `right` as reference gives results for `rectum vs right`
- using `rectum` as reference gives results for `right vs rectum`
but the significant feature counts are not identical.
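To spell out why I expected the two runs to agree: in an ordinary linear model with treatment (dummy) coding, switching the reference level only reparameterizes the same fit, so the coefficient for this pair should flip sign while its standard error, and therefore its raw p-value, stays the same. Below is a minimal NumPy sketch on simulated data. It is not Maaslin3 code; the data, the single `age` covariate, and the helper `fit` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 90
# Simulated stand-in for one feature's abundance across three tissue groups.
group = np.repeat(["right", "left", "rectum"], n // 3)
age = rng.normal(60, 10, size=n)
y = 0.5 * (group == "rectum") + 0.02 * age + rng.normal(size=n)

def fit(reference):
    """OLS with treatment coding; returns {level: (estimate, std_error)}."""
    levels = [lv for lv in ["right", "left", "rectum"] if lv != reference]
    dummies = [(group == lv).astype(float) for lv in levels]
    X = np.column_stack([np.ones(n)] + dummies + [age])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])          # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    # Column 0 is the intercept, so group coefficients start at index 1.
    return {lv: (beta[i + 1], se[i + 1]) for i, lv in enumerate(levels)}

est_a, se_a = fit("right")["rectum"]   # rectum vs right
est_b, se_b = fit("rectum")["right"]   # right vs rectum

print("estimates:", est_a, est_b)      # same magnitude, opposite sign
print("std errors:", se_a, se_b)       # identical up to rounding
```

If the same holds inside Maaslin3's abundance model, the raw p-values for this pair should match between my two runs, and any difference would have to arise at a later step.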
Is this expected behavior in Maaslin3?
If so, could you explain why changing the reference level changes the number of detected features for what appears to be the same pairwise comparison?
I am wondering whether this could be related to:
- model parameterization or contrast coding,
- how `qval_individual` is calculated,
- multiple testing correction being applied separately to different coefficient sets,
- interaction with covariates (`sex`, `BMI`, `age`),
- prevalence/filtering procedures,
- or some other aspect of the Maaslin3 implementation.
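To make the multiple-testing possibility concrete, here is a minimal sketch of what I have in mind. It assumes, without my having confirmed it, that the q-values are Benjamini-Hochberg adjusted over a pool that includes the p-values of the other comparison in the same run. All p-values are invented, and `bh_qvalues` is my own helper, not a Maaslin3 function.

```python
import numpy as np

def bh_qvalues(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working from the largest p-value down.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(scaled, 1.0)
    return q

# Hypothetical raw p-values for five features.
shared = [0.001, 0.03, 0.04, 0.06, 0.5]            # right/rectum pair, same in both runs
left_vs_right = [0.001, 0.002, 0.003, 0.004, 0.005]  # other comparison, reference = right
left_vs_rectum = [0.6, 0.7, 0.8, 0.9, 0.95]          # other comparison, reference = rectum

# Adjust over the pooled p-values of each run, then keep the shared pair's q-values.
q_run_a = bh_qvalues(shared + left_vs_right)[:5]
q_run_b = bh_qvalues(shared + left_vs_rectum)[:5]

sig_a = int((q_run_a < 0.1).sum())
sig_b = int((q_run_b < 0.1).sum())
print(sig_a, sig_b)  # prints "4 1"
```

In this toy case the raw p-values for the right/rectum pair are identical in both runs, yet the number of features passing `q < 0.1` differs because the other comparison in the pool changes. I do not know whether this matches how Maaslin3 actually computes `qval_individual`.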
Thank you very much for your help.