Inverting reference

Hi,

I have a question about the effect of inverting the reference for variables, when there are only 2 values per variable.
My model is: ~ Treatment * Group + (1|Individual), with Treatment = either “Control” or “Treatment”, and Group = either “R” or “NR”

When I run Maaslin3 using this model, I get a bunch of results for each variable separately as well as for the interaction.
However, when I run the same model but changing the reference value for the “Group” variable from “NR” to “R”, I get the following:

•⁠ ⁠"Group": almost perfectly mirrored: if taxa X had a coef value of -2 with reference set as “NR”, it is now +2
•⁠ ⁠"Treatment": I get much less significant results (19, compared to 21 with reference set as “NR”), and the coefficient/p-values are different
•⁠ ⁠"Group*Treatment": also almost perfectly mirrored, except a few taxa that are at the limit of significance (p-values changed slightly)

I am a bit confused why the “Treatment” is so radically different.
For instance, some taxa are highly different (with high statistical significance) with one reference, but completely absent with the other.
I thought it would be mirrored like the other variables, or if not, then the interaction would be different as well, as the variable + interaction should sum up the effect.
Am I missing something?
How should I interpret changing the reference?

Here are both Maaslin3 code for reference:


m3_NR_ref <- maaslin3(input_data = input_maaslin,
                    input_metadata = metadata_maaslin,
                    output = "m3_NR_ref",
                    formula = '~ Treatment * Group + (1|Individual)',
                    reference = ('Treatment,Control;Group,NR’),
                    normalization = 'TSS',
                    transform = 'LOG',
                    augment = TRUE,
                    standardize = TRUE,
                    max_significance = 0.25,
                    median_comparison_abundance = TRUE,
                    median_comparison_prevalence = FALSE)
mR_R_ref <- maaslin3(input_data = input_maaslin,
                    input_metadata = metadata_maaslin,
                    output = "m3_R_ref",
                    formula = '~ Treatment * Group + (1|Individual)',
                    reference = (’Treatment,Control;Group,R’),
                    normalization = 'TSS',
                    transform = 'LOG',
                    augment = TRUE,
                    standardize = TRUE,
                    max_significance = 0.25,
                    median_comparison_abundance = TRUE,
                    median_comparison_prevalence = FALSE)

Thanks a lot for your time!

Hi,

Just to clarify a few things:

  1. Is your Treatment variable either Control or Treatment and your Group is either R or NR? I ask because in this line

However, when I run the same model but changing the reference value for the “Treatment” variable from “NR” to “R”, I get the following:

it looks like you’re saying Treatment could be NR or R.

  1. Are you intending the 2 references to be Treatment,Control;Group,R and Treatment,Control;Group,NR or Treatment,Control;Group,R and Treatment,Treatment;Group,NR? I ask because it looks like the 2 MaAsLin commands you provided use the same reference string.

Will

Hi,

Sorry for the mistakes, my bad!
I just edited my original post to correct and clarify:

  1. Treatment is indeed Control or Treatment, and Group is either R or NR.
    I made a mistake in the sentence you highlighted.

  2. One of my model is indeed Treatment,Control;Group,R and the other Treatment,Control;Group,NR

Hopefully it’s clear now! Let me know otherwise.

Got it - no worries. When you have an interaction model like this, there are 4 groups you’re comparing: Control-R, Control-NR, Treatment-R, and Treatment-NR. When you set R as the reference, your coefficients will be the average group differences:

  • Group: Control-NR - Control-R
  • Treatment: Treatment-R - Control-R
  • Group:Treatment: (Treatment-NR - Control-NR) - (Treatment-R - Control-R)

When you set NR as the reference, your coefficients will be the average group differences:

  • Group: Control-R - Control-NR
  • Treatment: Treatment-NR - Control-NR
  • Group:Treatment: (Treatment-R - Control-R) - (Treatment-NR - Control-NR)

As you can see in the math, the Group coefficients are just the negatives of each other depending on which reference you use. Likewise, the Group:Treatment coefficients are just each other’s negatives depending on the reference. However, the Treatment coefficient is different because the first is comparing treatment vs. control among the R group whereas the second is comparing treatment vs. control among the NR group.

Will

Thanks a ton Will!
It is very clear now, I understand the different outcomes.

Now, this makes me wonder how these results should be presented, as both Treatment-R - Control-R and Treatment-NR - Control-NR are relevant comparisons to explore the dataset?
If I only use one model with Group set with NR as reference for instance, and give the results for Group, Treatment, and Group:Treatment, I feel like I will be “missing” information given by the Treatment-R - Control-R comparison done when setting the reference as R.

Would it be OK to include this additional comparison alongside the 3 others, even if it is generated using a different model? What’s the best way to give a comprehensive overview of the results?

Even with NR set as the reference, you can get Treatment-R - Control-R by adding the Treatment and Group:Treatment coefficients (which you can see from the math above). Alternatively, you could just refit the model but with R as the reference and it’d be the same thing.

Arguably, if your readers understand what an interaction term is, it’s better to report the interaction term than both (Treatment-R - Control-R) and (Treatment-NR - Control-NR) separately. If the interaction is not statistically significantly different from 0, you immediately know that the R and NR groups were not affected by treatment differently, whereas if it’s non-zero, R versus NR is relevant to what sort of effect Treatment has. By contrast, if you just know (Treatment-R - Control-R) and (Treatment-NR - Control-NR) separately, you don’t know whether they’re statistically significantly different from each other.

Will

Thanks Will, that’s a lot clearer now!