The bioBakery help forum

Setting reference levels with 2 factors

Hi !

I am discovering the MaAsLin2 package.

I am doing a comparison of the metagenome of healthy volunteers (named CT) versus patients (named AML) using R and the latest release version of Maaslin2 (updated this morning). I would like to have the plots with the CT placed in the first column and the AML placed in the second column.

I followed the instructions found here.
https://huttenhower.sph.harvard.edu/maaslin/
Setting reference levels

But despite having the factors set at the right level, the output is still not correct.

Here are my R lines

mapping_T0$Group_ordered = factor(mapping_T0$Group, levels=c("CT","AML"))

fit_data = Maaslin2(input_data = data.filter2, input_metadata = mapping_T0, output = "Maaslin2_with_filters", fixed_effects = c("Group_ordered"))

str(mapping_T0$Group_ordered)
Factor w/ 2 levels "CT","AML": 2 2 2 2 2 2 2 2 2 2 ...

I saw the reference option described in the GitHub README (biobakery/Maaslin2: "MaAsLin2: Microbiome Multivariate Association with Linear Models") and in the forum post "Reference values for a categorical fixed effect variable", but it seems to work only with factors that have more than 2 levels.

I looked at similar posts, and didn’t find a solution. I hope I didn’t miss it.

Thanks for your input !
Looking forward to learning more about the package

Hi @laure_bindels,

You are correct about MaAsLin's behavior: for a variable with only two levels, the reference cannot be set. MaAsLin ignores the ordered factor and uses alphabetical order instead. We did this because swapping the reference level in this case does not change the numbers, only their sign. To test this, you could add "a_" in front of your CT group label, which should force that group to be the reference level.
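A minimal sketch of this relabeling workaround, using toy metadata in place of mapping_T0. The sample names and the Group_renamed column are made up for illustration, and the Maaslin2 call is left as a comment because it needs the real feature table. Note that R sorts character strings according to the session's collation locale, so it is worth checking the resulting order on your own machine.

```r
# Toy metadata standing in for mapping_T0 (values invented for illustration)
mapping_T0 <- data.frame(
  Group = c("AML", "CT", "AML", "CT"),
  row.names = c("S1", "S2", "S3", "S4"),
  stringsAsFactors = FALSE
)

# Prefix the control label with "a_" so that it is meant to sort first
mapping_T0$Group_renamed <- ifelse(
  mapping_T0$Group == "CT", "a_CT", mapping_T0$Group
)

# Inspect the default level order; it depends on the collation locale,
# so confirm that "a_CT" really comes first before fitting
print(levels(factor(mapping_T0$Group_renamed)))

# The model would then be fitted on the renamed column, mirroring the
# call from the original post (not run here):
# fit_data <- Maaslin2(input_data = data.filter2,
#                      input_metadata = mapping_T0,
#                      output = "Maaslin2_with_filters",
#                      fixed_effects = c("Group_renamed"))
```

If the printed order still shows AML first, a different prefix may be needed for that locale.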

Sorry if this is confusing!! I will make a note for a future version of the package to allow references even with 2 level factors.

Best,
Kelsey

Dear Kelsey,

Thanks for your swift reply. I wanted the graphs to be ready to use in presentations and publications. I saw in the documentation that data points are plotted after normalization, filtering, and transformation, and I didn't know how to retrieve these values. I will proceed as advised and edit the labels later with an overlay.

Many thanks again for your answer,
Best,

Laure

Dear Kelsey,

I saw in the documentation that data points are plotted after normalization, filtering, and transformation. Is there a way to retrieve these values? Editing in PowerPoint will not let me apply the colors and other characteristics of the other graphs I generated in R for this cohort. :smiley:

Many thanks for your help !
Best,

Laure

Hi Laure - I just checked to confirm, and the scatter/box plots are made on the raw (filtered) data, i.e. those plots are made before MaAsLin does normalization and transformation. Thus, if you want to replicate a plot, you can simply take the row/column for that feature from your feature table and redraw it with ggplot2 geom_point or ggplot2 geom_boxplot.
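A rough sketch of how such a plot could be rebuilt, assuming samples are rows and features are columns in the feature table. The data below are invented stand-ins for data.filter2 and mapping_T0, and Feature_A is a hypothetical feature name. Setting the factor levels explicitly also puts CT in the first column and AML in the second.

```r
# Toy stand-ins for data.filter2 and mapping_T0; all values are invented
data.filter2 <- data.frame(
  Feature_A = c(0.10, 0.02, 0.30, 0.05, 0.22, 0.01),
  row.names = paste0("S", 1:6)
)
mapping_T0 <- data.frame(
  Group = c("AML", "CT", "AML", "CT", "AML", "CT"),
  row.names = paste0("S", 1:6)
)

# Align samples by row name and set the plotting order explicitly
samples <- intersect(rownames(data.filter2), rownames(mapping_T0))
plot_df <- data.frame(
  Group = factor(mapping_T0[samples, "Group"], levels = c("CT", "AML")),
  Abundance = data.filter2[samples, "Feature_A"]
)

# Build the plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(plot_df, aes(x = Group, y = Abundance)) +
    geom_boxplot(outlier.shape = NA) +  # hide outliers; jitter shows all points
    geom_jitter(width = 0.15) +
    labs(x = NULL, y = "Feature_A (untransformed)")
  # print(p) draws the plot; colors and themes can be added as usual
}
```

Because the plot is an ordinary ggplot object, it can be given the same colors and theme as the other figures for the cohort.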

I hope this helps!

Best,
Kelsey

Hi Kelsey,

Thanks for your reply and for checking this. I ran Maaslin2 with lm and with cplm (all parameters are the same, except transform = LOG and NONE respectively) and the plots are not identical.

Any idea what may be going wrong?

Thanks for your help !

Laure