MaAsLin2: Heatmap vs. Significance

Hi, I am using MaAsLin2 to identify taxa that are associated with specific clinical covariates (metadata) in a longitudinal dataset. I have used the patient ID as a random effect and the clinical covariates as fixed effects. The tool is easy to use, and I have a few questions about its interpretation and usage:

  1. Is it better to test one metadata variable at a time or to include them all together as fixed effects? What is the difference between the two approaches?

  2. For a categorical variable, e.g. gender with Male as the reference level, does a positive coefficient for the value Female in the significant results file mean that the taxon is positively associated with Female vs. Male, and negatively associated if the coefficient is negative?

  3. If I include multiple covariates as fixed effects, how should I read the heatmap alongside the significant results table? For example, I have 4 clinical sites and gender; the heatmap shows a positive association for 2 of the sites and for gender, but the significant results table only has one value, for Site 1. How does one interpret that?

Thanks in advance!

Hi @arti.tandon,
To answer your questions.

  1. MaAsLin adjusts the model for all covariates included in the run. So if you run one variable at a time, the other variables that you are interested in will not be adjusted for. This can be good or bad. If two of your variables are highly confounded, including both can cause problems when fitting the model. However, if you have clinical covariates that may explain significant variation independently of your variable (or variables) of interest, e.g. age or diet, including them in the model can increase your confidence in your results. I hope that helps you decide!
  2. Yes, I believe your interpretation here is correct.
  3. Without a minimal reproducible example dataset, I would have trouble answering this one. Since the heatmap is based on the significant results TSV file, everything in the heatmap should also appear in that file (with the caveat that the heatmap is capped at 50 significant features when plotting).
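
To make points 1 and 2 concrete, here is a minimal sketch in base R. It uses plain `lm()` on simulated data as a stand-in for the per-feature linear model, so it is not MaAsLin2 itself; the variable names (`sex`, `age`, `abundance`) and all effect sizes are made up for illustration. It compares testing one variable at a time with adjusting for a confounded covariate, and shows how the coefficient for a categorical variable is named relative to its reference level.

```r
set.seed(1)
n_per_group <- 100

# Male is listed first in `levels`, so it becomes the reference level
sex <- factor(rep(c("Male", "Female"), each = n_per_group),
              levels = c("Male", "Female"))

# Age is confounded with sex: females are about 20 years older on average
age <- c(rnorm(n_per_group, mean = 40, sd = 5),
         rnorm(n_per_group, mean = 60, sd = 5))

# Simulated (already transformed) abundance that depends on age only
abundance <- 0.1 * age + rnorm(2 * n_per_group, sd = 1)

fit_unadjusted <- lm(abundance ~ sex)        # one variable at a time
fit_adjusted   <- lm(abundance ~ sex + age)  # both variables together

# The coefficient is named "sexFemale": Female relative to the Male reference
coef_unadjusted <- coef(fit_unadjusted)["sexFemale"]
coef_adjusted   <- coef(fit_adjusted)["sexFemale"]

print(c(unadjusted = unname(coef_unadjusted),
        adjusted   = unname(coef_adjusted)))
```

In this simulation the unadjusted estimate for Female mostly reflects the age difference between the groups, while the adjusted estimate should sit much closer to zero. A positive `sexFemale` coefficient would read as "higher in Female than in Male", and a negative one as the reverse.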

I hope that helps! I am happy to test the issue you are seeing in 3 if you can upload a dummy (or real) dataset that replicates it.

Best,
Kelsey

Thanks Kelsey for your responses.

Regarding #3, maybe my question was not clear. Suppose I include multiple covariates in the model, e.g. gender, age and BMI. In the output, the heatmap shows that a particular species has a significant positive association with both gender and age (the first row in the attached image), but in the significant results table the metadata column lists only gender and its associated p-value. How should I understand this p-value: is it adjusted for age? And how do I interpret the result: are both gender and age significantly associated, with gender having the stronger association?

Example output:
Image: species_maaslin2

Table:
s_XX;Gender; Male;3.0;0.001

Thanks,
Arti

@arti.tandon - first of all, the heatmap and the table of significant associations should correspond to each other as the heatmap is a subset of all significant associations in the analysis, as @Kelsey_Thompson pointed out. It’s extremely unusual to see an association in the heatmap but not in the table of significant associations (do let us know if you see this unusual behavior by means of a reproducible example so that we can test it on our end).

Regarding interpretation of your results, when you have multiple covariates in the model (e.g. Microbe Abundance ~ X + Y + Z), the interpretation goes as follows (assuming the corresponding coefficient for Z is significant following FDR correction):

‘After adjusting for X and Y, Z is positively or negatively associated (depending on the sign of the coefficient) with the outcome (i.e. feature abundance).’
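
As a rough illustration of that reading, the sketch below fits a single model with three covariates using base R `lm()` on simulated data. This is not MaAsLin2 output: the covariates `X`, `Y`, `Z`, the effect sizes, and the `results` layout are assumptions chosen to resemble a results table, and the BH correction here covers only the three tests for one feature rather than all feature-metadata pairs.

```r
set.seed(42)
n <- 150

X <- rnorm(n)  # covariate with no true effect in this simulation
Y <- rnorm(n)  # covariate with a positive effect
Z <- rnorm(n)  # covariate of interest, with a negative effect

# Simulated (already transformed) abundance of one feature
abundance <- 0.8 * Y - 0.6 * Z + rnorm(n, sd = 0.5)

fit <- lm(abundance ~ X + Y + Z)
coef_table <- summary(fit)$coefficients  # Estimate, Std. Error, t value, Pr(>|t|)

# One row per covariate (intercept dropped); each estimate is
# conditional on the other two covariates being in the model
results <- data.frame(
  metadata = rownames(coef_table)[-1],
  coef     = coef_table[-1, "Estimate"],
  pval     = coef_table[-1, "Pr(>|t|)"]
)

# Benjamini-Hochberg adjustment over these three tests only
results$qval <- p.adjust(results$pval, method = "BH")
print(results)
```

Each row is a separate statement: the row for `Z` describes the association of `Z` with abundance after adjusting for `X` and `Y`, and the row for `Y` describes `Y` after adjusting for `X` and `Z`. Two significant rows for the same feature do not rank the covariates against each other.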

I hope this is clear.

Best,
Himel

Hi Himel

Thanks for the response. You are correct, I can see the associations for all in the table and heatmap.

Regarding the interpretation of results: if both Y and Z are associated with the microbe abundance, does that mean each association holds after adjusting for all the other covariates, i.e. X and Y in the case of Z, and X and Z in the case of Y?

Thanks

@arti.tandon - Yes, that’s correct.

Hello @himel.mallick ,

Is the order of covariates crucial? For example, I am interested in clinical variable Z and want to adjust for confounding factors (e.g. sequencing_depth and treatment). Should my model then be as follows?

Microbe_abundance ~ sequencing_depth + treatment + Z

which would correspond to the following MaAsLin2 argument:

fixed_effects = c("treatment", "seq_depth", "clinical_variable")

Moreover, I want to adjust for differences between individual patients (patient-id), which I set as a random effect. Would it be appropriate to set my other confounding factors as random effects as well? Or should I specify them as fixed effects (together with my clinical variable of interest)?

My current MaAsLin2 call is as follows. Is that reasonable?

fit_data = Maaslin2(feature_table, metadata, random_effects = c("patient-id", "seq_depth"), fixed_effects = c("clinical_variable"), reference = c("clinical_variable,A"))

I wonder if you have already had time to look at this question, @himel.mallick.

Hi @plicht - apologies for the unintentional delay - the order of covariates does not matter in MaAsLin 2. Your model formulation based on your description above should be the following:

fit_data = Maaslin2(feature_table, metadata, random_effects = c("patient-id"), fixed_effects = c("clinical_variable", "seq_depth", "treatment"), reference = c("clinical_variable,A"))
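
The point about covariate order can be checked with a small base R sketch. It uses `lm()` on simulated data rather than MaAsLin2, and it leaves out the patient random effect to stay within base R, so treat it as an illustration of the general property of linear models, not as a test of the package. The data frame `d` and its effect sizes are hypothetical; only the column names are borrowed from the call above.

```r
set.seed(7)
n <- 120

d <- data.frame(
  seq_depth = rnorm(n, mean = 10, sd = 1),
  treatment = factor(sample(c("no", "yes"), n, replace = TRUE)),
  clinical_variable = factor(sample(c("A", "B"), n, replace = TRUE),
                             levels = c("A", "B"))  # "A" is the reference
)

# Simulated abundance with made-up effects of depth and clinical group
d$abundance <- 0.3 * d$seq_depth +
  0.5 * (d$clinical_variable == "B") +
  rnorm(n)

# Same covariates, two different orders
fit1 <- lm(abundance ~ seq_depth + treatment + clinical_variable, data = d)
fit2 <- lm(abundance ~ clinical_variable + treatment + seq_depth, data = d)

# Align the coefficients by name before comparing them
b1 <- coef(fit1)
b2 <- coef(fit2)[names(b1)]
same_estimates <- isTRUE(all.equal(b1, b2))
print(same_estimates)
```

Because every coefficient is estimated conditional on all other terms in the model, reordering the terms changes only the order of the rows, not the estimates.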

Hope this helps!