How to interpret MaAsLin 2 output?

Hi @sma @himel.mallick @Kelsey_Thompson, I want to find differentially abundant bacteria associated with disease samples. My metadata looks something like the following:

ID Type Age Comorbidity Gender
SRP090252 disease 71.65 T2D Female
SRP090268 control 70.05 T2D Female
SRP090265 control 70.85 NGT Female
SRP090264 control 70.21 IGT Female
SRP090263 control 70.06 NGT male
SRP090258 disease 69.53 NGT Female
SRP090250 disease 70.02 IGT Female
SRP090244 disease 70.07 IGT male
SRP090242 control 69.78 NGT Female
SRP090227 control 69.74 NGT male
SRP090226 control 69.7 NGT Female
SRP090224 disease 69.67 NGT male
SRP090222 disease 71.4 IGT Female
SRP090217 control 71.53 NGT Female

Now, I want to get the differentially abundant bacteria associated with disease samples after adjusting for confounders Age, Comorbidity, and Gender. In this context, how should I specify the Random effect and Fixed effect options?

From what I have read in other posts in this forum, I think I should provide --fixed-effect "Type" and --random-effect "ID". Is this correct? Will this give me output that is adjusted for all the confounders I mentioned?
Or should I provide --fixed-effect "Type, Age, Comorbidity, Gender" and --random-effect "ID"?

If neither of the above is correct, please suggest what I should use instead.

Many thanks,
DC7

Hi @DEEPCHANDA7 - it’s the latter, based on your description. If you do not have repeated IDs (which seems to be the case in your example), you don’t need a random effect.
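
As a rough illustration of that decision, here is a minimal base R sketch that checks whether any sample ID repeats and picks the effects accordingly. The four rows are copied from the metadata in the question; the variable names `has_repeated_ids`, `fixed_effects`, and `random_effects` are only illustrative, and the resulting vectors are the kind of values one would pass to the `fixed_effects` and `random_effects` arguments of `Maaslin2()`.

```r
# A few rows copied from the metadata in the question (illustration only).
metadata <- data.frame(
  ID = c("SRP090252", "SRP090268", "SRP090265", "SRP090264"),
  Type = c("disease", "control", "control", "control"),
  Age = c(71.65, 70.05, 70.85, 70.21),
  Comorbidity = c("T2D", "T2D", "NGT", "IGT"),
  Gender = c("Female", "Female", "Female", "Female"),
  stringsAsFactors = FALSE
)

# A random effect on ID is only useful when the same ID occurs more than once.
has_repeated_ids <- anyDuplicated(metadata$ID) > 0

# Variable of interest plus the confounders to adjust for.
fixed_effects <- c("Type", "Age", "Comorbidity", "Gender")

# Use ID as a random effect only if it repeats; otherwise leave it out.
random_effects <- if (has_repeated_ids) "ID" else NULL

print(has_repeated_ids)
```

With unique IDs, as here, `random_effects` stays `NULL` and all adjustment happens through the fixed effects.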

All the best,
Himel


Thanks a lot, sir @himel.mallick, for your fast response. I have two follow-up queries:

  1. Yes, I don’t have repeated IDs. Do you mean that I should use ID as neither a random effect nor a fixed effect?

  2. I am having a hard time interpreting my output. Here’s my code, metadata, and output:

Code:
library(Maaslin2)
library(readr)
setwd("~/my_output")
getwd()

# Read the species abundance table and the sample metadata
input_data <- as.data.frame(read_tsv("merged_abundance_species_5.tsv"))
input_metadata <- as.data.frame(read_csv("metadata.csv"))

# Fit the model with no normalization, transform, or filtering applied
fit_data <- Maaslin2(
  input_data, input_metadata, 'my_output', transform = "NONE",
  fixed_effects = c('type', 'comorbidity', 'age'),
  reference = c('comorbidity,NGT'),  # NGT is the reference level for comorbidity
  normalization = 'NONE',
  min_abundance = 0.0,
  max_significance = 0.25,  # q-value threshold for significant_results.tsv
  min_prevalence = 0.0,
  standardize = FALSE)

Metadata:

ID type age comorbidity
SR090136 disease 70.15 IGT
SR090140 disease 70.15 T2D
SR090147 control 71.39 NGT
SR090150 disease 71.58 IGT
SR090152 disease 71.24 T2D
SR090154 disease 71.04 IGT
SR090157 disease 70.96 IGT
SR090161 disease 70.39 T2D
SR090163 disease 70.14 NGT
SR090167 disease 70.97 T2D
SR090169 disease 68.96 T2D
SR090170 control 71.11 NGT
SR090171 control 70.55 NGT
SR090173 disease 70.11 T2D
SR090175 disease 71.02 NGT
SR090177 disease 70.2 IGT
SR090179 disease 70.16 T2D
SR090193 control 70.25 NGT
SR090205 control 70.63 NGT

Output:

significant_results.tsv

feature metadata value coef stderr N N.not.0 pval qval
Faecalitalea_cylindroides comorbidity IGT -0.377764650707978 0.043253216559351 19 5 4.86003286684536E-07 0.000209953419848
Faecalitalea_cylindroides comorbidity T2D -0.368187348493346 0.041715679009967 19 5 4.28479422114103E-07 0.000209953419848
Faecalitalea_cylindroides type disease 0.350656650707978 0.043253216559351 19 5 1.17270795116608E-06 0.000337739889936
Collinsella_stercoris comorbidity IGT -0.109236524922356 0.025087649856961 19 18 0.00066062321169 0.142694613724944
Roseburia_inulinivorans age age -1.58382545890012 0.385112459302547 19 18 0.001055772797088 0.15082184993556
Ruminococcus_bromii type disease 18.7382134402279 4.63755246673051 19 11 0.001215556053757 0.15082184993556
Ruminococcus_bromii comorbidity IGT -18.7258134402279 4.63755246673051 19 11 0.0012219362842 0.15082184993556
Olsenella_scatoligenes comorbidity IGT -0.014838219463708 0.003842430774736 19 8 0.001727334716805 0.178134499426023
Oscillibacter_sp_57_20 age age -0.310432279358586 0.081151146011419 19 11 0.001855567702354 0.178134499426023
Bifidobacterium_bifidum type disease 3.61449351479 1.13571553919717 19 8 0.006646997243247 0.205410860127932
Olsenella_scatoligenes type disease 0.012930219463708 0.003842430774737 19 8 0.004622838288644 0.205410860127932
Olsenella_scatoligenes comorbidity T2D -0.01242486374167 0.003705842514556 19 8 0.004737715789044 0.205410860127932
Collinsella_intestinalis comorbidity IGT -0.020799960994458 0.006581295428193 19 16 0.006945621531625 0.205410860127932
Collinsella_stercoris type disease 0.080588524922356 0.025087649856961 19 18 0.006265644670361 0.205410860127932
Collinsella_stercoris comorbidity T2D -0.079481101360263 0.024195850200212 19 18 0.005422718327775 0.205410860127932
Alistipes_onderdonkii comorbidity T2D -0.096692174486469 0.030877949038327 19 3 0.007358245481338 0.205410860127932
Clostridium_sp_CAG_411 type disease 0.015964789965694 0.005060519673996 19 1 0.007024676313767 0.205410860127932
Clostridium_sp_CAG_411 comorbidity IGT -0.015964789965694 0.005060519673996 19 1 0.007024676313768 0.205410860127932
Clostridium_sp_CAG_411 comorbidity T2D -0.017198735694422 0.004880631572321 19 1 0.003371414639798 0.205410860127932
Lawsonibacter_asaccharolyticus type disease 0.477862895042796 0.151766856842129 19 18 0.007110491107549 0.205410860127932
Lawsonibacter_asaccharolyticus comorbidity IGT -0.475124895042795 0.151766856842129 19 18 0.007370065583294 0.205410860127932
Eubacterium_eligens age age -1.2467673088303 0.391416017038458 19 17 0.006611343727095 0.205410860127932
Oscillibacter_sp_CAG_241 comorbidity IGT -2.64857972770573 0.763998086959495 19 17 0.003776853512783 0.205410860127932
Oscillibacter_sp_CAG_241 comorbidity T2D -2.61370728597232 0.736839973880289 19 17 0.003218875403587 0.205410860127932
Ruminococcaceae_bacterium_D16 comorbidity T2D -0.106504807911724 0.031556086714996 19 4 0.004531946745981 0.205410860127932
Ruminococcaceae_bacterium_D5 comorbidity IGT -0.158426557912691 0.050505479793602 19 4 0.007279877600031 0.205410860127932
Ruminococcaceae_bacterium_D5 comorbidity T2D -0.170497579432457 0.048710143450794 19 4 0.003533466593531 0.205410860127932
Ruminococcus_bromii comorbidity T2D -14.530958484276 4.4726997315575 19 11 0.005826462285798 0.205410860127932
Firmicutes_bacterium_CAG_534 type disease 0.001257422965504 0.000374853309185 19 1 0.004722065633165 0.205410860127932
Firmicutes_bacterium_CAG_534 comorbidity IGT -0.001257422965504 0.000374853309185 19 1 0.004722065633165 0.205410860127932
Firmicutes_bacterium_CAG_534 comorbidity T2D -0.001166019578191 0.000361528264616 19 1 0.006106036469878 0.205410860127932
Firmicutes_bacterium_CAG_95 age age -0.058657135094159 0.018943043595487 19 7 0.007887029446352 0.206496770959043
Phascolarctobacterium_succinatutens age age -0.282982879719054 0.091233077220782 19 2 0.00780509739456 0.206496770959043
Bifidobacterium_bifidum comorbidity IGT -3.49699551479 1.13571553919717 19 8 0.008164122157298 0.207464751291347
Bacteroides_ovatus age age -0.790359006180556 0.265922716314961 19 16 0.010093623789579 0.249168312977036

Here, I just want to get the bacteria that are significantly associated with the control and disease groups after adjusting for confounders. Which ones are those? Are they all the rows in this significant_results.tsv output, or only those with disease in the value column?

thanks,
DC7

Hi @DEEPCHANDA7 - that’s correct. You don’t need ID in the model. To find the associations you are after, filter this table on the metadata column, keeping the rows for your main variable of interest (i.e. type).

Since your metadata variable (type) is binary, MaAsLin 2 internally models this as a dummy variable with type = control as the reference level (as a rule of thumb, the reference level usually does not appear in the value column).
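
The dummy-variable idea can be seen in base R. This is a minimal sketch of R's default treatment coding for a two-level factor, not MaAsLin 2's internal code; the four values in `type` are made up for illustration.

```r
# Hypothetical four-sample example of a binary variable.
type <- factor(c("disease", "control", "control", "disease"))

# Factor levels are sorted alphabetically, so "control" comes first
# and becomes the reference level under R's default treatment coding.
print(levels(type))

# The design matrix has an intercept plus a single 0/1 column for "disease".
# No column is created for the reference level "control".
design <- model.matrix(~ type)
print(colnames(design))
print(design[, "typedisease"])
```

The only non-intercept column is `typedisease`, which is why only `disease` shows up in the value column of the results.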

In your case, the coefficients (and their signs) should be interpreted with type = control as the reference: feature X is more abundant in type = disease than in type = control if the corresponding coefficient is positive, and less abundant if it is negative. I hope that makes sense.

For more information on how to interpret binary categorical variables in a regression setting, feel free to check out the tutorial here. Also, the visualization plots should provide sufficient information to connect the dots.
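
As a small sketch of that filtering step in base R, the example below keeps only the rows for type and labels the direction from the sign of the coefficient. The five rows are copied from the significant_results.tsv output posted above; the column name `direction` is an illustrative addition. On the real file, one would presumably load the full table with something like `read.delim()` on the significant_results.tsv path in the output folder instead of typing rows by hand.

```r
# Five rows copied from the posted significant_results.tsv (illustration only).
results <- data.frame(
  feature = c("Faecalitalea_cylindroides", "Faecalitalea_cylindroides",
              "Ruminococcus_bromii", "Roseburia_inulinivorans",
              "Bifidobacterium_bifidum"),
  metadata = c("comorbidity", "type", "type", "age", "type"),
  value = c("IGT", "disease", "disease", "age", "disease"),
  coef = c(-0.377764650707978, 0.350656650707978, 18.7382134402279,
           -1.58382545890012, 3.61449351479),
  qval = c(0.000209953419848, 0.000337739889936, 0.15082184993556,
           0.15082184993556, 0.205410860127932),
  stringsAsFactors = FALSE
)

# Keep only associations with the variable of interest.
type_hits <- results[results$metadata == "type", ]

# Positive coefficient: higher in disease relative to the control reference.
type_hits$direction <- ifelse(type_hits$coef > 0,
                              "higher in disease", "higher in control")

print(type_hits[, c("feature", "coef", "qval", "direction")])
```

Rows for comorbidity and age describe the adjustment variables themselves, so they drop out of this filtered table.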

All the best,
Himel
