Hi @sma @himel.mallick @Kelsey_Thompson, I want to find differentially abundant bacteria associated with disease samples. My metadata looks like the following:
| ID | Type | Age | Comorbidity | Gender |
|---|---|---|---|---|
| SRP090252 | disease | 71.65 | T2D | Female |
| SRP090268 | control | 70.05 | T2D | Female |
| SRP090265 | control | 70.85 | NGT | Female |
| SRP090264 | control | 70.21 | IGT | Female |
| SRP090263 | control | 70.06 | NGT | male |
| SRP090258 | disease | 69.53 | NGT | Female |
| SRP090250 | disease | 70.02 | IGT | Female |
| SRP090244 | disease | 70.07 | IGT | male |
| SRP090242 | control | 69.78 | NGT | Female |
| SRP090227 | control | 69.74 | NGT | male |
| SRP090226 | control | 69.7 | NGT | Female |
| SRP090224 | disease | 69.67 | NGT | male |
| SRP090222 | disease | 71.4 | IGT | Female |
| SRP090217 | control | 71.53 | NGT | Female |
Now, I want to find the bacteria that are differentially abundant in disease samples after adjusting for the confounders `Age`, `Comorbidity`, and `Gender`. In this context, how should I specify the random-effect and fixed-effect options?

From what I have gathered from other posts in this forum, I think I should provide `--fixed_effects "Type"` and `--random_effects "ID"`. Is this correct? Will this give me results adjusted for all the confounders I mentioned?

Or should I provide `--fixed_effects "Type,Age,Comorbidity,Gender"` and `--random_effects "ID"`?

If neither of these is correct, please suggest what I should do.
Many thanks,
DC7
Hi @DEEPCHANDA7 - based on your description, it's the latter. If you do not have repeated IDs (which seems to be the case in your example), you don't need a random effect.
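Whether a random effect on `ID` is needed comes down to whether any ID occurs more than once. A minimal base-R sketch, assuming the sample IDs are available as a character vector (here copied from the first metadata table above); the helper `needs_random_effect` is hypothetical and not part of MaAsLin 2.

```r
# Hypothetical helper (not part of MaAsLin 2): TRUE when any sample ID occurs
# more than once, i.e. when there are repeated measures that a random effect
# on ID would account for.
needs_random_effect <- function(ids) {
  anyDuplicated(ids) > 0
}

# Sample IDs copied from the first metadata table above (one row per sample)
ids <- c("SRP090252", "SRP090268", "SRP090265", "SRP090264", "SRP090263",
         "SRP090258", "SRP090250", "SRP090244", "SRP090242", "SRP090227",
         "SRP090226", "SRP090224", "SRP090222", "SRP090217")

# All confounders go in as fixed effects alongside the variable of interest
fixed_effects <- c("Type", "Age", "Comorbidity", "Gender")
# Only add ID as a random effect when IDs actually repeat; NULL means none
random_effects <- if (needs_random_effect(ids)) "ID" else NULL

print(needs_random_effect(ids))
```

These two vectors could then be passed as `fixed_effects = fixed_effects, random_effects = random_effects` in an R call to `Maaslin2()`.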
All the best,
Himel
Thanks a lot for your quick response, @himel.mallick. I have two follow-up questions:

1. Yes, I don't have repeated IDs. Do you mean that I should use `ID` as neither a random effect nor a fixed effect?
2. I am having a hard time interpreting my output. Here are my code, metadata, and output:
Code:
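```r
library(Maaslin2)

setwd("~/my_output")
getwd()

# Features (rows) x samples (columns); the first column (feature names)
# becomes the row names so MaAsLin 2 can match sample IDs to the metadata
input_data <- read.table("merged_abundance_species_5.tsv", header = TRUE,
                         sep = "\t", row.names = 1, check.names = FALSE)
# One row per sample; the first column (ID) becomes the row names
input_metadata <- read.csv("metadata.csv", row.names = 1)

fit_data <- Maaslin2(
  input_data, input_metadata, 'my_output', transform = "NONE",
  fixed_effects = c('type', 'comorbidity', 'age'),
  reference = c('comorbidity,NGT'),  # NGT as the reference level for comorbidity
  normalization = 'NONE',
  min_abundance = 0.0,
  max_significance = 0.25,
  min_prevalence = 0.0,
  standardize = FALSE)
```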
Metadata:
| ID | type | age | comorbidity |
|---|---|---|---|
| SR090136 | disease | 70.15 | IGT |
| SR090140 | disease | 70.15 | T2D |
| SR090147 | control | 71.39 | NGT |
| SR090150 | disease | 71.58 | IGT |
| SR090152 | disease | 71.24 | T2D |
| SR090154 | disease | 71.04 | IGT |
| SR090157 | disease | 70.96 | IGT |
| SR090161 | disease | 70.39 | T2D |
| SR090163 | disease | 70.14 | NGT |
| SR090167 | disease | 70.97 | T2D |
| SR090169 | disease | 68.96 | T2D |
| SR090170 | control | 71.11 | NGT |
| SR090171 | control | 70.55 | NGT |
| SR090173 | disease | 70.11 | T2D |
| SR090175 | disease | 71.02 | NGT |
| SR090177 | disease | 70.2 | IGT |
| SR090179 | disease | 70.16 | T2D |
| SR090193 | control | 70.25 | NGT |
| SR090205 | control | 70.63 | NGT |
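To make "adjusting for confounders" concrete, here is a rough base-R analogue of a per-feature linear model with `type`, `comorbidity` and `age` as fixed effects and no transformation, followed by a Benjamini-Hochberg correction across all tested terms. This is a sketch under stated assumptions, not MaAsLin 2's internal code: the metadata values are copied from the table above, but the abundances are simulated, and `fit_one` is a hypothetical helper.

```r
# Metadata copied from the table above, in sample order
meta <- data.frame(
  type = c("disease", "disease", "control", "disease", "disease", "disease",
           "disease", "disease", "disease", "disease", "disease", "control",
           "control", "disease", "disease", "disease", "disease", "control",
           "control"),
  age = c(70.15, 70.15, 71.39, 71.58, 71.24, 71.04, 70.96, 70.39, 70.14,
          70.97, 68.96, 71.11, 70.55, 70.11, 71.02, 70.2, 70.16, 70.25, 70.63),
  comorbidity = c("IGT", "T2D", "NGT", "IGT", "T2D", "IGT", "IGT", "T2D",
                  "NGT", "T2D", "T2D", "NGT", "NGT", "T2D", "NGT", "IGT",
                  "T2D", "NGT", "NGT")
)
meta$type <- factor(meta$type)                                   # control = reference
meta$comorbidity <- relevel(factor(meta$comorbidity), ref = "NGT")

# Simulated abundances for three hypothetical features (NOT real data)
set.seed(42)
abund <- matrix(runif(nrow(meta) * 3), nrow = nrow(meta),
                dimnames = list(NULL, c("feature_A", "feature_B", "feature_C")))

# Hypothetical helper: fit one feature, return every non-intercept term
fit_one <- function(y, meta) {
  fit <- lm(y ~ type + comorbidity + age, data = meta)
  coefs <- summary(fit)$coefficients
  data.frame(term = rownames(coefs)[-1],
             coef = coefs[-1, "Estimate"],
             stderr = coefs[-1, "Std. Error"],
             pval = coefs[-1, "Pr(>|t|)"],
             row.names = NULL)
}

results <- do.call(rbind, lapply(colnames(abund), function(f) {
  data.frame(feature = f, fit_one(abund[, f], meta))
}))
# Multiple-testing correction over all feature x term associations
results$qval <- p.adjust(results$pval, method = "BH")
print(results)
```

Each `typedisease` row is the disease-vs-control effect with `comorbidity` and `age` held fixed, which is the quantity reported as `metadata = type`, `value = disease` in the output below.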
Output (`significant_results.tsv`):

| feature | metadata | value | coef | stderr | N | N.not.0 | pval | qval |
|---|---|---|---|---|---|---|---|---|
| Faecalitalea_cylindroides | comorbidity | IGT | -0.377764650707978 | 0.043253216559351 | 19 | 5 | 4.86003286684536E-07 | 0.000209953419848 |
| Faecalitalea_cylindroides | comorbidity | T2D | -0.368187348493346 | 0.041715679009967 | 19 | 5 | 4.28479422114103E-07 | 0.000209953419848 |
| Faecalitalea_cylindroides | type | disease | 0.350656650707978 | 0.043253216559351 | 19 | 5 | 1.17270795116608E-06 | 0.000337739889936 |
| Collinsella_stercoris | comorbidity | IGT | -0.109236524922356 | 0.025087649856961 | 19 | 18 | 0.00066062321169 | 0.142694613724944 |
| Roseburia_inulinivorans | age | age | -1.58382545890012 | 0.385112459302547 | 19 | 18 | 0.001055772797088 | 0.15082184993556 |
| Ruminococcus_bromii | type | disease | 18.7382134402279 | 4.63755246673051 | 19 | 11 | 0.001215556053757 | 0.15082184993556 |
| Ruminococcus_bromii | comorbidity | IGT | -18.7258134402279 | 4.63755246673051 | 19 | 11 | 0.0012219362842 | 0.15082184993556 |
| Olsenella_scatoligenes | comorbidity | IGT | -0.014838219463708 | 0.003842430774736 | 19 | 8 | 0.001727334716805 | 0.178134499426023 |
| Oscillibacter_sp_57_20 | age | age | -0.310432279358586 | 0.081151146011419 | 19 | 11 | 0.001855567702354 | 0.178134499426023 |
| Bifidobacterium_bifidum | type | disease | 3.61449351479 | 1.13571553919717 | 19 | 8 | 0.006646997243247 | 0.205410860127932 |
| Olsenella_scatoligenes | type | disease | 0.012930219463708 | 0.003842430774737 | 19 | 8 | 0.004622838288644 | 0.205410860127932 |
| Olsenella_scatoligenes | comorbidity | T2D | -0.01242486374167 | 0.003705842514556 | 19 | 8 | 0.004737715789044 | 0.205410860127932 |
| Collinsella_intestinalis | comorbidity | IGT | -0.020799960994458 | 0.006581295428193 | 19 | 16 | 0.006945621531625 | 0.205410860127932 |
| Collinsella_stercoris | type | disease | 0.080588524922356 | 0.025087649856961 | 19 | 18 | 0.006265644670361 | 0.205410860127932 |
| Collinsella_stercoris | comorbidity | T2D | -0.079481101360263 | 0.024195850200212 | 19 | 18 | 0.005422718327775 | 0.205410860127932 |
| Alistipes_onderdonkii | comorbidity | T2D | -0.096692174486469 | 0.030877949038327 | 19 | 3 | 0.007358245481338 | 0.205410860127932 |
| Clostridium_sp_CAG_411 | type | disease | 0.015964789965694 | 0.005060519673996 | 19 | 1 | 0.007024676313767 | 0.205410860127932 |
| Clostridium_sp_CAG_411 | comorbidity | IGT | -0.015964789965694 | 0.005060519673996 | 19 | 1 | 0.007024676313768 | 0.205410860127932 |
| Clostridium_sp_CAG_411 | comorbidity | T2D | -0.017198735694422 | 0.004880631572321 | 19 | 1 | 0.003371414639798 | 0.205410860127932 |
| Lawsonibacter_asaccharolyticus | type | disease | 0.477862895042796 | 0.151766856842129 | 19 | 18 | 0.007110491107549 | 0.205410860127932 |
| Lawsonibacter_asaccharolyticus | comorbidity | IGT | -0.475124895042795 | 0.151766856842129 | 19 | 18 | 0.007370065583294 | 0.205410860127932 |
| Eubacterium_eligens | age | age | -1.2467673088303 | 0.391416017038458 | 19 | 17 | 0.006611343727095 | 0.205410860127932 |
| Oscillibacter_sp_CAG_241 | comorbidity | IGT | -2.64857972770573 | 0.763998086959495 | 19 | 17 | 0.003776853512783 | 0.205410860127932 |
| Oscillibacter_sp_CAG_241 | comorbidity | T2D | -2.61370728597232 | 0.736839973880289 | 19 | 17 | 0.003218875403587 | 0.205410860127932 |
| Ruminococcaceae_bacterium_D16 | comorbidity | T2D | -0.106504807911724 | 0.031556086714996 | 19 | 4 | 0.004531946745981 | 0.205410860127932 |
| Ruminococcaceae_bacterium_D5 | comorbidity | IGT | -0.158426557912691 | 0.050505479793602 | 19 | 4 | 0.007279877600031 | 0.205410860127932 |
| Ruminococcaceae_bacterium_D5 | comorbidity | T2D | -0.170497579432457 | 0.048710143450794 | 19 | 4 | 0.003533466593531 | 0.205410860127932 |
| Ruminococcus_bromii | comorbidity | T2D | -14.530958484276 | 4.4726997315575 | 19 | 11 | 0.005826462285798 | 0.205410860127932 |
| Firmicutes_bacterium_CAG_534 | type | disease | 0.001257422965504 | 0.000374853309185 | 19 | 1 | 0.004722065633165 | 0.205410860127932 |
| Firmicutes_bacterium_CAG_534 | comorbidity | IGT | -0.001257422965504 | 0.000374853309185 | 19 | 1 | 0.004722065633165 | 0.205410860127932 |
| Firmicutes_bacterium_CAG_534 | comorbidity | T2D | -0.001166019578191 | 0.000361528264616 | 19 | 1 | 0.006106036469878 | 0.205410860127932 |
| Firmicutes_bacterium_CAG_95 | age | age | -0.058657135094159 | 0.018943043595487 | 19 | 7 | 0.007887029446352 | 0.206496770959043 |
| Phascolarctobacterium_succinatutens | age | age | -0.282982879719054 | 0.091233077220782 | 19 | 2 | 0.00780509739456 | 0.206496770959043 |
| Bifidobacterium_bifidum | comorbidity | IGT | -3.49699551479 | 1.13571553919717 | 19 | 8 | 0.008164122157298 | 0.207464751291347 |
| Bacteroides_ovatus | age | age | -0.790359006180556 | 0.265922716314961 | 19 | 16 | 0.010093623789579 | 0.249168312977036 |
Here, I just want the bacteria that are significantly associated with the control and disease groups after adjusting for confounders. Which ones are those? Are they all the rows in this `significant_results.tsv` output, or only those with `disease` in the `value` column?
Thanks,
DC7
Hi @DEEPCHANDA7 - that's correct, you don't need `ID` in the model. To answer your question, filter this table on the `metadata` column by the main variable of interest (i.e. `type`).
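A minimal sketch of that filtering step in base R. It assumes the results are read from the tab-separated `significant_results.tsv`; to keep it self-contained, a few rows copied from the output above are read from an inline string instead. The helper `filter_by_variable` is hypothetical.

```r
# Hypothetical helper: keep only rows for one metadata variable, sorted by q-value
filter_by_variable <- function(results, variable = "type") {
  hits <- results[results$metadata == variable, ]
  hits[order(hits$qval), ]
}

# A few rows copied from significant_results.tsv above (some columns dropped);
# with the real file: read.delim("my_output/significant_results.tsv")
results_txt <- paste(
  "feature\tmetadata\tvalue\tcoef\tqval",
  "Faecalitalea_cylindroides\tcomorbidity\tIGT\t-0.377764650707978\t0.000209953419848",
  "Faecalitalea_cylindroides\ttype\tdisease\t0.350656650707978\t0.000337739889936",
  "Roseburia_inulinivorans\tage\tage\t-1.58382545890012\t0.15082184993556",
  "Ruminococcus_bromii\ttype\tdisease\t18.7382134402279\t0.15082184993556",
  "Bifidobacterium_bifidum\ttype\tdisease\t3.61449351479\t0.205410860127932",
  "Bifidobacterium_bifidum\tcomorbidity\tIGT\t-3.49699551479\t0.207464751291347",
  sep = "\n")
results <- read.delim(text = results_txt, stringsAsFactors = FALSE)

type_hits <- filter_by_variable(results, "type")
print(type_hits[, c("feature", "value", "coef", "qval")])
```

Rows for `comorbidity` and `age` stay in the full table because they are estimated in the same model, but they describe the confounders rather than the disease/control comparison.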
Since your metadata variable (`type`) is binary, MaAsLin 2 internally models it as a dummy variable with `type = control` as the reference level (as a rule of thumb, the reference level does not appear in the `value` column).
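The dummy coding can be seen directly with base R's `factor()` and `model.matrix()`. This sketch shows standard R treatment contrasts, which is the behavior described above; the short vectors are illustrative, not the real samples.

```r
# Character values become factor levels in sorted order, so "control"
# is the first level and therefore the reference
type <- factor(c("control", "disease", "disease", "control"))
print(levels(type))                               # → "control" "disease"
# Only the non-reference level gets a dummy column, hence "disease" in `value`
print(colnames(model.matrix(~ type)))             # → "(Intercept)" "typedisease"

# Making NGT the first level plays the same role as reference = 'comorbidity,NGT'
comorbidity <- relevel(factor(c("IGT", "T2D", "NGT", "NGT")), ref = "NGT")
print(colnames(model.matrix(~ comorbidity)))      # → "(Intercept)" "comorbidityIGT" "comorbidityT2D"
```

One thing worth checking in the metadata above: every `control` sample is `NGT`, so the `type` and `comorbidity` dummies overlap heavily. That may explain why several features show `type = disease` and `comorbidity = IGT` coefficients of similar size and opposite sign.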
In your case, the coefficients (and their signs) should be interpreted with `type = control` as the reference. For example, feature X is more abundant in `type = disease` than in `type = control` if the corresponding coefficient is positive, and vice versa. I hope that makes sense.
For more information on interpreting binary categorical variables in a regression setting, feel free to check out the tutorial here. The visualization plots should also provide enough information to connect the dots.
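Why the sign carries the direction: with a single binary predictor and no covariates, the coefficient on the non-reference level is exactly the difference in group means (disease minus control). A minimal check with simulated, hypothetical abundances; once confounders are added to the model, the coefficient becomes an adjusted difference rather than a raw one.

```r
set.seed(7)
# Hypothetical group sizes and simulated abundances, for illustration only
type <- factor(rep(c("control", "disease"), times = c(5, 14)))
y <- c(rnorm(5, mean = 1), rnorm(14, mean = 2))

fit <- lm(y ~ type)
b <- unname(coef(fit)["typedisease"])
diff_means <- mean(y[type == "disease"]) - mean(y[type == "control"])

# The sign of the coefficient gives the direction relative to the reference level
direction <- if (b > 0) "higher in disease" else "lower in disease"
cat(sprintf("coef = %.3f, difference in means = %.3f (%s)\n", b, diff_means, direction))
```

Read the same way, the positive `type`/`disease` coefficient for Bifidobacterium_bifidum in the table above means higher abundance in disease than in control, given the other terms in the model.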
All the best,
Himel