Hi there!
My metadata has a single variable, as shown below. I want to find the taxa associated with the control and disease groups:
SampleID    type
SRR45656    control
SRR98989    disease
SRR78787    control
SRR45679    disease
I have analysed the MetaPhlAn output with both LEfSe (on Galaxy) and MaAsLin 2. For MaAsLin 2 I used this command:
$ Maaslin2.R --transform=NONE --fixed_effects="type,subject" --normalization=NONE --standardize=FALSE cleaned_file_trans.tsv metadata.tsv /media/deep/New\ Volume/obesity/done/prjeb_7854_wgs_analysed/maaslin2/
Is my command correct?
With LEfSe I got 35 significant results in total (P < 0.05; 6 enriched in disease, 29 enriched in control), but MaAsLin 2 returned no significant results with the above command. When I used --transform=LOG, I got only 5 significant results; all of them also appear in the LEfSe output, and all are control-enriched. Why am I seeing such a discrepancy? Should I stop using MaAsLin 2 and only use LEfSe, or am I making a mistake in my command?
Please help.
Hi @DEEPCHANDA7 - it looks like you are supplying subject as a fixed effect in the MaAsLin 2 call, which is not the same as LEfSe's univariate approach. In addition, the results are expected to differ (for the same comparison) because the modeling paradigms are very different (e.g. nonparametric univariate in LEfSe vs. parametric multivariable in MaAsLin 2). That said, you should expect some consistent results (e.g. features with large enough effect sizes), but for a fair comparison you need to make sure the p-values are comparable across models and correspond to the same contrast (e.g. control vs. disease).
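One concrete consequence of the different paradigms, and a plausible reason why --transform=LOG changed the MaAsLin 2 results while LEfSe did not move, can be sketched in a few lines. This is a minimal, hypothetical illustration in plain Python, not the code of either tool: it assumes LEfSe's first step is a Kruskal-Wallis test (which works on ranks only) and approximates a MaAsLin 2-style linear model with a single two-level group term (whose t statistic equals the pooled-variance two-sample t). The abundances are made up, and the log step is a plain log2 without a pseudocount, which is a simplification.

```python
import math
from statistics import mean, variance


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` across a run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def kruskal_wallis_h(group_a, group_b):
    """Two-group Kruskal-Wallis H (no tie correction); depends on ranks only."""
    pooled = list(group_a) + list(group_b)
    r = average_ranks(pooled)
    n, n_a, n_b = len(pooled), len(group_a), len(group_b)
    r_a, r_b = sum(r[:n_a]), sum(r[n_a:])
    return 12 / (n * (n + 1)) * (r_a**2 / n_a + r_b**2 / n_b) - 3 * (n + 1)


def chi2_df1_pvalue(h):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(h / 2))


def group_t(group_a, group_b):
    """t statistic of the group term in OLS y ~ group (pooled-variance t)."""
    n_a, n_b = len(group_a), len(group_b)
    sp2 = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(sp2 * (1 / n_a + 1 / n_b))


# Hypothetical relative abundances of ONE feature (made up for illustration).
control = [0.10, 0.40, 0.05, 2.0, 0.30]
disease = [0.01, 0.02, 0.03, 0.20, 0.015]

# Simplification: plain log2 without a pseudocount (all toy values are > 0).
log_control = [math.log2(x) for x in control]
log_disease = [math.log2(x) for x in disease]

h_raw, h_log = kruskal_wallis_h(control, disease), kruskal_wallis_h(log_control, log_disease)
t_raw, t_log = group_t(control, disease), group_t(log_control, log_disease)

print(f"KW H raw={h_raw:.3f} log={h_log:.3f} (p={chi2_df1_pvalue(h_raw):.3f})")
print(f"OLS t raw={t_raw:.3f} log={t_log:.3f}")
```

Because a log transform preserves the ordering of the values, the rank-based H statistic is identical before and after transforming, whereas the linear-model t statistic changes noticeably (here it roughly doubles). In other words, the choice of transform matters for MaAsLin 2 but not for a rank test.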
In your specific case, I would drop subject from the fixed effects and compare the p-values corresponding to control vs. disease across models. Additionally, I might include subject as a random effect if there are repeated measures, which I cannot tell from your description.
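Whether a random effect makes sense depends on whether any subject contributed more than one sample. A minimal way to check that from the metadata file is sketched below; the column name "subject" and the example rows are hypothetical, and the function simply counts samples per subject with Python's csv module. If every subject has exactly one sample, there are no repeated measures for a random effect to account for.

```python
import csv
import io
from collections import Counter


def repeated_subjects(metadata_lines, subject_column, delimiter="\t"):
    """Return {subject: n_samples} for subjects with more than one sample."""
    reader = csv.DictReader(metadata_lines, delimiter=delimiter)
    counts = Counter(row[subject_column] for row in reader)
    return {subject: n for subject, n in counts.items() if n > 1}


# Hypothetical metadata: sample IDs, subjects and groups are made up.
example = io.StringIO(
    "SampleID\tsubject\ttype\n"
    "sample1\tS1\tcontrol\n"
    "sample2\tS2\tdisease\n"
    "sample3\tS1\tcontrol\n"
    "sample4\tS3\tdisease\n"
)

repeats = repeated_subjects(example, "subject")
if repeats:
    print("Repeated measures; consider subject as a random effect:", repeats)
else:
    print("One sample per subject; no repeated measures to model.")
```

On a real file this would be called as `with open("metadata.tsv", newline="") as fh: repeated_subjects(fh, "subject")`.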
To answer your question about LEfSe vs. MaAsLin 2, please use your judgment based on the findings (e.g. the biological relevance of the detected features), not on the number of significant features; finding more hits does not necessarily mean one tool is better than the other.
Sir, thanks a lot for your reply. I think I'm lacking proper insight into the two tools and I have to work on that. Could you please suggest some articles or other resources that would make it easier for a student from a non-statistical background to understand the nitty-gritty of these tools?
Anyway, I have prepared my metadata in this way:
ID          Control    Obese
SRS12345    YES        NO
SRS23456    YES        NO
SRS34567    YES        NO
SRS45678    YES        NO
SRS56789    NO         YES
SRS67890    NO         YES
SRS98765    NO         YES
SRS87654    NO         YES
SRS76543    NO         YES
And when I used --random_effects="ID" --fixed_effects="control,obese", the significantly associated taxa (in the significant_results.tsv output file) were consistent with the LEfSe output: LEfSe reported 37 taxa and MaAsLin 2 reported 27, all of which are also present in the LEfSe output. Do you think this approach is correct?
Hi @DEEPCHANDA7 - you should create a single variable that includes both classes (similar to what you had before) and supply that to --fixed_effects (the two YES/NO columns are redundant otherwise). For introductory statistics, Modern Statistics for Modern Biology is a good start.
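Turning the two complementary YES/NO columns into one two-class variable can be done with a small script. The sketch below is one hypothetical way to do it with Python's csv module; the output column name `type` is my own choice (it matches the name used earlier in this thread), and the function refuses rows where the indicators are not mutually exclusive.

```python
import csv
import io


def collapse_indicators(tsv_in, tsv_out, indicator_columns, new_column="type"):
    """Replace mutually exclusive YES/NO columns with one categorical column."""
    reader = csv.DictReader(tsv_in, delimiter="\t")
    kept = [c for c in reader.fieldnames if c not in indicator_columns]
    writer = csv.DictWriter(tsv_out, fieldnames=kept + [new_column],
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Exactly one indicator column should be YES for each sample.
        hits = [c for c in indicator_columns if row[c].strip().upper() == "YES"]
        if len(hits) != 1:
            raise ValueError(f"{row[kept[0]]}: expected one YES in {indicator_columns}, got {hits}")
        writer.writerow({**{c: row[c] for c in kept}, new_column: hits[0]})


src = io.StringIO("ID\tControl\tObese\nSRS12345\tYES\tNO\nSRS56789\tNO\tYES\n")
dst = io.StringIO()
collapse_indicators(src, dst, ["Control", "Obese"])
print(dst.getvalue())
```

The resulting file has one `type` column with the values Control and Obese, which can then be passed as --fixed_effects="type".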
Thanks a lot, sir, for suggesting the book. I noticed something when I analysed HUMAnN output with MaAsLin 2 and LEfSe. LEfSe gave around 30 significant features (P < 0.05, without FDR correction). MaAsLin 2, however, gives no significant features, because the lowest q-value among all features is 0.4 (the default threshold is q < 0.25). But if I filter the MaAsLin 2 output for features with P < 0.05, most of them are the same as in the LEfSe output.
In this context, I am totally confused about whether I should report the LEfSe results or not. One post on the bioBakery forum states that adjustment is not necessary for LEfSe (although @sma emphasised "personal preferences"). I also suspect that p-value adjustment may be discarding truly significant features.
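The gap between "P < 0.05" and "q < 0.25" comes from the multiple-testing correction. As far as I know, MaAsLin 2's default `correction` is Benjamini-Hochberg ("BH") and its default `max_significance` is 0.25. Below is a minimal sketch of the BH adjustment in plain Python, not MaAsLin 2's own code; the feature counts and p-values are hypothetical. It shows how features with small raw p-values can still end up above q = 0.25 when many features are tested, because each p-value is scaled by m / rank.

```python
def benjamini_hochberg(p_values):
    """BH-adjusted q-values, returned in the same order as p_values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0  # also caps q-values at 1
    # Walk from the largest p-value to the smallest so q stays monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Hypothetical: 1000 features tested, 30 with p = 0.01, the rest with p = 0.5.
p = [0.01] * 30 + [0.5] * 970
q = benjamini_hochberg(p)
print(f"smallest p = {min(p)}, smallest q = {min(q):.3f}")
print("features with q < 0.25:", sum(x < 0.25 for x in q))
```

In this toy case all 30 features have p = 0.01, yet each gets q = 0.01 × 1000 / 30 ≈ 0.33, so none passes q < 0.25. The more features a table contains (HUMAnN tables are often large), the stronger this effect.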
Please suggest what I should do.
Could you please answer this question? Could you also explain in more detail the difference between the LEfSe output and the MaAsLin 2 output?
Thank you