I am using MaAsLin2 with MetaPhlAn output from biobakery_workflow, trying to find out which bacterial taxa have abundances associated with the copy number of human gene A (CN-A).
My code is very simple:
maas_CN_contin = Maaslin2(
    input_data = input_data,
    input_metadata = input_metadata,
    output = "CN_contin_all",
    fixed_effects = c("CN-Afinal"),
    normalization = 'NONE',
    min_prevalence = 0.05,
    correction = 'BH'
)
This did not give any significant results. I then tried prefiltering input_data by running a correlation test of “each taxon (row of the input data) ~ CN-Afinal” and retrieving only the taxa with a correlation test p-value < 0.05. Using this subset of the data, I got significant results.
Can you explain why this happens?
Also, I went through the source code, and it seems to use glm for fitting in my case. I am not sure how glm works with a Y matrix, since as far as I can tell glm needs Y as a vector.
Hi Jooyoung,
If I understand your question correctly, you’ve run two models: one in which you include all the taxa, and one in which you first run a correlation test and then only include the taxa that were significant from the correlation test in the MaAsLin run.
If I understand your second procedure correctly, I think you’ve ended up with an accidental form of data dredging. When you include all the taxa (model 1), MaAsLin automatically adjusts for the fact that you’ve tested many hypotheses by producing false discovery rate corrected q-values from the p-values. However, when you include only the previously significant taxa (model 2), MaAsLin doesn’t know that you’ve already effectively tested many hypotheses by analyzing the correlations first, so the q-values it produces correct over only the analyzed taxa rather than all the original taxa, making them artificially low. This would explain the fact that there are no significant results originally but there are significant results (q-values < 0.05) when using the subset data.
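To make this concrete, here is a small simulation sketch in R. It is not MaAsLin2 itself: the data are pure noise, a plain Pearson correlation test stands in for both the prefilter and the final test, and all names (x, abund, n_taxa and so on) are made up for illustration.

```r
set.seed(1)
n_samples <- 100
n_taxa <- 500

# Null scenario (assumption for illustration): no taxon is related to x
x <- rnorm(n_samples)
abund <- matrix(rnorm(n_taxa * n_samples), nrow = n_taxa)

# One p-value per taxon (row), from a correlation test against x
pvals <- apply(abund, 1, function(y) cor.test(y, x)$p.value)

# Model 1: BH correction over all taxa
q_all <- p.adjust(pvals, method = "BH")

# Model 2: keep only taxa with p < 0.05, then BH over that subset only
keep <- pvals < 0.05
q_subset <- p.adjust(pvals[keep], method = "BH")

cat("Taxa passing the prefilter:      ", sum(keep), "\n")
cat("q < 0.05, corrected over all:    ", sum(q_all < 0.05), "\n")
cat("q < 0.05, corrected over subset: ", sum(q_subset < 0.05), "\n")
```

In this setup every taxon that passes the prefilter ends up with q < 0.05 after the subset-only correction, even though nothing is truly associated with x, because BH only ever sees p-values that are already below 0.05. Your real run differs in the details (MaAsLin2 fits its own model rather than a correlation test), but the mechanism should be the same.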
Otherwise, the code you’re using seems fine (maybe check that the hyphen in CN-Afinal isn’t causing issues), so the glm/matrix piece probably shouldn’t make a difference.
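On the glm question: as far as I understand it, the model is fit once per taxon, so glm only ever sees a vector as the response. The sketch below shows that general pattern together with one way to sidestep the hyphen. It is an illustration under my own assumptions, not MaAsLin2's actual code, and the data and names (abund, metadata, expr) are made up.

```r
set.seed(2)
n_samples <- 40

# Hypothetical metadata with a hyphenated column name, as in your call
metadata <- data.frame(rnorm(n_samples))
names(metadata) <- "CN-Afinal"

# "CN-Afinal" is not a syntactic R name; in a formula it would be read as
# CN minus Afinal. make.names() converts it to "CN.Afinal".
names(metadata) <- make.names(names(metadata))

# Hypothetical abundance table: taxa in rows, samples in columns
abund <- matrix(rnorm(5 * n_samples), nrow = 5,
                dimnames = list(paste0("taxon", 1:5), NULL))

# Fit one Gaussian glm per taxon; each fit gets a single response vector
pvals <- apply(abund, 1, function(y) {
  dat <- data.frame(expr = y, metadata)
  fit <- glm(expr ~ CN.Afinal, data = dat, family = gaussian())
  coef(summary(fit))["CN.Afinal", "Pr(>|t|)"]
})

# Correct once, over every taxon that was tested
qvals <- p.adjust(pvals, method = "BH")
print(data.frame(pval = pvals, qval = qvals))
```

If you rename the column this way, remember to pass the new name (CN.Afinal) in fixed_effects as well.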