Is MaAsLin2 reading features all at once?

Hello, I know that MaAsLin2 does not consider all features (dependent variables such as taxa) together in a single analysis. However, I am encountering an unusual problem:

  1. I have one metadata file and two data files (data file 1 and data file 2).

  2. Each data file contains a number of dependent variables (taxa). Some are unique to data file 1 or to data file 2, and a couple are the same dependent variable present in both files (e.g., OTU_Shannon).

  3. When I run the same metadata against the two different data files, I get different results for the association between my metadata and the identical shared dependent variable (e.g., OTU_Shannon).

My code is below. Does anyone know what the issue might be?

fit_data = Maaslin2(
    input_data = data_file_1,
    input_metadata = metadata,
    output = "Energy",
    analysis_method = "LM",
    normalization = "NONE",
    standardize = TRUE,
    min_prevalence = 0.1,
    min_abundance = 0.001,
    plot_heatmap = TRUE,
    fixed_effects = c("energy"))

fit_data = Maaslin2(
    input_data = data_file_2,
    input_metadata = metadata,
    output = "Energy",
    analysis_method = "LM",
    normalization = "NONE",
    standardize = TRUE,
    min_prevalence = 0.1,
    min_abundance = 0.001,
    plot_heatmap = TRUE,
    fixed_effects = c("energy"))

Again, the metadata is identical: “energy” is tested against data file 1 and data file 2, both of which contain the same dependent variable (OTU_Shannon) alongside their own unique dependent variables (specific taxa). With data file 1, energy is associated with OTU_Shannon, but with data file 2 it is not.

Here’s another example looking at an amino acid from the metadata: again, the metadata used with data file 1 and data file 2 is identical, and OTU_Observed is also identical between data file 1 and data file 2. Below, you will see I get different FDRs even though I am working with the exact same data points and the exact same code. I also have a data file 3 with the same OTU_Observed, and that one doesn’t come up significant at all. Could the other taxa being different in each of these datasets be affecting the results?


@andrewGhazi @himel.mallick @nearinj I would appreciate your help.

Update 3: I created test datasets mimicking the original data I was working with:

Metadata: energy
Dataset 1: OTU_Shannon, bifido, lacto
Dataset 2: OTU_Shannon, proteo, bacteroides, clostridium

The OTU_Shannon values are identical in both datasets.

When running MaAsLin2, energy is associated with OTU_Shannon in dataset 1 but not in dataset 2. Can someone please explain why that is? I was under the impression that MaAsLin2 examines the relationship between a specific metadata variable (energy) and each individual feature separately.

Thank you.

Hi there,

It looks like the results are as expected, given that the coefficients are the same. The key thing to remember is that the q-value in the output is an FDR-corrected p-value (FDR = false discovery rate). The correction is applied across all the tests in a run, so the number of tests and the distribution of their p-values in each MaAsLin2 run will affect the final q-value for each variable, even when a feature has the same coefficient and the same p-value in both runs.
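To make that concrete, here is a minimal sketch in base R using `p.adjust()`. The p-values are made up purely for illustration and the feature names are borrowed from the test datasets above. It also assumes the Benjamini-Hochberg method, which I believe is the MaAsLin2 default (`correction = "BH"`). This is not MaAsLin2's internal code, just the same kind of correction applied to two different sets of tests.

```r
# Hypothetical raw p-value for the shared feature; identical in both runs.
p_shared <- 0.08

# Run 1: the shared feature is tested alongside two features with small p-values.
p_run1 <- c(OTU_Shannon = p_shared, bifido = 0.01, lacto = 0.03)

# Run 2: the shared feature is tested alongside three features with large p-values.
p_run2 <- c(OTU_Shannon = p_shared, proteo = 0.60,
            bacteroides = 0.75, clostridium = 0.90)

# Benjamini-Hochberg FDR correction, applied separately within each run.
q_run1 <- p.adjust(p_run1, method = "BH")
q_run2 <- p.adjust(p_run2, method = "BH")

# Same raw p-value, different q-values.
print(q_run1["OTU_Shannon"])
print(q_run2["OTU_Shannon"])
```

With these made-up numbers, the shared feature gets a q-value of 0.08 in the first run and 0.32 in the second, so it would fall on opposite sides of a 0.25 cutoff (which, as far as I know, is the default `max_significance`) even though its raw p-value never changed. To compare a shared feature across runs, look at the `coef` and `pval` columns rather than `qval`.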

Hope that helps
Cheers,
Jacob Nearing
