Discordance of results after LOG and AST transformation

Hi, I’m using Maaslin2 to analyze Humann2 output (genefamilies and pathabundance tables) to assess differences between treatment (Bb) and control (C) samples (5 biological replicates).
I used humann_renorm to obtain CPM values, and then joined the sample-specific tables.

Then in R:

head(genefamilies)
(attached: head_genefamilies.csv)

head(pathabundance)
(attached: head_pathabundance.csv)

metadata
(attached: metadata.csv)
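For anyone reproducing this, here is a minimal sketch of reading a joined HUMAnN table into R with features as rows and samples as columns. The helper name `read_humann_table()` is hypothetical. The sketch assumes HUMAnN's usual tab-separated layout, where the header line starts with `# Pathway` or `# Gene Family`. That is why `comment.char = ""` is needed. `quote = ""` is there because pathway names can contain apostrophes, such as `5'-`.

```r
# Hypothetical helper: read a tab-separated HUMAnN table whose header line
# begins with "# Pathway" or "# Gene Family".
read_humann_table <- function(path) {
  tab <- read.table(path,
                    sep = "\t",
                    header = TRUE,
                    row.names = 1,        # first column holds feature IDs
                    comment.char = "",    # keep the "# ..." header line
                    quote = "",           # names may contain apostrophes
                    check.names = FALSE)  # keep sample IDs unchanged
  as.data.frame(tab)
}
```

The resulting data frame can then be passed as `input_data` alongside a metadata data frame whose row names match the sample IDs.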

I first tried the LOG transform on both tables with the following command:

fit_data2 = Maaslin2(normalization = "NONE",
                     input_data = df_input_data,
                     input_metadata = df_input_metadata,
                     output = "output2",
                     fixed_effects = c("condition"),
                     transform = "LOG")

Then I tried the AST transform:

fit_data2 = Maaslin2(normalization = "NONE",
                     input_data = df_input_data,
                     input_metadata = df_input_metadata,
                     output = "output2",
                     fixed_effects = c("condition"),
                     transform = "AST")
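The two calls above differ only in `transform`. The sketch below shows what each transform does to a single feature. It follows our reading of Maaslin2's helpers: for LOG, zeros are replaced by half the smallest non-zero value and the result is log2-transformed; AST is `sign(x) * asin(sqrt(abs(x)))`. The exact internals are an assumption. The function names and CPM values are hypothetical. Dividing CPM by 1e6 stands in for TSS normalization, which is only exact if each sample sums to one million.

```r
# Sketch of the two transforms; treat the exact definitions as assumptions.
log_transform <- function(x) {
  # zeros replaced by half the smallest non-zero value, then log2
  y <- replace(x, x == 0, min(x[x > 0]) / 2)
  log2(y)
}

ast_transform <- function(x) {
  # arcsine square root: only meaningful for abs(x) <= 1 (proportions)
  sign(x) * asin(sqrt(abs(x)))
}

# Hypothetical CPM values for one feature across six samples
cpm <- c(0, 120, 0, 4500, 35, 0)

log_transform(cpm)                    # finite for every sample
suppressWarnings(ast_transform(cpm))  # NaN wherever cpm > 1, 0 where cpm == 0

# Scaling to proportions brings every value into [0, 1]
prop <- cpm / 1e6
ast_transform(prop)                   # finite for every sample
```

On unnormalized CPM values, AST keeps only the zeros as usable numbers. This is one reason the LOG and AST results on the same table can diverge so sharply.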

The LOG runs completed without errors, but identified few significant features:
number of significant genefamilies = 6
number of significant pathways = 6

The AST run identified 160 significant gene families, but the pathway run failed with the following error and produced no output:

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
  contrasts can be applied only to factors with 2 or more levels
2021-10-29 10:26:27 WARNING::Fitting problem for feature 123 returning NA
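The thread does not confirm how the AST run reaches this error, but one plausible path is this. Suppose a feature's non-zero CPM values all become NaN after AST. Then only the samples with zeros remain for fitting. If those samples all belong to one condition, the model sees a single-level factor. Below is a minimal base-R reproduction with `lm()`, using hypothetical values. It illustrates the mechanism and is not Maaslin2's own code.

```r
# Hypothetical metadata: 5 control (C) and 5 treatment (Bb) samples
condition <- factor(rep(c("C", "Bb"), each = 5), levels = c("C", "Bb"))

# Hypothetical AST output for one feature: the feature is absent (0) in
# every C sample, and every non-zero Bb value has become NaN
y <- c(0, 0, 0, 0, 0, NaN, NaN, NaN, NaN, NaN)

# lm() drops the NaN rows and the now-unused "Bb" level, so the model
# matrix is built from a factor with a single level and fails
fit_error <- tryCatch(lm(y ~ condition),
                      error = function(e) conditionMessage(e))
fit_error
```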

Moreover, when I compare the gene family results (LOG vs AST), the gene families found to be upregulated with the LOG transform are NOT present in the AST results table.
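To quantify this kind of overlap, here is a minimal sketch that compares the significant features from two runs. The helper `compare_hits()` is hypothetical. It assumes each input is a results table with `feature` and `qval` columns, such as the `results` element returned by `Maaslin2()`. It also assumes each run is kept in its own variable, because both calls above reuse the name `fit_data2`. The 0.25 cut-off mirrors Maaslin2's default `max_significance`.

```r
# Hypothetical helper: compare significant features from two Maaslin2 runs.
compare_hits <- function(res_a, res_b, q = 0.25) {
  # which() ignores NA q-values from features that failed to fit
  hits_a <- unique(as.character(res_a$feature[which(res_a$qval <= q)]))
  hits_b <- unique(as.character(res_b$feature[which(res_b$qval <= q)]))
  list(shared = intersect(hits_a, hits_b),
       only_a = setdiff(hits_a, hits_b),
       only_b = setdiff(hits_b, hits_a))
}
```

Usage would look like `compare_hits(fit_log$results, fit_ast$results)`, where `fit_log` and `fit_ast` are hypothetical names for the two fits.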

I would like to ask:

  1. Am I using the correct methods, given that many features have zero counts in many samples?
  2. Given the discordance in the gene family results, which method (LOG or AST) should I trust?
  3. What causes the error when I use AST on the pathways? Is it due to the many zeros in my dataset?
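To gauge how zero-inflated each table is, and whether some features are absent from an entire group, here is a minimal sketch. The helper `zero_fraction_by_group()` is hypothetical. It assumes features are rows, samples are columns, and `groups` lists each sample's condition in column order.

```r
# Hypothetical helper: share of zero values per feature within each group.
zero_fraction_by_group <- function(features, groups) {
  # split sample indices by group, then average the zero indicator per row
  sapply(split(seq_along(groups), groups), function(idx) {
    rowMeans(features[, idx, drop = FALSE] == 0)
  })
}
```

A feature showing 1 in one group's column is absent from every sample of that group.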

Thank you in advance

Andrea

Hi @anbec,

  1. The best method for analyzing zero-inflated data is still up for debate. We address it at length in the associated manuscript, but in short, the best method depends on your data. That said, the results in the manuscript suggest that the methods applied in MaAsLin are generally appropriate for zero-inflated data.
  2. We generally refrain from recommending a particular model because of data-, design-, and platform-specific differences. I suggest asking questions such as: which results make more sense? Which are less driven by outliers?
  3. I believe the error stems from the asin(sqrt(abs(x))) call, which breaks down when applied to data outside the range 0 (really -1) to 1, i.e. anything other than proportional data. When I run your pathways with normalization turned on (the default is TSS), I do not get the error. When I run them with the code you used above, I do get the error. I am not sure why this would affect the pathways and not the gene families, since both of your tables appear to be in CPM.
  4. In addition, looking at the data you supplied: we normally suggest running either the species-stratified results (only rows with |s__) or the unstratified results (rows without |s__) from HUMAnN, analyzing each set separately rather than mixing them in one table.
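Here is a minimal sketch of splitting a table before running Maaslin2 on each part. The helper `split_humann_strata()` is hypothetical. It assumes HUMAnN's usual row naming, in which stratified rows look like `FEATURE|g__Genus.s__species` or `FEATURE|unclassified`. For that reason it splits on the `|` character rather than matching `|s__` literally. Rows such as `UNMAPPED` contain no `|` and land in the unstratified set.

```r
# Hypothetical helper: split a HUMAnN table (features as rows) into
# stratified rows (containing "|") and unstratified rows (no "|").
split_humann_strata <- function(tab) {
  is_stratified <- grepl("|", rownames(tab), fixed = TRUE)
  list(stratified   = tab[is_stratified, , drop = FALSE],
       unstratified = tab[!is_stratified, , drop = FALSE])
}
```

Each element of the returned list can then be passed to `Maaslin2()` as its own `input_data`.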

I hope this helps!
Best,
Kelsey


Thank you so much!

Andrea