MaAsLin2 glm.fit error

Hi,

I’m using MaAsLin2 with abundance data. It works well for almost all of my features and I get results at the end. But for some columns, I get this message in the console and the features seem to be completely ignored in the output: Error in glm.fit(fr$X, fr$Y, weights = fr$wts, offset = fr$off, family = fm, : NA/NaN/Inf in 'x'. I checked and there are no NaN values in my input data frame.

My MaAsLin2 options are (I replaced some names for confidentiality):
fit_data2 = Maaslin2(
  input_data = df_input_data,
  input_metadata = df_input_metadata,
  normalization = "NONE",
  transform = "NONE",
  analysis_method = "CPLM",
  output = "outdir",
  fixed_effects = c("effect1", "effect2", "effect3"),
  reference = c("value1,value2"))

Is there something I can do?

That error might show up if you have negative values. Did you try checking for that?
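
One way to check this for every column at once is sketched below in base R. It is only a minimal sketch: the helper name check_input_values and the toy data frame are made up for illustration, and it assumes features are stored as numeric columns.

```r
# Hypothetical helper (not part of MaAsLin2): count problematic values
# in each numeric column of a data frame.
check_input_values <- function(df) {
  num <- df[, vapply(df, is.numeric, logical(1)), drop = FALSE]
  count_if <- function(pred) {
    unname(vapply(num, function(x) sum(pred(x), na.rm = TRUE), integer(1)))
  }
  data.frame(
    column     = names(num),
    n_missing  = count_if(is.na),        # is.na() is TRUE for both NA and NaN
    n_infinite = count_if(is.infinite),  # Inf and -Inf
    n_negative = count_if(function(x) x < 0),
    stringsAsFactors = FALSE
  )
}

# Toy data for illustration only; replace with your own abundance table.
toy <- data.frame(
  feature_a = c(0, 1.5, NA),
  feature_b = c(-1, Inf, 2),
  sample_id = c("s1", "s2", "s3")
)
report <- check_input_values(toy)
# Show only the columns with at least one problematic value.
print(report[rowSums(report[, -1]) > 0, ])
```

If the report comes back empty for your real table, the input itself is probably not the cause and the problem is more likely in the model fit.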

Hi, there are no negative values in the series:

 Numeric 
           min max mean median   var   sd valid.n
........     0  44 2.97   0.07 43.64 6.61     109

Does it run when you don’t set analysis_method = "CPLM"?

If I use LM instead (with TSS and LOG), it does seem to work, but this leads to very different results. I wanted to use CPLM for its zero-inflated property.

Edit: Using CSS with CPLM leads to even more errors.

That’s alright, just trying to pare down where the error is occurring.

It’s something to do with the link function the cplm package uses. It will be difficult to solve this without data that can generate the issue. Are you able to share anonymized or synthetic data that we can use to debug the problem?

In the meantime, does it work with analysis_method set to "NEGBIN" or "ZINB"?

Just chiming in to offer my two cents.

The error Error in glm.fit(fr$X, fr$Y, weights = fr$wts, offset = fr$off, family = fm, : NA/NaN/Inf in 'x' occurs when the GLM algorithm fails to converge, and it can often happen even with valid input. As you mentioned, it happens for only a few features rather than all of them, which is expected; how many are affected depends on how aggressive the filtering was. To my mind, this is expected behavior for count-based or Tweedie GLMs, and the only solution I can think of to circumvent this error is more aggressive filtering. I would recommend something like variance filtering to ensure that the features have sufficient variability to fit a complex model.
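
As a rough illustration of what such filtering could look like before calling Maaslin2, here is a minimal base R sketch. The helper name filter_features, the thresholds, and the toy data are all made up for illustration, and it assumes features are columns and samples are rows. I believe Maaslin2() also exposes min_variance and min_prevalence arguments that serve a similar purpose, so check the documentation of your installed version before rolling your own.

```r
# Hypothetical helper (not part of MaAsLin2): keep only features (columns)
# whose variance and prevalence pass the given thresholds.
filter_features <- function(df, min_variance = 0, min_prevalence = 0) {
  vars <- vapply(df, stats::var, numeric(1), na.rm = TRUE)
  # Prevalence = fraction of samples in which the feature is non-zero.
  prev <- vapply(df, function(x) mean(x > 0, na.rm = TRUE), numeric(1))
  keep <- !is.na(vars) & vars > min_variance &
    !is.na(prev) & prev >= min_prevalence
  df[, keep, drop = FALSE]
}

# Toy data for illustration only.
toy <- data.frame(
  feature_a = c(0, 0, 0, 0),  # constant, zero variance
  feature_b = c(0, 0, 0, 5),  # non-zero in only 25% of samples
  feature_c = c(1, 4, 0, 9)   # variable and prevalent
)
filtered <- filter_features(toy, min_variance = 0, min_prevalence = 0.5)
print(names(filtered))
```

The right thresholds depend on the data, so it is probably worth trying a few values and checking how many features survive.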

Thanks for your answers. Andrew: Please find attached the anonymized data. The error appears for feature_106, for example. I did not run the negative binomial models (NEGBIN, ZINB) because the data are summed abundances, not counts. Here is the exact code to reproduce the error with the anonymized files:

library(Maaslin2)

# Abundance table (attached file)
abundance = "data.annonymized.tsv"
df_input_data = read.table(file = abundance, header = TRUE, sep = "\t", stringsAsFactors = FALSE, row.names = 1)

# Sample metadata (attached file)
metadata = "samples.metadata.anonymized.tsv"
df_input_metadata = read.table(file = metadata, header = TRUE, sep = "\t", stringsAsFactors = FALSE, row.names = 1)

# No normalization or transform; CPLM fit on the raw abundances
fit_data2 = Maaslin2(
  input_data = df_input_data,
  input_metadata = df_input_metadata,
  normalization = "NONE",
  transform = "NONE",
  analysis_method = "CPLM",
  output = "anonymized_outdir",
  fixed_effects = c("feature1", "feature2", "feature3"),
  reference = c("feature3,value_1"))

Himel: I see. Does that mean there is nothing to be done about it? That’s a shame. In any case, I think this should raise an error somewhere in MaAsLin2, perhaps with a specific option to ignore it.
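
Until something like that exists, one workaround for at least seeing which features are dropped is to fit each feature separately and record the failures. The sketch below is only an illustration of that idea in base R: fit_each_feature is a made-up helper, the toy data are invented, and stats::lm is used as a stand-in model. Passing a function that calls cplm::cpglm instead would be an assumption about how the CPLM fit is set up, not a copy of what MaAsLin2 does internally.

```r
# Hypothetical helper (not part of MaAsLin2): fit one model per feature
# and record which fits fail, and with which error message.
fit_each_feature <- function(features, metadata, fit_fun) {
  results <- lapply(names(features), function(feature) {
    # Attach the feature values to the metadata as a column named "expr".
    dat <- cbind(metadata, expr = features[[feature]])
    tryCatch(
      {
        fit_fun(dat)
        data.frame(feature = feature, ok = TRUE,
                   message = NA_character_, stringsAsFactors = FALSE)
      },
      error = function(e) {
        data.frame(feature = feature, ok = FALSE,
                   message = conditionMessage(e), stringsAsFactors = FALSE)
      }
    )
  })
  do.call(rbind, results)
}

# Toy data for illustration only: the all-NA feature cannot be fitted.
toy_features <- data.frame(
  feature_ok  = c(1, 3, 2, 5),
  feature_bad = c(NA_real_, NA, NA, NA)
)
toy_metadata <- data.frame(group = c("a", "a", "b", "b"))

# Stand-in model; an (assumed) CPLM variant could be
# function(dat) cplm::cpglm(expr ~ group, data = dat)
status <- fit_each_feature(
  toy_features, toy_metadata,
  function(dat) stats::lm(expr ~ group, data = dat)
)
print(status[!status$ok, ])
```

The resulting table lists the failing features together with the error text, which makes it easier to compare against the features missing from the output.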

data.annonymized.tsv (163.5 KB)
samples.metadata.anonymized.tsv (3.1 KB)