The bioBakery help forum

ZINB Model Not Working in Maaslin2

Hi,

Is there a bug in the implementation of ZINB modeling in the fit.data function (https://github.com/biobakery/Maaslin2/blob/2c5e3eb01636aa28f4ffd23aa34bd55dd068939e/R/fit.R) used in Maaslin2?

My input for Maaslin2 comes directly from HUMAnN2, after normalizing pathway abundances across samples to CPM. I ran Maaslin2 with normalization = NULL, transform = NULL, model = ZINB, a single categorical fixed effect, and random effects = NULL, so it should be calling pscl::zeroinfl. However, I get the error message “Fitting problem for feature X, returning NA” for every single pathway feature. With the same dataset and Maaslin2 parameters, a regular NB model fits successfully.

Is there something I should be checking for in my dataset that might explain why a negative binomial (NB) model works, but a zero-inflated negative binomial (ZINB) model does not? I plotted a histogram of the counts for a subsample of pathway features, and there are definitely some pathway features that are zero inflated in counts.

Also, as pathway features might have different distributions (e.g. Poisson vs. NB vs. ZINB etc.), might it be possible to add a feature within Maaslin2 that runs a model comparison for each feature with output AIC/BIC values, so that the most appropriate model can be used for each tested feature?
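
A per-feature model comparison of this kind can be sketched outside Maaslin2. The example below is only an illustration, not a Maaslin2 feature: the helper compare_count_models and the simulated data are hypothetical, and the ZINB fit is attempted only if pscl is installed. Each fit is wrapped in tryCatch so a failed model yields NA instead of stopping the comparison.

```r
library(MASS)  # provides glm.nb

# Return the AIC of a fitted model, or NA if fitting fails.
safe_aic <- function(fit_expr) {
  tryCatch(AIC(fit_expr), error = function(e) NA_real_)
}

# Hypothetical helper: fit candidate count models to one feature
# and rank them by AIC (lowest first).
compare_count_models <- function(y, group) {
  dat <- data.frame(y = y, group = group)
  aic <- c(
    poisson = safe_aic(glm(y ~ group, data = dat, family = poisson())),
    negbin  = safe_aic(MASS::glm.nb(y ~ group, data = dat))
  )
  # zeroinfl() requires integer counts, so y must be integer-valued here.
  if (requireNamespace("pscl", quietly = TRUE)) {
    aic["zinb"] <- safe_aic(
      pscl::zeroinfl(y ~ group | 1, data = dat, dist = "negbin")
    )
  }
  data.frame(model = names(aic), AIC = unname(aic))[order(aic), ]
}

# Simulated zero-inflated negative binomial counts (illustrative only).
set.seed(1)
n <- 200
group <- factor(rep(c("A", "B"), each = n / 2))
counts <- rnbinom(n, mu = ifelse(group == "A", 5, 10), size = 2)
counts[rbinom(n, 1, 0.3) == 1] <- 0L

res <- compare_count_models(counts, group)
print(res)
```

In practice the helper would be applied to each pathway column in turn. AIC values are only comparable when all models are fitted to the same response values.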

Thanks for your help!
Germaine

hi @germaine260 - it looks like there was a missing argument in the ZINB function call (dist = "negbin") which caused the error. I just pushed an update to fix it. Let me know if that resolves the issue on your end.

I reinstalled Maaslin2 via BiocManager::install(“Maaslin2”), but I am still getting the same errors for every pathway when I set the model type to “ZINB”. NEGBIN works fine. I’m not sure whether there is still a bug in the ZINB function call, or whether I need to do something extra to make sure my version of Maaslin2 has the updated function call.

Hi @germaine260 - could you please install the GitHub development version and give it a try one more time?
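
For anyone following along, a minimal sketch of checking which version is installed and pulling the development version from GitHub. The helper names installed_version and install_dev_maaslin2 are hypothetical, and the sketch assumes the remotes package is available; the repository name is taken from the GitHub link in the first post. A Bioconductor release may not yet include a fix that was only just pushed to the repository.

```r
# Report the installed version of a package, or NA if it is not installed.
installed_version <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

# Install the development version from GitHub.
# Defined here but not called automatically; assumes `remotes` is installed.
install_dev_maaslin2 <- function() {
  remotes::install_github("biobakery/Maaslin2")
}

# Check what is currently installed (NA if Maaslin2 is absent).
installed_version("Maaslin2")
```

After reinstalling, restart the R session before re-running, since an already loaded package keeps using the old code.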

I installed the GitHub development version (devtools::install_bitbucket(“biobakery/maaslin2@default”, ref=“tip”)) but am still getting this error for each pathway with a ZINB model: Error in model_function(formula, data = dat_sub, na.action = na.exclude) : invalid dependent variable, non-integer values. My input for Maaslin2 comes directly from HUMAnN2 after normalizing pathway abundances across samples to CPM, so the values are non-integers, and they work fine with an NB model.

Hey @germaine260 - under the hood, Maaslin2 calls the glm.nb function from the MASS package for negbin, whereas it calls the zeroinfl function from the pscl package for ZINB. Apparently, the latter does not accept non-integer values. I just did a quick experiment to verify that (see below). Based on this, you might need to round your CPM values to integers before using the ZINB model.

# Load libraries 

library(pscl)
library(MASS)

# Load data  

data("bioChemists", package = "pscl")

# Artificially introduce non-integer, non-zero values

bioChemists$art <- ifelse(bioChemists$art != 0, bioChemists$art + 0.5, 0)

# Negative Binomial (works without error)

MASS::glm.nb(art ~ ., data = bioChemists) 

# Zero-inflated negative binomial with non-integer y (throws error)

zeroinfl(art ~ . | 1, data = bioChemists, dist = "negbin")

# Zero-inflated negative binomial with integer y (works without error)

zeroinfl(round(art) ~ . | 1, data = bioChemists, dist = "negbin")
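
The same rounding can be applied to a whole feature table before it is passed to Maaslin2. The sketch below uses base R only; the helper is_count_like and the small cpm table are hypothetical stand-ins for real HUMAnN2 output.

```r
# TRUE when every non-missing value is a non-negative whole number.
is_count_like <- function(x) {
  x <- x[!is.na(x)]
  all(x >= 0 & x == round(x))
}

# Hypothetical feature table: samples in rows, pathways in columns.
cpm <- data.frame(
  pathway_a = c(0, 12.4, 0, 103.7),
  pathway_b = c(0.4, 0, 5.5, 20.2)
)

# Which features are already integer-valued?
sapply(cpm, is_count_like)

# Round every value to the nearest integer, then check again.
cpm_rounded <- round(cpm)
sapply(cpm_rounded, is_count_like)
```

Note that rounding turns small non-zero values (such as 0.4 here) into zeros, which increases the number of zeros the ZINB model sees.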