Is there a bug in the implementation of ZINB modeling in the fit.data function (https://github.com/biobakery/Maaslin2/blob/2c5e3eb01636aa28f4ffd23aa34bd55dd068939e/R/fit.R) used in Maaslin2?
My input for Maaslin2 is direct from Humann2 after normalizing pathway abundances across samples by cpm. I ran Maaslin2 with normalization = NULL, transform = NULL, model = ZINB, a single categorical fixed effect and random effects = NULL, so it should be calling up pscl::zeroinfl. However, I get an error message “Fitting problem for feature X, returning NA” for every single pathway feature. With the same dataset and Maaslin2 parameters, I was able to get successful model fits for a regular NB model.
Is there something I should be checking for in my dataset that might explain why a negative binomial (NB) model works, but a zero-inflated negative binomial (ZINB) model does not? I plotted a histogram of the counts for a subsample of pathway features, and there are definitely some pathway features that are zero inflated in counts.
Also, as pathway features might have different distributions (e.g. Poisson vs. NB vs. ZINB etc.), might it be possible to add a feature within Maaslin2 that runs a model comparison for each feature with output AIC/BIC values, so that the most appropriate model can be used for each tested feature?
Thanks for your help!
Hi @germaine260 - it looks like there was a missing argument in the ZINB function call (dist = "negbin"), which caused the error. I just pushed an update to fix it. Let me know if that resolves the issue on your end.
I reinstalled Maaslin2 via BiocManager::install("Maaslin2"), but I am still getting the same error for every pathway when I set the model type to "ZINB". NEGBIN works fine. I'm not sure whether there is still a bug in the ZINB function call, or whether I need to do something extra to ensure my version of Maaslin2 has the updated function call.
Hi @germaine260 - could you please install the GitHub development version and give it a try one more time?
I installed the GitHub development version (devtools::install_bitbucket("biobakery/maaslin2@default", ref="tip")) but am still getting this error for each pathway with a ZINB model: Error in model_function(formula, data = dat_sub, na.action = na.exclude) : invalid dependent variable, non-integer values. My input for Maaslin2 is direct from Humann2 after normalizing pathway abundances across samples by cpm, hence the values are non-integers and they work fine with a NB model.
Hey @germaine260 - under the hood, Maaslin2 calls the glm.nb function from the MASS package for negbin, whereas it calls the zeroinfl function from the pscl package for ZINB. Apparently, the latter rejects non-integer response values, which is why the same CPM data fit under negbin but fail under ZINB. I just did a quick experiment to verify that (see below). Based on this, you might need to round your CPM values to integers before using the ZINB model.
# Load libraries
library(MASS)  # provides glm.nb
library(pscl)  # provides zeroinfl and the bioChemists data
# Load data
data("bioChemists", package = "pscl")
# Artificially introduce non-integer, non-zero values
bioChemists$art <- ifelse(bioChemists$art != 0, bioChemists$art + 0.5, 0)
# Negative binomial (works without error)
MASS::glm.nb(art ~ ., data = bioChemists)
# Zero-inflated negative binomial with non-integer y (throws error);
# try() reports the error and lets the rest of the script run
try(zeroinfl(art ~ . | 1, data = bioChemists, dist = "negbin"))
# Zero-inflated negative binomial with integer y (works without error)
zeroinfl(round(art) ~ . | 1, data = bioChemists, dist = "negbin")
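Building on the experiment above, here is a minimal sketch of how one might check a response for non-integer values, round it, and then compare NB and ZINB fits by AIC, as asked about earlier in the thread. It reuses the bioChemists data rather than real pathway abundances. The helper is_whole is a hypothetical name introduced only for illustration, and none of this is part of Maaslin2 itself.

```r
library(MASS)  # glm.nb
library(pscl)  # zeroinfl and the bioChemists data

data("bioChemists", package = "pscl")

# Hypothetical helper (not part of Maaslin2 or pscl):
# TRUE when every non-missing value is a whole number
is_whole <- function(x, tol = 1e-8) {
  all(abs(x - round(x)) < tol, na.rm = TRUE)
}

# Simulate CPM-like non-integer abundances, as in the experiment above
y_raw <- ifelse(bioChemists$art != 0, bioChemists$art + 0.5, 0)
is_whole(y_raw)

# Round to the nearest integer before fitting count models.
# Note that round() rounds exact halves to the even integer (1.5 -> 2, 2.5 -> 2).
dat <- bioChemists
dat$art <- round(y_raw)
is_whole(dat$art)

# Fit both models on the rounded response
nb_fit   <- glm.nb(art ~ ., data = dat)
zinb_fit <- zeroinfl(art ~ . | 1, data = dat, dist = "negbin")

# Lower AIC suggests the better trade-off between fit and complexity
aic_values <- c(NB = AIC(nb_fit), ZINB = AIC(zinb_fit))
print(aic_values)
```

Rounding changes the data, so it may be worth checking how sensitive the results are to it, especially for features with many values below 0.5 that become zeros.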