I am using MaAsLin2 to perform a differential abundance analysis, comparing read counts (integers) between two states. I haven't applied any transformation, but I did perform prevalence filtering beforehand. Since I noticed zero-inflation in the data, I use ZINB.
However, I get many fitting errors, such as:
INFO::Fitting model to feature number 838, #asv_name#
Error in eigen(h) : infinite or missing values in ‘x’
In addition: There were 33 warnings (use warnings() to see them)
WARNING::Fitting problem for feature 838 returning NA).
I lose up to ~50% of my ASVs this way. When I tried NEGBIN I didn't have this problem at all, but I got very few significant results. Given the excess zeros, though, I thought ZINB would be the better option.
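As a sanity check on the "excess zeros" premise, it can help to look at the zero fraction per feature before choosing ZINB over NEGBIN. This is a minimal sketch (not MaAsLin2 code) using a small, made-up samples-by-features ASV count table:

```python
import numpy as np

# Hypothetical ASV count table: rows = samples, columns = features.
counts = np.array([
    [0, 12, 0, 3],
    [0,  8, 1, 0],
    [5,  0, 0, 2],
    [0, 15, 0, 0],
])

# Fraction of zero counts per feature (column-wise).
zero_frac = (counts == 0).mean(axis=0)
print(zero_frac)  # features with a high zero fraction are ZINB candidates
```

Features whose zero fraction is barely above what a negative binomial with small means would already produce may fit more stably with NEGBIN than with ZINB.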
Also, I get some tiny q-values (they appear as 0), and when I check the ASV table I notice that these features are usually either present in only a handful of samples or have very low read counts (e.g. 10 in the first state and 5 in the second).
What do you think?
Thank you in advance,
Depending on what your data looks like and how many samples you have, it's not uncommon for models to have fitting issues.
Without looking at the data directly it's hard to say exactly what is going on, but I would make sure that your metadata doesn't contain any NA values.
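For example, a quick per-column NA check (a hedged sketch with pandas and a hypothetical metadata table, not something MaAsLin2 runs for you) could look like:

```python
import numpy as np
import pandas as pd

# Hypothetical metadata table; 'Age' contains one missing value.
metadata = pd.DataFrame({
    "State": ["A", "A", "B", "B"],
    "Age":   [34, np.nan, 29, 41],
})

# Count missing values in each metadata column.
na_counts = metadata.isna().sum()
print(na_counts[na_counts > 0])  # any column listed here needs attention
```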
Thanks for your reply. No, the metadata does not contain any NA values. I understand that fitting issues can occur when, for example, a feature has 0 reads in almost all of the samples. But the issues also occur in features with many non-zero counts, no outliers, etc.
In the meantime I noticed something else: if I don't add any fixed effects, the number of tested ASVs is x (the rest are lost as described above). If I add a fixed effect, e.g. Age, y ASVs are tested. If I add a different fixed effect, z ASVs are tested, and so on. Does this make sense? It seems odd to me, because the data themselves don't change, only the chosen fixed effects do, so why does the number of tested ASVs (i.e. of fitting issues) differ?
It does make sense that the ability to fit the model will differ with the inclusion or exclusion of different fixed effects. A common case is when, for some taxa, the feature is all zero within one level of the fixed effect you are modeling, which prevents the model from converging to a solution.
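One way to screen for this ahead of time is to flag, for each feature, whether it is entirely zero within any level of a categorical fixed effect. This is an illustrative pandas sketch with made-up data, not MaAsLin2's internal logic:

```python
import pandas as pd

# Hypothetical samples-by-features counts and a categorical fixed effect.
counts = pd.DataFrame(
    [[0, 3], [0, 5], [4, 0], [7, 2]],
    columns=["asv_1", "asv_2"],
)
group = pd.Series(["case", "case", "control", "control"])

# True for a feature if every count is zero within at least one group level.
problematic = (counts == 0).groupby(group).all().any(axis=0)
print(problematic)
```

Here `asv_1` is flagged because it is all zero in the "case" samples, which is exactly the situation where adding that fixed effect can turn a previously fittable feature into a convergence failure.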
In the next iteration of MaAsLin we hope to address this by reporting clearer reasons why a model did not fit properly.