Some features missed by MaAsLin2 in untargeted metabolomics, and other LM/CPLM issues

Hi there, thank you for creating this awesome tool.

I have untargeted metabolomics data that I am trying to work through.
As is typical for this kind of dataset, it has a lot of zeros and roughly follows a Tweedie distribution (the values are continuous, not counts). My goal at this step is to filter out contaminants and internal standards by comparing my actual samples to method blanks (negative controls). So I am running the following code:

library(Maaslin2)
Maaslin2(input_data = df.eval.pos.SDA,
         input_metadata = metabo.eval.metadata.pos.2,
         fixed_effects = "type",
         random_effects = "block",
         analysis_method = "CPLM",
         output = "metabo.pos.SDA_output",
         max_significance = 0.05)

where “type” is a categorical variable with two levels: “sample” and “blanks”.

I have tried both LM and CPLM, and both give me some trouble.

LM works well for the most part, but it misses some obvious features that other methods identify:


As you can see, “sample” has a lot of this feature while “blanks” has none. MaAsLin2 passes over it, while other methods (correctly) identify it as a real feature. I have been comparing against the results I get from a two-part model, SDA, developed by Li et al. (2018).
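
The "present in samples, absent in blanks" pattern can also be checked numerically before any model is fitted. Below is a minimal sketch in base R on made-up data: the object names (`abund`, `type`) and the feature names (`feat_real`, `feat_contam`) are hypothetical and do not come from the dataset above, and this is only a descriptive sanity check, not what MaAsLin2 or SDA do internally.

```r
# Toy data (hypothetical): rows are runs, columns are features.
set.seed(1)
type <- factor(rep(c("blanks", "sample"), times = c(4, 8)))
abund <- data.frame(
  feat_real   = c(rep(0, 4), rexp(8, rate = 1 / 500)),  # never seen in blanks
  feat_contam = rexp(12, rate = 1 / 500)                # similar in both groups
)

# Fraction of non-zero values per feature within each group
# (rows: levels of `type`, columns: features).
prevalence <- sapply(abund, function(x) tapply(x > 0, type, mean))

# Mean intensity per feature within each group.
mean_intensity <- sapply(abund, function(x) tapply(x, type, mean))

# Features detected in at least one sample but in no blank.
absent_in_blanks <- colnames(prevalence)[
  prevalence["blanks", ] == 0 & prevalence["sample", ] > 0
]

print(prevalence)
print(mean_intensity)
print(absent_in_blanks)
```

Features listed in `absent_in_blanks` are the ones where a model should, intuitively, report a difference between the two levels of “type”, so they make a useful shortlist to compare against the model output.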

Running CPLM, on the other hand, results in every feature having the same p-value of 1.

If you have any advice on how to approach this, I would appreciate it.