The bioBakery help forum

Recommended Test for Large Variability

I notice that the default settings don’t do well at picking up a proportion that goes from a fairly constant value in one group to a highly variable but systematically higher (or lower) value in the other. For example, one particular bacterial species has the following percentages:

Healthy volunteers tongue: 0, 0, 8.02, 0, 0, 0, 0, 0, 0, 0.98
Cancer tongue: 18.7, 10.01, 0.46, 4.78, 51.55, 12.81, 4.39, 0, 27.83, 0.24, 55.39, 10.46, 0, 0, 0.19, 16.68, 0, 0.88

The p-value is 0.02 but the q-value is 0.33. The cancer samples clearly often have a high proportion, but it is quite variable. The help page (?Maaslin2) isn’t particularly useful because it doesn’t specify which values are allowed for analysis_method and similar parameters, although I did find that information on the website. A better way to code it would be:

Maaslin2 <- function(analysis_method = c("LM", "CPLM", "NEGBIN", "ZINB")) {
  # match.arg() returns the first choice ("LM") when the caller does not
  # supply a value, and stops with an error listing the valid choices when
  # the supplied value is not one of them.
  analysis_method <- match.arg(analysis_method)
  # ... rest of the function ...
}

and also document all of the valid options for each parameter in the help within R.

Anyway, all of the other methods seem suited to count data, which MetaPhlAn doesn’t produce. For my example data, should I simply split my samples into Low (less than 1%) and High (more than 10%) groups and use Fisher’s Exact Test instead of MaAsLin?
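
For reference, the Low/High split asked about above can be sketched in base R as follows. This is only an illustration of the question, not a recommended analysis: the variable names are made up for the example, and the thresholds are the ones proposed in the post.

```r
# Relative abundances (%) copied from the post above.
healthy <- c(0, 0, 8.02, 0, 0, 0, 0, 0, 0, 0.98)
cancer  <- c(18.7, 10.01, 0.46, 4.78, 51.55, 12.81, 4.39, 0, 27.83,
             0.24, 55.39, 10.46, 0, 0, 0.19, 16.68, 0, 0.88)

abundance <- c(healthy, cancer)
group <- factor(
  rep(c("Healthy", "Cancer"), times = c(length(healthy), length(cancer))),
  levels = c("Healthy", "Cancer")
)

# Low: below 1%. High: above 10%. Values in between get NA (unclassified).
level <- ifelse(abundance < 1, "Low", ifelse(abundance > 10, "High", NA))
level <- factor(level, levels = c("Low", "High"))

# table() drops NA entries by default, so unclassified samples are excluded.
counts <- table(group, level)
n_dropped <- sum(is.na(level))

result <- fisher.test(counts)

print(counts)
print(n_dropped)
print(result$p.value)
```

Note that some samples fall between the two thresholds and are dropped from the table, and that thresholds chosen after looking at the data will tend to make the resulting p-value optimistic.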

Hi @Dario - you can use the CPLM model, which works with both count and non-count data. The CPLM model is more appropriate when there is a large number of biological zeroes and can be a better choice than the default LM when you have large variability. We are currently working on the tutorial to make it more comprehensive, so that the choice of the most appropriate statistical model for each data type becomes clearer. Thanks for the suggestion!
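
A minimal sketch of what switching to CPLM might look like is below. The wrapper name run_cplm and the metadata column name "group" are hypothetical. Setting transform = "NONE" reflects my understanding that CPLM is not meant to be combined with the default log transform, and normalization = "NONE" is an assumption for data that are already relative abundances. Please check both against the MaAsLin 2 documentation before relying on them.

```r
# Hypothetical helper: run MaAsLin 2 with the CPLM model.
#   features: samples-by-taxa data frame (or path to a file) of abundances
#   metadata: data frame (or path to a file) with one row per sample
#   out_dir:  directory where MaAsLin 2 writes its results
run_cplm <- function(features, metadata, out_dir, group_column = "group") {
  if (!requireNamespace("Maaslin2", quietly = TRUE)) {
    stop("The Maaslin2 package (Bioconductor) is required.")
  }
  Maaslin2::Maaslin2(
    input_data      = features,
    input_metadata  = metadata,
    output          = out_dir,
    analysis_method = "CPLM",
    normalization   = "NONE",       # assumption: input is already proportions
    transform       = "NONE",       # assumption: no transform with CPLM
    fixed_effects   = group_column  # hypothetical column, e.g. Healthy/Cancer
  )
}
```

Calling run_cplm() with the full MetaPhlAn table, rather than a single species, keeps the q-values comparable to those from the default run.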