I notice that the default settings don’t do well at detecting a proportion that shifts from a fairly constant value in one group to a highly variable but systematically higher (or lower) value in the other. For example, there is a particular species of bacterium with the following percentages:
Healthy volunteers tongue: 0, 0, 8.02, 0, 0, 0, 0, 0, 0, 0.98
Cancer tongue: 18.7, 10.01, 0.46, 4.78, 51.55, 12.81, 4.39, 0, 27.83, 0.24, 55.39, 10.46, 0, 0, 0.19, 16.68, 0, 0.88
The p-value is 0.02 but the q-value is 0.33. Clearly, the cancer samples often have quite a high proportion, but it is rather variable.
The ?Maaslin2 help page isn’t particularly useful because the documentation doesn’t specify which values are allowed for analysis_method and other such parameters. I found the information on the website, though. A better way to code it would be something like:
Maaslin2 <- function(analysis_method = c("LM", "CPLM", "NEGBIN", "ZINB"))
{
  # Ensure it's one of the allowable values, or automatically the first one
  # if the user didn't pass a value in the function call.
  analysis_method <- match.arg(analysis_method)
}
and also document all of the valid options for each parameter in the help within R.
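To illustrate what match.arg does in practice, here is a minimal sketch in base R. The function choose_method is a made-up stand-in for illustration only, not part of MaAsLin2, and "POISSON" is just an arbitrary value that is not in the list.

```r
# choose_method() is a hypothetical stand-in, not a MaAsLin2 function.
choose_method <- function(analysis_method = c("LM", "CPLM", "NEGBIN", "ZINB")) {
  # match.arg() returns the first choice when the argument is left at its
  # default, and stops with an error for a value that is not in the list.
  match.arg(analysis_method)
}

default_choice <- choose_method()          # argument omitted
valid_choice <- choose_method("NEGBIN")    # one of the listed values

# Capture the error message raised for an unlisted value instead of stopping.
invalid_result <- tryCatch(
  choose_method("POISSON"),
  error = function(e) conditionMessage(e)
)

print(default_choice)
print(valid_choice)
print(invalid_result)
```

With this pattern the allowed values also appear in the function signature, so they show up in the Usage section of the help page even before the argument description is written.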
Anyway, all of the other methods seem suited to count data, which MetaPhlAn doesn’t produce. For my example data, should I simply split my samples into Low (less than 1%) and High (more than 10%) groups and use Fisher’s Exact Test instead of MaAsLin?
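For concreteness, the split I have in mind would look roughly like this in base R, using the percentages listed above. This is only a sketch: the 1% and 10% cut-offs are my own arbitrary choice, and the three values that fall between them (8.02, 4.78 and 4.39) belong to neither group, so here they are simply dropped before testing.

```r
# Percentages copied from the example above.
healthy <- c(0, 0, 8.02, 0, 0, 0, 0, 0, 0, 0.98)
cancer <- c(18.7, 10.01, 0.46, 4.78, 51.55, 12.81, 4.39, 0, 27.83, 0.24,
            55.39, 10.46, 0, 0, 0.19, 16.68, 0, 0.88)

abundance <- c(healthy, cancer)
group <- factor(
  rep(c("Healthy", "Cancer"), times = c(length(healthy), length(cancer))),
  levels = c("Healthy", "Cancer")
)

# Bin into Low (< 1%), Mid (1% up to 10%) and High (10% or more).
# The cut-offs are an assumption, not something MaAsLin2 prescribes.
level <- cut(abundance,
             breaks = c(-Inf, 1, 10, Inf),
             labels = c("Low", "Mid", "High"),
             right = FALSE)

# Drop the in-between samples so that only Low and High remain.
keep <- level != "Mid"
tab <- table(group = group[keep], level = droplevels(level[keep]))
print(tab)

# Fisher's exact test on the resulting 2 x 2 table.
res <- fisher.test(tab)
print(res)
```

Note that this tests a single species in isolation, so the resulting p-value would still need multiple-testing correction if the same thing were done for every species.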