Hello, I am using Maaslin2 on microbiome data aggregated at different taxonomic levels. I read through the Maaslin2 documentation, as well as the published paper, and I am still unsure whether hierarchically structured taxonomic data (such as the output of MetaPhlAn, Kraken/Bracken, or QIIME) can (or should) be used as input to Maaslin2. I ran Maaslin2 on a full taxonomic table built from 16S ASV counts and found significant differences across my covariates of interest at multiple taxonomic levels. The results make biological sense, but I want to be sure that this is a use case Maaslin2 was designed for.
Please advise, thanks!
There’s debate in the literature on this. We’ve had a similar thread with some additional discussion on this point here.
Personally I think the main thing to account for is the dependence between taxonomic levels. A test for a species and a test for the genus that contains it will not be independent, because the genus abundance includes that species’ abundance.
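To make that dependence concrete, here is a minimal Python sketch on simulated, hypothetical abundances (not real data). Two species are generated independently of each other, and their genus row is taken as their sum, as it would be in a table that lists every rank. The pearson helper is written out only to keep the example self-contained.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical abundances for two species in the same genus,
# simulated independently of each other across 100 samples.
rng = random.Random(0)
n_samples = 100
species_a = [rng.gammavariate(2.0, 1.0) for _ in range(n_samples)]
species_b = [rng.gammavariate(2.0, 1.0) for _ in range(n_samples)]

# Aggregating to genus level: the genus row is the sum of its species rows.
genus = [a + b for a, b in zip(species_a, species_b)]

r_species = pearson(species_a, species_b)
r_nested = pearson(species_a, genus)
print(f"species A vs species B: r = {r_species:.2f}")
print(f"species A vs its genus:  r = {r_nested:.2f}")
```

With independent species of equal variance, the expected correlation between a species and its genus is about 1/√2 ≈ 0.71, even though the two species are uncorrelated with each other, so tests on the species row and the genus row share much of the same signal.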
Apparently the BH FDR correction still controls the FDR when the tests are positively dependent (formally, under positive regression dependence), but that guarantee doesn’t extend to more general dependence. Could taxonomic levels ever be negatively correlated? Off the top of my head I can’t imagine why that would happen, but I can’t rule it out either.
BY FDR correction (which you can use in Maaslin2 by setting correction = "BY") is apparently robust against any dependence structure. Without having tried it, I assume it’s correspondingly more conservative. If you try it and find interesting differences compared with other methods, let us know.
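For a feel of how much more conservative BY is, here is a minimal Python sketch of both adjustments, written from the standard definitions rather than taken from Maaslin2. BY is BH with every adjusted value multiplied by c(m) = 1 + 1/2 + ... + 1/m, where m is the number of tests. The results should match R’s p.adjust with method = "BH" or "BY". The function name adjust_fdr and the p-values are hypothetical.

```python
def adjust_fdr(pvalues, method="BH"):
    """Return FDR-adjusted p-values (q-values) in the input order.

    method="BH": Benjamini-Hochberg step-up adjustment.
    method="BY": Benjamini-Yekutieli, i.e. BH scaled by
    c(m) = sum_{k=1..m} 1/k, valid under arbitrary dependence.
    """
    m = len(pvalues)
    if method == "BH":
        c_m = 1.0
    elif method == "BY":
        c_m = sum(1.0 / k for k in range(1, m + 1))
    else:
        raise ValueError(f"unknown method: {method}")

    # Walk from the largest p-value to the smallest, keeping a running
    # minimum so adjusted values stay monotone; starting at 1.0 caps them at 1.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for rank_from_top, i in enumerate(order):
        rank = m - rank_from_top  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m * c_m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for eight features in a full taxonomic table.
pvalues = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
bh = adjust_fdr(pvalues, "BH")
by = adjust_fdr(pvalues, "BY")
print("BH q < 0.05:", sum(q < 0.05 for q in bh))  # prints 2
print("BY q < 0.05:", sum(q < 0.05 for q in by))  # prints 1
```

Here c(8) ≈ 2.72, so every BY q-value is about 2.7 times its BH counterpart (up to the cap at 1), and the second feature drops out at q < 0.05.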