I am trying to better understand the parameters for the LEfSe analysis. First, the LDA score threshold: I see 2.0 used a lot. Is there any significance to that value? Second, for the normalization value, I see 1,000,000 used a lot. Is there a recommended value based on our data? Also, is there a rationale for why this normalization is needed when relative abundance is the input?
Thanks for your questions. The default LDA score threshold of 2 is what the LEfSe paper used in testing/demonstrating LEfSe, which is likely why it is widely used. However, the user can adjust it. For instance, if many features are differentially abundant with an LDA score > 2, a more stringent threshold can be useful. I don't think there is any significance to the value of 2.0 per se, only that it was a sufficiently large difference in abundance to be potentially biologically meaningful.
Likewise, my understanding is that the option to normalize per-sample read counts to 1M is meant to improve the calculation of LDA scores for features with low read counts; the exact number is not meaningful.
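To illustrate why the scale can matter even when the input is already relative abundance, here is a rough Python sketch. It is not LEfSe's code. It assumes the reported score behaves like a signed log10(1 + |difference|) of group means, which is my understanding of how the LDA score is scaled; the function names and the abundance values are made up for the example.

```python
import math


def normalize_to_total(values, total=1_000_000):
    """Scale one sample's feature values so that they sum to `total`."""
    s = sum(values)
    if s == 0:
        raise ValueError("sample has no counts")
    return [v * total / s for v in values]


def log_scaled_difference(mean_a, mean_b):
    """Signed log10(1 + |difference|).

    Assumed stand-in for a log-scaled effect size; not LEfSe's actual LDA step.
    """
    diff = mean_a - mean_b
    return math.copysign(math.log10(1.0 + abs(diff)), diff)


# Hypothetical mean relative abundances of one feature in two groups (fractions of 1).
rel_a, rel_b = 0.02, 0.005

# On the 0-1 scale the log-scaled difference is squeezed close to zero.
small_scale = log_scaled_difference(rel_a, rel_b)

# After rescaling to a total of 1M, the same difference lands on a readable scale.
large_scale = log_scaled_difference(rel_a * 1_000_000, rel_b * 1_000_000)

print(f"0-1 scale: {small_scale:.4f}")
print(f"1M scale:  {large_scale:.4f}")
```

Under that assumption, scores computed on the 0-1 scale all sit near zero and a cutoff of 2 could never be reached, whereas on the 1M scale a cutoff of 2 corresponds to a difference on the order of 100 per million. Any sufficiently large total would have a similar effect, which fits with the exact number not being meaningful.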
I hope that helps!