I am running LEfSe analyses on the Galaxy webserver, and all analyses run correctly, but I have a question about the subclass. In one of the datasets I’m using I have no subclass, but the LEfSe analysis gives me the following information:
Number of significantly discriminative features: 340 ( 340 ) before internal wilcoxon
Number of discriminative features with abs LDA score > 2.0 : 219
I thought the Wilcoxon test would not be performed if I didn’t provide subclasses. Can you explain what happened between the 340 features that were significant before the Wilcoxon test and the final 219?
I am also having some trouble understanding whether the parameter “Set the strategy for multi-class analysis” in step B refers to class or to subclass. As I read in your paper “Metagenomic biomarker discovery and explanation”, this multi-class strategy is applied to the classes, but then the Wilcoxon test is mentioned. I think this misunderstanding is also connected to my next question.
Additionally, I have another matter that I would like to clarify:
I ran LEfSe on a dataset with 2 classes and obtained the LDA score plot (from step C in the webserver), which has one horizontal bar per significantly discriminative feature. I understand that the bar colour corresponds to the class in which the feature was significantly more abundant. I then ran an analysis on a dataset with 4 classes and obtained the LDA score plot (from step C), which again has one horizontal bar per significant feature. Does this mean that the feature is significantly more abundant in the respective class than in all other 3 classes? What does this bar say about the other 3 classes?
Thanks for your question and apologies for the delayed response! I don’t think the reduction in the number of features here comes from the Wilcoxon test: you had the same number of features before and after it, which is what the 340 ( 340 ) indicates, I believe. The reduction comes from keeping only the features with an absolute LDA score above 2.0, which brings your number of discriminative features down to 219. However, as has been noted, the Wilcoxon test is still run by default even in the absence of subclasses (see: Lefse without any subclass- still valid?). You can specify this in the command-line version, as described in that post.
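To make the counting concrete, here is a minimal sketch of the filtering funnel as I understand it from the output above: a class-level significance filter, then the internal Wilcoxon check, then the LDA score cut-off. This is not LEfSe's own code. The `Feature` record, the `funnel` function, and all values in it are hypothetical, made up purely to show how the two reported lines relate to each other.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    kw_p: float        # class-level p-value (made-up value)
    wilcoxon_ok: bool  # did the internal Wilcoxon check pass? (made-up value)
    lda_score: float   # log10 LDA effect size (made-up value)


def funnel(features, kw_alpha=0.05, lda_threshold=2.0):
    """Return the feature counts after each of the three filtering stages."""
    after_kw = [f for f in features if f.kw_p < kw_alpha]
    after_wilcoxon = [f for f in after_kw if f.wilcoxon_ok]
    after_lda = [f for f in after_wilcoxon if abs(f.lda_score) > lda_threshold]
    return len(after_kw), len(after_wilcoxon), len(after_lda)


# Hypothetical toy data: four features with invented statistics.
features = [
    Feature("taxon_a", 0.001, True, 3.4),   # passes every stage
    Feature("taxon_b", 0.020, True, 1.2),   # dropped by the LDA cut-off
    Feature("taxon_c", 0.030, False, 2.8),  # dropped by the Wilcoxon check
    Feature("taxon_d", 0.400, True, 3.0),   # dropped by the first filter
]

n_kw, n_wilcoxon, n_lda = funnel(features)
print(f"Number of significantly discriminative features: {n_wilcoxon} ( {n_kw} ) before internal wilcoxon")
print(f"Number of discriminative features with abs LDA score > 2.0 : {n_lda}")
```

In this toy example the number in parentheses is the count before the Wilcoxon step and the number in front of it is the count after. When the two are equal, as in 340 ( 340 ), the Wilcoxon step removed nothing and any further drop comes from the LDA cut-off.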
I’m not sure I followed your middle question about the subclasses. Were you interested in using a subclass in your analysis?
The LDA finds features that discriminate one class from another. That is, if any class is significantly different from all other classes, the feature will be reported. The plotted score is the highest among all such comparisons. It does not say anything about comparisons among the other classes. Does that make sense?
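As a rough illustration of the rule described above, the sketch below contrasts a "one class differs from all the others" criterion with a stricter "every pair of classes differs" criterion, which is how I read the two multi-class strategies in the paper. It is not taken from LEfSe itself. The function names and the set of significant pairs are hypothetical, and the pairwise test results are assumed to be given rather than computed.

```python
from itertools import combinations


def differs_from_all_others(cls, classes, significant_pairs):
    """True if `cls` is significantly different from every other class."""
    return all(
        frozenset((cls, other)) in significant_pairs
        for other in classes
        if other != cls
    )


def one_against_all(classes, significant_pairs):
    """Less strict: at least one class differs from all the others."""
    return any(differs_from_all_others(c, classes, significant_pairs) for c in classes)


def all_against_all(classes, significant_pairs):
    """More strict: every pairwise comparison must be significant."""
    return all(frozenset(pair) in significant_pairs for pair in combinations(classes, 2))


classes = ["A", "B", "C", "D"]

# Hypothetical outcome for one feature: class A differs from B, C and D,
# while B, C and D cannot be told apart from each other.
significant_pairs = {
    frozenset(("A", "B")),
    frozenset(("A", "C")),
    frozenset(("A", "D")),
}

print(one_against_all(classes, significant_pairs))  # True
print(all_against_all(classes, significant_pairs))  # False
```

Under the less strict rule this feature would be reported with a bar for class A, even though nothing is known about how B, C and D compare with one another.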
I hope this helps!
I too am using the Galaxy webserver without a subclass, but my output shows Number of significantly discriminative features: 15 ( 20 ) before internal wilcoxon, and Number of discriminative features with abs LDA score > 2.0 : 15.
Since I did not have the same number of features before and after, does this mean that the Wilcoxon test was still performed even though I am not using a subclass? Is there any way of getting around this on Galaxy?
I tried setting the Wilcoxon alpha to 0 as well as to 1 to see what it would do. Setting it to 0 found no discriminative features (significantly discriminative features: 0 ( 20 )), while setting it to 1 gave me the same results as the default of 0.05.
This post has a great answer to that question: Number of features before and after wilcoxon subclass test
Let us know if you need more information or if that doesn’t answer your question.