The bioBakery help forum

LEfSe analyses in the Galaxy

I am using LEfSe in the Galaxy webserver, and all analyses run correctly, but I have a question about the subclass. In one of the datasets I’m using, I have no subclass, but the LEfSe analysis gives me the following information:

Number of significantly discriminative features: 340 ( 340 ) before internal wilcoxon
Number of discriminative features with abs LDA score > 2.0 : 219

I thought the Wilcoxon test would not be performed if I didn’t provide subclasses. Can you explain to me what happened between the 340 features that were significant before the Wilcoxon test and the 219?

I am also having some trouble understanding whether the parameter “Set the strategy for multi-class analysis” in step B refers to class or to subclass. As I read in your paper “Metagenomic biomarker discovery and explanation”, this multi-class strategy is applied to the classes, but then the Wilcoxon test is mentioned. I think this misunderstanding is also connected to my next question.

Additionally, I have another matter that I would like to clarify:

I ran a LEfSe analysis on a dataset with 2 classes and obtained the LDA score plot (from step C in the webserver), which has one horizontal bar per significantly discriminative feature. I understand that the bar colour corresponds to the class in which the feature was significantly more abundant. I then ran an analysis on a dataset with 4 classes and obtained the LDA score plots (from step C), which again have one horizontal bar per significant feature. Does this mean that the feature is significantly more abundant in the respective class than in all other 3 classes? What does this bar say about the other 3 classes?

Hi there,

Thanks for your question and apologies for the delayed response! I don’t think the reduction in the number of features here comes from the Wilcoxon test: you had the same number of features before and after it. That is what the 340 (340) indicates, I believe. The reduction comes from keeping only the features with an absolute LDA score above 2, which brings your number of discriminative features down to 219. However, as has been noted, the Wilcoxon test is still run by default even in the absence of subclasses (see: Lefse without any subclass- still valid?). You can specify this in the command line version, as described in that post.
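To make the counting concrete, here is a minimal Python sketch of how the three numbers in that message relate to each other. It is not LEfSe's actual code: the `Feature` fields, the `count_stages` helper, the 0.05 alpha and the toy values are all made up for illustration, and the Wilcoxon step is reduced to a simple pass/fail flag.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    kw_p: float            # hypothetical Kruskal-Wallis p-value between classes
    passes_wilcoxon: bool  # hypothetical outcome of the internal Wilcoxon step
    lda_score: float       # hypothetical LDA effect size


def count_stages(features, kw_alpha=0.05, lda_threshold=2.0):
    """Count how many features survive each successive filter."""
    after_kw = [f for f in features if f.kw_p < kw_alpha]
    after_wilcoxon = [f for f in after_kw if f.passes_wilcoxon]
    after_lda = [f for f in after_wilcoxon if abs(f.lda_score) > lda_threshold]
    return len(after_kw), len(after_wilcoxon), len(after_lda)


# Toy data (invented): every feature passes the Wilcoxon step,
# so only the LDA threshold reduces the count.
toy = [
    Feature("f1", 0.001, True, 3.2),
    Feature("f2", 0.010, True, 2.5),
    Feature("f3", 0.030, True, 1.4),   # significant, but abs LDA score <= 2
    Feature("f4", 0.200, True, 4.0),   # fails the class-level test
    Feature("f5", 0.040, True, -2.1),  # negative score, abs value > 2
]

n_kw, n_wilcoxon, n_lda = count_stages(toy)
print(f"Significant features: {n_wilcoxon} ( {n_kw} ) before internal wilcoxon")
print(f"Features with abs LDA score > 2.0 : {n_lda}")
```

In this toy case the first two counts are equal and only the third is smaller, which is the same pattern as your 340 ( 340 ) followed by 219.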

I’m not sure I followed your middle question about the subclasses. Were you interested in using a subclass in your analysis?

The LDA finds features that discriminate one class from another. I.e., if any class is significantly different from all other classes, the feature will be reported. The plotted score is the highest across all such comparisons. It does not say anything about comparisons among the other classes. Does that make sense?
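As a rough illustration of how to read a single bar in the 4-class plot, here is a small Python sketch. It assumes, as my own simplification and not something taken from the LEfSe code, that the bar is coloured by the class with the highest mean abundance for that feature. The `bar_class` helper and the abundance values are hypothetical.

```python
from statistics import mean


def bar_class(abundances_by_class):
    """Return the class with the highest mean abundance, plus all class means."""
    means = {cls: mean(vals) for cls, vals in abundances_by_class.items()}
    return max(means, key=means.get), means


# Invented relative abundances of one feature in four classes.
feature = {
    "A": [0.30, 0.28, 0.35],
    "B": [0.05, 0.07, 0.04],
    "C": [0.06, 0.05, 0.08],
    "D": [0.01, 0.02, 0.01],
}

winner, means = bar_class(feature)
print("Bar colour (assumed):", winner)
# The bar carries no information about how B, C and D compare to each other.
print("Other classes:", sorted(c for c in means if c != winner))
```

Here the bar would be drawn for class A. Whether B, C and D differ from one another is a separate question that the plot does not answer.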

I hope this helps!
Best,
Kelsey