Input file size limit to Galaxy server for LEfSe analysis

I am working on a dataset with 202 samples and 10,032 predicted KO-IDs, which I got as output from the q2-PICRUSt2 plugin. I am trying to feed the .txt version of this file to the LEfSe module on Galaxy to understand the enrichment of predicted KO-IDs in the 3 different groups of my project. I keep getting the following error when I upload the data to the Galaxy server:

I have tried the “one against all” option instead of “all against all” in the LDA Effect Size step. I have also tried less stringent p-values for the Kruskal-Wallis and Wilcoxon tests, both together and separately. (NOTE: there are no subclasses in my dataset, just 3 different groups which I want to treat as classes. I still reduced the stringency on the Wilcoxon test in the hope that I might get some result.)

In a follow-up attempt to work out what might be happening, I used a smaller dataset (202 samples, but only 36 predicted KO-IDs). I had extracted these specific KO-IDs of interest for another analysis in the same project. When I uploaded this much smaller dataset to run LEfSe on Galaxy, I got results without any problem.
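
For anyone wanting to repeat this kind of reduced test, here is a minimal sketch of pulling a subset of KO-IDs out of a tab-delimited table. It is only an illustration under assumptions: features are in rows, the KO-ID is in the first column, and the first `n_header_rows` rows hold metadata such as the class labels. The function name `subset_features` and the example values are made up for the sketch.

```python
import csv
import io


def subset_features(table_text, keep_ids, n_header_rows=1):
    """Return the table with only the feature rows whose ID is in keep_ids.

    Assumed layout (not confirmed by the thread): tab-delimited text,
    features in rows, feature ID in the first column, and the first
    n_header_rows rows are metadata that must always be kept.
    """
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for position, row in enumerate(reader):
        if not row:
            continue  # skip blank lines
        if position < n_header_rows or row[0] in keep_ids:
            writer.writerow(row)
    return out.getvalue()


# Tiny made-up example: keep one KO-ID out of three.
example = (
    "class\tgroup1\tgroup2\n"
    "K00001\t1\t2\n"
    "K00002\t3\t4\n"
    "K00003\t5\t6\n"
)
subset = subset_features(example, {"K00002"})
```

Growing the list of kept KO-IDs step by step is one way to see at what point the message starts to appear.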

This leads me to wonder whether there is a limit on the size of the dataset that can be uploaded to the Galaxy server to run LEfSe. Is this the case? If so, what can be done to upload large datasets for enrichment analysis with LEfSe? Can someone help me with this issue? Any leads, suggestions, or ideas would be extremely helpful.

Thank you all so much,

Hello, I am still looking forward to some advice/guidance with this question. Any leads would be greatly appreciated.


Hi Aakarsha,
So sorry for the delay in response. It looks like your LEfSe ran correctly on the larger dataset, as shown by the output of the number of significant features, but it’s giving you a warning about one of your variables (KO-IDs). If you remove KO-ID #2105 (I assume this is either the name or the index of the variable) and re-run, do you have the same warning message?

Hi Meg,
Thank you for your response. My thought went along the same lines as you described in your message.
I believe the number shown in the screenshot (2105) is the index number, since KO-IDs start with a ‘K’. With this in mind, I deleted all the indices that came up, re-ran the LEfSe analysis, and got a similar error message. This time the indices were a little different, since I had deleted the KO-IDs that corresponded to the indices in the initial error message.
On that note, the number of indices that come up in the error message decreases or changes as I adjust the p/alpha values for the Wilcoxon and Kruskal-Wallis tests in the LEfSe analysis. Any thoughts on this?
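
One thing to watch when deleting by index: removing a row shifts the position of every row after it, so all reported indices should be resolved against the same original table before anything is deleted. A minimal sketch of that lookup follows. It rests on assumptions not confirmed in this thread: features are in rows, the first `n_header_rows` rows are metadata, and the reported index counts feature rows only. Whether the index is 0-based or 1-based is also unknown, so it is exposed as the `one_based` switch. The helper names are hypothetical.

```python
def _row_position(index, n_header_rows, one_based):
    """Convert a reported feature index into a position in the full table."""
    return n_header_rows + (index - 1 if one_based else index)


def feature_at(rows, index, n_header_rows=1, one_based=False):
    """Return the feature ID (first column) that a reported index points to."""
    return rows[_row_position(index, n_header_rows, one_based)][0]


def drop_features(rows, indices, n_header_rows=1, one_based=False):
    """Drop all reported indices in one pass, resolved against the original rows."""
    to_drop = {_row_position(i, n_header_rows, one_based) for i in indices}
    return [row for pos, row in enumerate(rows) if pos not in to_drop]


# Tiny made-up table: one metadata row followed by three features.
rows = [
    ["class", "group1", "group2"],
    ["K00001", "1", "2"],
    ["K00002", "3", "4"],
    ["K00003", "5", "6"],
]
name = feature_at(rows, 1)            # feature at 0-based index 1
remaining = drop_features(rows, [0, 2])
```

Checking the KO-ID with `feature_at` before deleting makes it possible to confirm that the reported number really points at the intended feature.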