Input file size limit to Galaxy server for LEfSe analysis

aakarsharao · August 4, 2022, 4:59pm

Hello,
I am working on a dataset with 202 samples and 10,032 predicted KO-IDs that I got as output from the q2-PICRUSt2 plugin. I am trying to feed the .txt version of this file to the LEfSe module on Galaxy to understand the enrichment of predicted KO-IDs in the 3 different groups of my project. I keep getting the following error when I upload the data to the Galaxy server:

I have tried the “one against all” option instead of “all against all” in the LDA Effect Size step. I have also tried less stringent p-values for the Kruskal-Wallis and Wilcoxon tests, both together and separately (NOTE: there are no subclasses in my dataset, just 3 different groups which I want to treat as classes; I still reduced the stringency on Wilcoxon in the hope that I might get some result).

In a follow-up attempt to decipher what might be happening, I used a smaller dataset (202 samples, but only 36 predicted KO-IDs). I had extracted these specific KO-IDs of interest for another analysis pertaining to the same project. When I uploaded this “significantly” smaller dataset to run LEfSe on Galaxy, I got the results smoothly.

This leads me to speculate whether there is a limit to the dataset size that we can upload to the Galaxy server in order to run LEfSe. Is this the case? If so, what can be done to smoothly upload large datasets for enrichment analysis using LEfSe? Can someone help me with this issue? Any leads/suggestions/ideas would be extremely helpful.

Thank you all so much,
Aakarsha

aakarsharao · August 18, 2022, 3:28pm

Hello, I am still looking forward to some advice/guidance with this question. Any leads would be greatly appreciated.

Thanks,
Aakarsha

mishort · August 19, 2022, 2:59pm

Hi Aakarsha,
So sorry for the delay in response. It looks like your LEfSe ran correctly on the larger dataset, as shown by the output of the number of significant features, but it’s giving you a warning about one of your variables (KO-IDs). If you remove KO-ID #2105 (I assume this is either the name or the index of the variable) and re-run, do you have the same warning message?
Thanks,
Meg

aakarsharao · August 19, 2022, 5:24pm

Hi Meg,
Thank you for your response. My thought went along the same lines as you described in your message.
I believe the number shown in the screenshot (2105) is the index number, since KO-IDs start with a ‘K’. With this in mind, I went ahead and deleted all the KO-IDs at the indices that came up, re-ran the LEfSe analysis, and landed with a similar error message. This time the indices were a little different, since I had deleted the KO-IDs that correspond to the indices in the initial error message.
On that note, the number of indices that come up in the error message decreases or changes as I adjust the p/alpha values for the Wilcoxon and Kruskal-Wallis tests in the LEfSe analysis. Any thoughts on this?
Best,
Aakarsha
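
For anyone repeating the "delete the flagged indices and re-run" step described above, here is a minimal Python sketch of one way to do it on a tab-delimited table. It is an illustration under several assumptions, not LEfSe's or Galaxy's own tooling: the function name `drop_feature_rows` and its parameters are hypothetical, the table is assumed to have features as rows with one class row on top, and the index in the warning is assumed to be a 0-based position among the feature rows. The thread does not confirm how LEfSe numbers these indices, so check `n_header_rows` and `one_based` against your own file before relying on the result.

```python
import csv
import io


def drop_feature_rows(table_text, feature_indices, n_header_rows=1, one_based=False):
    """Return the tab-delimited table with the given feature rows removed.

    feature_indices are positions among the feature rows only (metadata rows
    such as the class row are skipped via n_header_rows). Whether the indices
    reported in the warning are 0- or 1-based is an assumption to verify.
    """
    rows = list(csv.reader(io.StringIO(table_text), delimiter="\t"))
    offset = 1 if one_based else 0
    # Translate feature positions into absolute row positions in the file.
    to_drop = {n_header_rows + i - offset for i in feature_indices}
    kept = [row for pos, row in enumerate(rows) if pos not in to_drop]
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(kept)
    return out.getvalue()


# Toy table (made-up values): one class row followed by three KO features.
example = (
    "group\tA\tA\tB\n"
    "K00001\t1\t2\t3\n"
    "K00002\t0\t0\t0\n"
    "K00003\t4\t5\t6\n"
)
cleaned = drop_feature_rows(example, [1])  # drops the second feature row
print(cleaned)
```

Because removing rows shifts the positions of everything below them, the indices reported on the next run will refer to the new table, which is consistent with the indices changing after each round of deletions.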
