Result issue for LEfSe

Hello,

I am running LEfSe on Google Colab with the following command to format the input:

!format_input.py mydata.txt mydata.in -c 1 -s -1 -u 2 -o 1000000

Regarding the data, it includes 2318 features with two classes and a number of subjects. I set the p-value thresholds as follows:

!run_lefse.py mydata.in mydata.res -a 1 -w 1 -l 0 -e 1

but the number of significantly discriminative features is 1699, so it does not let me generate LDA scores for all features. Should I choose different settings to increase the number of significant features?

My main concern is that when I go back to generate plots for different p-value thresholds, in particular 0.00001, the plots and LDA scores are not consistent, which I think is expected given the following command:

!run_lefse.py mydata.in mydata.res -a 1 -w 0.00001 -l 0 -e 1

I also tried the following command:

!run_lefse.py mydata.in mydata.res -a 0.00001 -w 0.00001 -l 0 -e 1

This decreased the number of inconsistent features, but the issue remains.

I would appreciate it if you could let me know how to set the parameters to obtain reliable results.

Thanks for your time,

Amir.

Hi Amir,
Thanks for reaching out, and sorry for not answering sooner!
To clarify, the significance level options in run_lefse (-a for the Kruskal-Wallis test, -w for the pairwise Wilcoxon test) specify the p-value threshold for calling features significant. The higher the threshold, the more features are considered “significant”. So in theory, setting these options to 1 (as in your first example) should return all features as significant, since all p-values are less than or equal to 1.
That being said: a) by choosing a high p-value threshold such as 1, you would be asking LEfSe to call a lot of features significantly differentially abundant even when there is no supporting statistical evidence, so I would recommend a more conservative threshold (such as the conventional 0.05). b) Even with a threshold of 1, it is not guaranteed that all features will be returned. One explanation I could come up with for your loss of features (from 2318 to 1699) is that certain features yielded p-values of exactly 1 (possibly due to loss of numerical precision) and were filtered out as a consequence. You could test this by setting a threshold slightly higher than 1, though again, the results probably wouldn't be very interpretable.
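To make the hypothesis in b) concrete, here is a minimal sketch in plain Python (with made-up p-values; LEfSe's actual internal comparison may differ): if the filter uses a strict inequality, features whose p-value is exactly 1 are dropped even at a threshold of 1, while an inclusive comparison keeps them.

```python
# Hypothetical p-values for a handful of features; a value of exactly 1.0
# can arise when a rank test finds no separation at all between classes.
p_values = {
    "feature_A": 0.003,
    "feature_B": 0.72,
    "feature_C": 1.0,  # no evidence of a difference
    "feature_D": 1.0,
}

alpha = 1.0

# Strict comparison (p < alpha): features with p == 1.0 are dropped
# even when the threshold is set to 1.
strict = [f for f, p in p_values.items() if p < alpha]

# Inclusive comparison (p <= alpha): every feature passes at alpha = 1.
inclusive = [f for f, p in p_values.items() if p <= alpha]

print(strict)     # only feature_A and feature_B survive
print(inclusive)  # all four features survive
```

This is only an illustration of why the count could drop from 2318 to 1699 at a threshold of 1, not a claim about how run_lefse is implemented.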
I’m not sure what you meant by “inconsistent” LDA results in the second half of the question. If you could clarify a bit more (example output would be very helpful here), I’d be glad to take a look!
Thanks,
Siyuan