The bioBakery help forum

Problem with LEfSe using the command-line interface

Hi,

I am trying to analyze my dataset with LEfSe in the Ubuntu terminal.

There are no errors when I format my feature table and run the LEfSe analysis on it.

However, after the analysis (specifically, after the ‘run_lefse.py’ step), the LDA differences are compared among ‘subjects’, not ‘classes’.

When I tested with the tutorial data (hmp_small_aerobiosis.txt), the same thing happened.

I installed LEfSe with conda using the following command:
$ conda install -c biobakery lefse

and ran LEfSe with:
$ lefse-format_input.py hmp_small_aerobiosis.txt hmp_small_aerobiosis.in -c 1 -s 2 -u 3 -o 1000000

$ run_lefse.py hmp_small_aerobiosis.in hmp_small_aerobiosis.res

$ lefse-plot_res.py hmp_small_aerobiosis.res hmp_small_aerobiosis.png

Resulting plot: it looks like LEfSe is comparing among ‘subjects’, not ‘classes’.

I have no idea how to solve this problem.

Hello,
Thanks for your question. The problem is in the values you supplied after the options -c, -s, and -u when you ran format_input.py. To see the documentation on these options, run the following:
$ format_input.py -h

Whichever row in your data corresponds to the class variable should be supplied after -c, and the ID variable row should be supplied after -u. If there is no subclass, you do not need to supply the option “-s” at all.
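One quick way to double-check which row numbers to pass is to print the first-column label of the first few rows. The sketch below builds a tiny stand-in file (the name sample_input.txt, the row labels, and the values are made up for illustration and are not the tutorial data) and numbers its rows; the same head/awk line could be pointed at a real tab-separated input file.

```shell
# Build a small tab-separated stand-in for a LEfSe input table.
# (Hypothetical file name and contents, for illustration only.)
printf 'class_label\tHigh_O2\tLow_O2\n'         >  sample_input.txt
printf 'subclass_label\tsite_a\tsite_b\n'       >> sample_input.txt
printf 'subject_id\t158398106\t158742018\n'     >> sample_input.txt
printf 'Bacteria\t0.5\t0.7\n'                   >> sample_input.txt

# Print the row number and first-column label of the first three rows.
# The number next to the class row is the value to compare against -c,
# and the number next to the subject ID row is the one to compare against -u.
row_labels=$(head -n 3 sample_input.txt | awk -F '\t' '{ print NR ": " $1 }')
echo "$row_labels"
```

If the numbers printed next to the class and ID rows match what was passed to -c and -u, the option values themselves are probably not the cause.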

I hope that helps!
-Meg


Thank you for your kind reply!

I think I explained my situation poorly.

The exact problem is that, even though I set the -c and -u options appropriately, LEfSe compared differences among “subjects”, not “classes”.

In the tutorial data that I analyzed as a test (I uploaded the resulting plot previously) (https://github.com/biobakery/biobakery/raw/master/demos/biobakery_demos/data/lefse/input/hmp_small_aerobiosis.txt), the classes are in the first row and the subjects are in the third row. Based on these row positions in the input file, I used the options -c 1 and -u 3 to set “row 1” as “class” and “row 3” as “subject”.
*Classes in the tutorial data are ‘High_O2’, ‘Mid_O2’, and ‘Low_O2’. Subjects are 158398106, 158742018, 158984779 etc…

As I understand it, lefse-plot_res.py is supposed to plot differences among classes (High_O2, Mid_O2, Low_O2).
However, as you can see in the plot I uploaded previously, it plotted differences among subjects (158398106, 158742018, 158984779 etc…).
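To see which labels the analysis itself assigned (independently of the plot), one option is to count the group names in the .res file. This is only a sketch: it assumes the .res file is tab-separated and that its third column holds the group with the highest mean for each discriminative feature, and it uses a made-up stand-in file (sample.res, with invented features and values) rather than real output.

```shell
# Stand-in for a LEfSe .res file (hypothetical contents, for illustration).
# Assumed columns: feature, log of highest group mean, group, LDA score, p-value.
printf 'Bacteria.Firmicutes\t5.2\tLow_O2\t4.1\t0.001\n'      >  sample.res
printf 'Bacteria.Actinobacteria\t5.0\tHigh_O2\t3.9\t0.002\n' >> sample.res
printf 'Bacteria.Proteobacteria\t4.1\t\t\t-\n'               >> sample.res
printf 'Bacteria.Bacteroidetes\t5.6\tLow_O2\t4.4\t0.0005\n'  >> sample.res

# Count how often each label appears in the third column,
# skipping features with no group assigned (empty third column).
group_counts=$(awk -F '\t' '$3 != "" { n[$3]++ } END { for (g in n) print g "\t" n[g] }' sample.res | sort)
echo "$group_counts"
```

If the labels listed for a real .res file were subject IDs rather than High_O2, Mid_O2, and Low_O2, that would suggest the grouping went wrong before the plotting step.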

Is there anything I am missing? Or could it be a problem with my computer settings?

Ah I see, thanks for clarifying. Let me take a look at the example data and get back to you.
Best,
Meg

Hello,
I ran through the tutorial and it works as intended for me (the grouping variable is correct). I wanted to check that you’re using the latest version of LEfSe and of the tutorial. Here is the link to the tutorial I was following:

I ask because the commands you pasted above are slightly different from the ones used in the tutorial (“lefse-format_input.py” vs “format_input.py”, for instance).
Could you try running “conda update lefse”, then follow the tutorial linked above again and let me know if it works?
Thanks,
Meg