The way microorganisms are named affects the final results of LEfSe

Hey, I encountered an issue: I was working on Galaxy and trying to run LEfSe, but the output files were different when I changed the way the microorganisms are named.
lef_usual.txt is the input file in which the microorganisms are named like “Clostridium_sp_DL_VIII”. The underscore is the only special character in the input file.
lef_usual.txt (423.5 KB)
The result is uploaded here.


Then I cleaned up the names to the most basic format, like “ClostridiumspDLVIII”, and ran it again. This input file, lef_basic.txt, is uploaded here.
lef_basic.txt (423.4 KB)
The new result is uploaded here and it is different from the first one.

All parameters of the two runs are identical. Only the naming in the first column of the input file changed.
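One way to see exactly how the two runs differ is to compare the two LEfSe result (.res) files after normalizing the names, so that “Clostridium_sp_DL_VIII” and “ClostridiumspDLVIII” count as the same feature. The following is a minimal sketch. It assumes the usual tab-separated .res layout (feature, log10 of the highest class mean, class, LDA score, p-value, with class and LDA left empty for non-discriminative features). The function names are hypothetical.

```python
import re


def normalize(name: str) -> str:
    # Keep only letters and digits, so "Clostridium_sp_DL_VIII"
    # and "ClostridiumspDLVIII" compare equal.
    return re.sub(r"[^0-9A-Za-z]", "", name)


def significant_features(res_lines):
    """Return {normalized feature: (class, LDA score)} for discriminative rows."""
    hits = {}
    for line in res_lines:
        cols = line.rstrip("\n").split("\t")
        # Assumed layout: feature, log10 max mean, class, LDA, p-value.
        # Non-discriminative rows leave the class and LDA columns empty.
        if len(cols) >= 4 and cols[2] and cols[3]:
            hits[normalize(cols[0])] = (cols[2], float(cols[3]))
    return hits


def compare(res_a, res_b):
    """List features significant in only one run, or assigned to different classes."""
    a, b = significant_features(res_a), significant_features(res_b)
    only_a = sorted(a.keys() - b.keys())
    only_b = sorted(b.keys() - a.keys())
    class_changed = sorted(k for k in a.keys() & b.keys() if a[k][0] != b[k][0])
    return only_a, only_b, class_changed
```

For example, `compare(open("lef_usual.res").readlines(), open("lef_basic.res").readlines())` would return three empty lists if the two runs flag the same features for the same classes. Differences in LDA scores alone are not reported.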

So why does merely changing the way the microorganisms are named affect the final results? Do we need to convert the names to the most basic form, like “ClostridiumspDLVIII”, to get correct results? If so, that format is hard to read, and it is also difficult to search for “ClostridiumspDLVIII” on the Internet. Can Galaxy solve this problem thoroughly? Can the LEfSe analysis program be improved?
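If stripping characters turns out to be necessary, one possible workaround is to keep the readable names outside the analysis. You replace each name with a neutral ID before running LEfSe, then put the original names back into the .res file before plotting. The following is a minimal sketch. It assumes a tab-separated table whose first two rows are metadata (matching -u 1 -c 2) and a flat list of feature names like the ones in this thread. The helper names and the F00001-style IDs are hypothetical.

```python
def encode_names(lines, n_meta_rows=2):
    """Replace feature names with neutral IDs (F00001, ...) and return the mapping."""
    encoded, mapping = [], {}
    for i, line in enumerate(lines):
        # Leave metadata rows (e.g. subject and class) and blank lines untouched.
        if i < n_meta_rows or not line.strip():
            encoded.append(line)
            continue
        name, sep, rest = line.partition("\t")
        fid = f"F{len(mapping) + 1:05d}"
        mapping[fid] = name
        encoded.append(fid + sep + rest)
    return encoded, mapping


def decode_res(res_lines, mapping):
    """Put the original names back into the first column of a .res file."""
    out = []
    for line in res_lines:
        fid, sep, rest = line.partition("\t")
        out.append(mapping.get(fid, fid) + sep + rest)
    return out
```

The IDs contain only letters and digits, so they cannot trigger name-dependent behaviour. The mapping keeps names like “Clostridium_sp_DL_VIII” searchable in the final table.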

Hello,
Thank you for bringing up this issue. I apologize for the delay, but I am looking into it now.
Best,
Meg

This issue has been resolved in the PyPI version of LEfSe but still needs to be resolved on Galaxy. In the meantime, I recommend using the PyPI version, downloadable here, with the following commands:

lefse_format_input.py Downloads/lef_usual.txt Downloads/lef_usual.in -c 2 -s -1 -u 1 -o 1000000
lefse_run.py Downloads/lef_usual.in Downloads/lef_usual.res
lefse_plot_res.py Downloads/lef_usual.res Downloads/lef_usual.png

When I run the above and then re-run it with lef_basic.txt, I get the same results. They are similar to the results you saw for the lef_basic run on Galaxy, though not identical because of a version difference between the PyPI and Galaxy versions. I hope that helps. We are working to resolve the problem on the Galaxy version.
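To repeat this check on both inputs in one go, the three commands above can be run from a small Python script. This is only a sketch. It assumes the LEfSe scripts are on PATH and reuses the flags from the reply: -c 2 reads class labels from row 2, -s -1 means no subclass row, -u 1 reads subject IDs from row 1, and -o 1000000 sets the normalization value. The helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def lefse_commands(txt_path):
    """Build the three LEfSe commands (format, run, plot) for one input table."""
    txt = Path(txt_path)
    # Downloads/lef_usual.txt -> Downloads/lef_usual.in / .res / .png
    inp, res, png = (str(txt.with_suffix(s)) for s in (".in", ".res", ".png"))
    return [
        ["lefse_format_input.py", str(txt), inp,
         "-c", "2", "-s", "-1", "-u", "1", "-o", "1000000"],
        ["lefse_run.py", inp, res],
        ["lefse_plot_res.py", res, png],
    ]


def run_all(paths):
    """Run the full pipeline for each input, stopping at the first failing step."""
    for p in paths:
        for cmd in lefse_commands(p):
            subprocess.run(cmd, check=True)
```

Calling `run_all(["Downloads/lef_usual.txt", "Downloads/lef_basic.txt"])` produces a .res file per input, which can then be compared directly.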

Dear Meg

Thanks. When running the third command, “lefse_plot_res.py lefse/collapse.frequency.table.with.meta.res lefse/collapse.frequency.table.with.meta.png”, I got the following error:

Traceback (most recent call last):
  File "/opt/anaconda3/envs/thelefse/bin/lefse_plot_res.py", line 10, in <module>
    sys.exit(plot_res())
  File "/opt/anaconda3/envs/thelefse/lib/python3.9/site-packages/lefse/lefse_plot_res.py", line 177, in plot_res
    else: plot_histo_hor(params['output_file'],params,data,len(data['cls']) == 2,params['report_features'])
  File "/opt/anaconda3/envs/thelefse/lib/python3.9/site-packages/lefse/lefse_plot_res.py", line 104, in plot_histo_hor
    if len(rr) > params['max_feature_len']: rr = rr[:params['max_feature_len']/2-2]+" […]"+rr[-params['max_feature_len']/2+2:]
TypeError: slice indices must be integers or None or have an __index__ method

I can't figure out the cause. Could you please take a look?

Best
Rashid
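The traceback itself points to a likely cause. The script runs under Python 3.9, where `/` always returns a float, and a float cannot be used as a slice index. Replacing `/` with `//` in the expression on line 104 of lefse_plot_res.py would likely restore the integer division the code seems to expect from Python 2. This has not been checked against the upstream source. The sketch below reproduces that truncation with floor division. `shorten_label` is a hypothetical stand-in for the inline expression, not a LEfSe function, and the " [...]" marker is just an illustrative string.

```python
def shorten_label(name: str, max_len: int) -> str:
    """Truncate a long feature name to its head and tail, as line 104 intends.

    The original uses max_len/2, which is a float in Python 3 and makes the
    slice raise TypeError. Floor division (//) keeps the indices integers and
    matches what the same expression did under Python 2.
    """
    if len(name) > max_len:
        name = name[: max_len // 2 - 2] + " [...]" + name[-max_len // 2 + 2 :]
    return name
```

For example, a 100-character name with max_len=60 keeps its first 28 and last 28 characters around the marker. Names at or below max_len are returned unchanged.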

Hi Rashid,
I haven’t come across this particular error before. Could you send me a dataset that reproduces the error so I can try to troubleshoot it?
Thanks,
Meg