Problem with LEfSe

sagunmaharjann · February 18, 2020, 3:55pm

I hope you are doing well. I am reaching out about what I think is a serious problem with LEfSe. We have just realized that modifying the names of features (e.g. genes or species) changes the results of the analysis. For example, in one dataset I am currently working on, replacing “sp._oral_taxon” with “HOT” to shorten the names results in 21 differentially abundant species instead of 22, and the LDA scores also change. When I replaced the species names with just numbers, I got 16 differentially abundant taxa instead. I have put a manuscript submission on hold until this is resolved.

I am facing the same problem with both the Galaxy and command line versions. Please find attached 3 versions of the input file I have used recently:

1- With full species names

2- With abbreviated species names

3- Species names replaced with numbers

I obtain 22, 21, and 16 differentially abundant taxa with these files, respectively, at LDA score of >= 2.5.

JIA_species_level_names_changed.txt (163.6 KB) JIA_species_level_names_replaced_by_numbers.txt (159.4 KB) JIA_species_level.txt (164.3 KB)

fbeghini · February 18, 2020, 4:11pm

Hi @sagunmaharjann,
I’ll have a look at this as soon as possible. This seems a very serious issue.

fbeghini · February 21, 2020, 9:17am

Hi @sagunmaharjann,
It seems that some of the species names have dots and pipes inside the name.
LEfSe interprets the pipe character as a separator between taxonomic levels. In the results of run_lefse.py you can see that Neisseria flavescens|subflava has been split in two, which adds a new feature to the testing (line 276 of format_input.py for reference).
This does not happen when the species names have been replaced with numbers.

The same hierarchy-building behavior can also happen when dots are present in the name (e.g. Fusobacterium_sp._HOT_204); see line 196 of format_input.py for reference.
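
To check an input table for such names before running LEfSe, something along these lines could help. This is only a minimal sketch and not part of LEfSe: it assumes the feature name is the first tab-separated field of each row, and the function name and sample rows are made up for illustration.

```python
def find_suspect_names(lines, chars="|."):
    """Return (name, found_chars) for feature names containing any of `chars`.

    Assumption: the feature name is the first tab-separated field of a row.
    """
    suspects = []
    for line in lines:
        name = line.rstrip("\n").split("\t", 1)[0]
        found = [c for c in chars if c in name]
        if found:
            suspects.append((name, found))
    return suspects


# Hypothetical rows for illustration; the abundance values are invented.
rows = [
    "Neisseria_flavescens|subflava\t0.1\t0.2",
    "Fusobacterium_sp._HOT_204\t0.0\t0.3",
    "Streptococcus_mitis\t0.5\t0.4",
]

for name, found in find_suspect_names(rows):
    print(name, found)
```

For a real file, the rows can be read with `open(path)` and passed in directly, since the function accepts any iterable of lines.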

DrNezar · February 21, 2020, 12:32pm

Dear Francesco,
Thank you for taking the time to look into this, but I think the problem is independent of the dots and pipes. The difference between the two files with names is that “oral_taxon” was changed to “HOT”, so the dots and pipes remain the same. Also, in my experience LEfSe does not split by dots; it just replaces them with underscores. BTW, I ran all three files through MaAsLin and the results were identical.

Let me share with you a simpler example. Attached are two input files for genus-level data. One has names with no symbols at all (not even underscores). In the other file, the names are replaced by numbers. The numbers of differentially abundant features identified are 4 and 7 for the two files, respectively.

I believe the software has a bug.

Best

P.S.: I am not able to attach the files (it says new users cannot upload files), so I have sent them to Sagun.

JIA_genus_level_names changed.txt (50.8 KB) JIA_genus_level_names_replaced_by-numbers.txt (50.2 KB)

DrNezar · February 25, 2020, 5:35pm

Hello again,
It seems I found out why I had different results with the two species-level input files with names (JIA_species_level.txt and JIA_species_level_names_changed.txt). One species is listed twice in the original file as follows:
Actinomyces_sp._Oral_Taxon_180
Actinomyces_sp._oral_Taxon_180

The only difference is the “O” (capital vs. lowercase), so the software treats them as two different species. However, with the renaming done in the other file (oral_taxon replaced by HOT), both names become identical and the software then merges them!

This, however, only explains the differences between the results obtained with these two files; it does not explain why we get totally different results when species names are replaced by numbers. I hope you will continue working on it.
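
In case it helps others, near-duplicate names like the two Actinomyces entries can be caught before running the analysis with a check like the one below. It is only a sketch under my own assumptions: names are compared after lowercasing, and the function name is made up. A different `normalize` function could apply a renaming rule instead.

```python
from collections import defaultdict


def find_collisions(names, normalize=str.lower):
    """Group names that become identical after `normalize` is applied.

    Returns a dict mapping each normalized name to the original names
    that share it, keeping only groups with more than one member.
    """
    groups = defaultdict(list)
    for name in names:
        groups[normalize(name)].append(name)
    return {key: vals for key, vals in groups.items() if len(vals) > 1}


names = [
    "Actinomyces_sp._Oral_Taxon_180",
    "Actinomyces_sp._oral_Taxon_180",
    "Fusobacterium_sp._HOT_204",
]

for key, originals in find_collisions(names).items():
    print(key, originals)
```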

Thank you

sma · April 10, 2020, 7:57pm

Hi -
If this is still an issue, could you share the commands you used on the two provided input files? Running the following commands does seem to generate identical results, albeit ordered differently.

format_input.py JIA_genus_level_names\ changed.txt JIA_name.in -u 2
run_lefse.py JIA_name.in JIA_name.out
format_input.py JIA_genus_level_names_replaced_by-numbers.txt JIA_number.in -u 2
run_lefse.py JIA_number.in JIA_number.out
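
Because the two outputs differ in row order and feature labels, one rough way to compare them is to ignore both and look only at the class and LDA score of the significant rows. The sketch below assumes each output row is tab-separated with the class in the third field and the LDA score in the fourth, both left empty for non-significant features. Please check that layout against your own files. The function name and sample rows are invented.

```python
def significant_rows(lines):
    """Return sorted (class, LDA score) pairs for rows that carry an LDA score.

    Assumption: tab-separated rows with class in field 3 and LDA score in
    field 4, both empty for features that were not significant.
    """
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4 and fields[2] and fields[3]:
            rows.append((fields[2], round(float(fields[3]), 6)))
    return sorted(rows)


# Made-up rows for illustration only, not real LEfSe results.
by_name = ["GenusA\t3.1\tcase\t2.9\t0.01", "GenusB\t2.0\t\t\t0.4"]
by_number = ["2\t2.0\t\t\t0.4", "1\t3.1\tcase\t2.9\t0.01"]

print(significant_rows(by_name) == significant_rows(by_number))
```

With real files, the lines would come from `open("JIA_name.out")` and `open("JIA_number.out")`. Matching score lists do not prove that the same features were selected, so this is only a first sanity check.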
