The bioBakery help forum

Redundant taxa in LEfSe output

Hi, I’m new to LEfSe. After spending a week getting my data to work with LEfSe in Galaxy, I’ve got some results, but there are a couple of issues I’d like to have clarified here.

I generated the LEfSe input file from a QIIME2 16S feature table, which I collapsed to level 7 and converted to relative frequencies. I uploaded this file to Galaxy:

feature-table.csv (103.1 KB)

I wanted to compare two classes: response (R) and non-response (NR) to treatment. There was no subclass. The process went smoothly and I got a nice-looking LEfSe plot, as shown here:

However, I am unclear about these two issues.

  1. There are some groups of taxa from the same lineage showing up together. For example:






When I check the LEfSe result table, each member of the same group has exactly the same LDA score. Is it normal to get a result like this? I’m not sure, but I guess it’s because each of them is a singleton: there is no other member in these taxa, so every level of the lineage shows up with the same LDA score. However, having all of them in the plot makes the information a bit redundant.

  2. The k__Bacteria taxon is enriched in the R group with a high LDA score. That’s odd, because all features in both classes are supposed to be bacteria. Is it because there’s also an Archaea feature in the data? If so, why doesn’t LEfSe also report it the other way around (e.g. Archaea enriched in the NR group)?

Thank you

Hi there,
Thanks for your questions. To your first point, your interpretation that the cases you mention are singletons is correct, though I wouldn’t necessarily call the information redundant. I think it is useful to know that there was only one species identified in, say, f__RF16, which was significantly enriched in the R group. While I agree that it crowds the bar plot a bit, I don’t know of an easy way to get rid of the genus- or species-level features from the plot without affecting the results. If you wanted to condense the data to the genus level and re-run, that may help, though you’d be losing species-level information. Also, if it helps, it may be easier to see the usefulness of the singleton information from the cladogram.
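If you do want to try the genus-level re-run, here is a minimal Python sketch of the collapsing step. It is only an illustration, not part of LEfSe or QIIME2: the function name `collapse_to_rank`, the "|" separator, and the genus and species names in the example are all assumptions, so adjust them to match your own table (QIIME2 exports often use ";" instead).

```python
def collapse_to_rank(rows, depth=6, sep="|"):
    """Sum abundances of features that share the first `depth` taxonomic levels.

    rows  : list of (lineage, values) pairs, one value per sample
    depth : number of levels to keep (6 = genus for a k|p|c|o|f|g|s lineage)
    sep   : level separator; "|" is an assumption, use ";" if your table does
    """
    totals = {}
    for lineage, values in rows:
        # Truncate the lineage string to the requested depth.
        key = sep.join(lineage.split(sep)[:depth])
        if key not in totals:
            totals[key] = [0.0] * len(values)
        # Add this feature's per-sample abundances to the running total.
        totals[key] = [a + b for a, b in zip(totals[key], values)]
    return totals


# Hypothetical example: two species under one genus, two samples.
# The genus and species names are placeholders, not taken from the real table.
prefix = "k__Bacteria|p__P|c__C|o__O|f__RF16|g__G"
rows = [
    (prefix + "|s__a", [0.25, 0.125]),
    (prefix + "|s__b", [0.5, 0.125]),
]
for lineage, values in collapse_to_rank(rows).items():
    print(lineage, values)
```

The collapsed table would then need to be reformatted and re-run through LEfSe, and the LDA scores may change because the set of features has changed.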
To your second question, from your table it seems there are two kingdoms represented (Bacteria and Archaea) as well as an “unclassified” kingdom group. Since there are more than two assignments at the kingdom level, it’s possible for Bacteria to be over-represented in R without Archaea, say, being over-represented in NR. If there were only two categories instead, where one was over-represented in R and (by necessity) the other over-represented in NR, I believe both categories would show up in the bar plot.
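To make the three-category point concrete, here is a small Python sketch with made-up numbers. The abundances are hypothetical, not values from the attached table, and the code only shows the arithmetic of relative abundances, not the statistical tests LEfSe actually runs.

```python
# Hypothetical mean kingdom-level relative abundances per class.
# Each class sums to 1 because the table holds relative frequencies.
r_group = {"k__Bacteria": 0.98, "k__Archaea": 0.01, "Unclassified": 0.01}
nr_group = {"k__Bacteria": 0.90, "k__Archaea": 0.01, "Unclassified": 0.09}


def mean_difference(a, b):
    """Return a - b for every category present in `a`, rounded for display."""
    return {name: round(a[name] - b[name], 6) for name in a}


diff = mean_difference(r_group, nr_group)
for name, delta in diff.items():
    print(name, delta)
```

In this toy case Bacteria is higher in R while Archaea is identical in both classes, because the gap is taken up by the unclassified group. With only two categories, a surplus of one in R would force a surplus of the other in NR.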