Bacteria is returned as significant when it's the only Kingdom present

Hello!

I’m using LEfSe to analyze a dataset comparing two groups, and the Kingdom “Bacteria” was returned as significant. I think this is an error because the only Kingdom in the dataset is “Bacteria.” The feature plot (below) for that hit shows 8/10 samples with an abundance of 1, but when I add up the values in each column of the table manually I get 1 for all 10 samples (as expected for a relative abundance plot). I do have one taxon that couldn’t be assigned past Bacteria (the rest are Bacteria|Phylum|…|family/genus), but the abundance of that one assignment is not 1 in any of my samples (though it is 0 in two control samples), so I’m not sure whether that assignment is the issue.

Can you help me understand why this hit is coming back as significant and why it’s not counting the abundance appropriately? And if the cause is the lone Bacteria assignment, could you explain why it’s not being included in LEfSe’s overall Kingdom calculation?

Thanks,
Samantha

Here is my relative abundance table: rel_freq_col_LF100_SI_min2_L6_lefse.txt (12.4 KB)

These are the commands I ran to get to this figure (I’m using the conda install of LEfSe):

format_input.py rel_freq_col_LF100_SI_min2_L6_lefse.txt formatted_LF100-SI_min2_L6.in -c 1 -o 1000000

run_lefse.py formatted_LF100-SI_min2_L6.in lefse_out_LF100-SI_min2_L6.res -w 1

plot_res.py --dpi 300 --format png lefse_out_LF100-SI_min2_L6.res lda_out_LF100-SI_min2_L6.png --width 10

plot_features.py --dpi 300 --format png -f diff formatted_LF100-SI_min2_L6.in lefse_out_LF100-SI_min2_L6.res sig_features_SI-LF100/

Hi -

I’d guess your data is from 16S sequencing? The issue here is that the feature named “Bacteria” confuses LEfSe: it can’t tell whether that row is an OTU classified only as far as that taxon or the total abundance of Bacteria. Relabeling the feature to “Bacteria|OTU1” solved the issue on my end.
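
To spot rows like this ahead of time, one option is to look for feature names that are also a prefix of some other feature in the table. Below is a minimal Python sketch of that check. The function name `find_ambiguous` and the example feature list are made up for illustration, and this is not LEfSe's own code; it only assumes that "|" separates taxonomic levels.

```python
def find_ambiguous(features, sep="|"):
    """Return feature names that are also a proper ancestor of another feature."""
    names = set(features)
    ancestors = set()
    for name in names:
        parts = name.split(sep)
        # Collect every proper prefix, e.g. "A|B|C" -> "A", "A|B".
        for depth in range(1, len(parts)):
            ancestors.add(sep.join(parts[:depth]))
    # A name that is both a row and an ancestor of another row is ambiguous.
    return sorted(names & ancestors)


# Hypothetical feature names, for illustration only.
features = [
    "Bacteria",
    "Bacteria|Firmicutes|Clostridia",
    "Bacteria|Bacteroidetes",
]
print(find_ambiguous(features))
```

Any name this returns is a candidate for relabeling (for example by appending an OTU label) before running format_input.py.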

Let me know if this makes sense!
Siyuan

Yes, the data is 16S. That does make sense, thank you for the help!

~Samantha

Hello, I have a follow-up question on this. Will something further down the taxonomic chain cause the same problem? For example:

Bacteria|Tenericutes|Mollicutes|Mollicutes_RF39|uncultured_bacterium|uncultured_bacterium
Bacteria|Tenericutes|Mollicutes|Mollicutes_RF39|unidentified|unidentified
Bacteria|Tenericutes|Mollicutes|Mollicutes_RF39

Will LEfSe have a problem counting all three of these as Mollicutes_RF39 when it runs on Order? Should I be adding something to everything that terminates? I collapsed the table to the genus level before running it in LEfSe; would I need to put |OTU# after each genus as well?

Thanks,
Samantha

Hi -

I believe so. However, if I understood you correctly, the table collapsed at genus level should only have two types of features: a) genera and b) OTUs not classifiable to genus level and hence assigned to higher taxa. In this case, you can
a) assign OTU #s to those unclassified at genus level, so that LEfSe doesn’t confuse them with, say, family level abundances, or
b) simply replace the taxonomy separators with something LEfSe won’t recognize. I believe “;” instead of “|” does the trick. This way LEfSe will only test these features, and not, say, family level features calculated from genus level abundances.
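
Option a) could be scripted roughly as follows. This is only a sketch under stated assumptions: `tag_unclassified` is a hypothetical helper, `genus_depth=6` assumes six levels from Kingdom to genus, and the "OTU<n>" suffix is an arbitrary label, not something LEfSe requires.

```python
def tag_unclassified(features, genus_depth=6, sep="|"):
    """Append a unique OTU label to features that stop above genus level."""
    relabeled = []
    counter = 0
    for name in features:
        # Fewer levels than genus_depth means the name ends at a higher rank.
        if len(name.split(sep)) < genus_depth:
            counter += 1
            name = f"{name}{sep}OTU{counter}"
        relabeled.append(name)
    return relabeled


features = [
    "Bacteria",
    "Bacteria|Tenericutes|Mollicutes|Mollicutes_RF39",
    "Bacteria|Tenericutes|Mollicutes|Mollicutes_RF39|unidentified|unidentified",
]
for name in tag_unclassified(features):
    print(name)
```

Here the first two names get "|OTU1" and "|OTU2" appended, and the six-level name is left as it is.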
Hope this is helpful,
Siyuan

I think I understand a, but I’m not sure about b.
From my example table above:

Bacteria|Bacteroidetes|Bacteroidia|Bacteroidales|Muribaculaceae;__

a) would be Bacteria|Bacteroidetes|Bacteroidia|Bacteroidales|Muribaculaceae|otu1
b) would be to leave it with the “;”? Bacteria|Bacteroidetes|Bacteroidia|Bacteroidales|Muribaculaceae;__

I’ve had problems with this where LEfSe thinks Muribaculaceae;__ is a different family from the other two Muribaculaceae and won’t include it in the family calculation for Muribaculaceae.

Could you please verify I am correct with a and expand on b?
Thanks,
Samantha

Hi -

a) Yep, you got it exactly right.

b) I was suggesting something more like
Bacteria|Bacteroidetes|Bacteroidia|Bacteroidales|Muribaculaceae ->
Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Muribaculaceae

Because LEfSe doesn’t recognize “;” as a taxonomy delimiter, it will think “Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Muribaculaceae” is its own feature, separate from something like, say, “Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Muribaculaceae;some_genus”.

So each feature will be tested on its own. LEfSe won’t know to aggregate abundances for higher taxonomic ranks though. For example, you would only get results for “Bacteria” if you have an OTU that is unclassified below Bacteria. You won’t get testing results for the kingdom Bacteria as a whole.
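
To make the effect of the separator concrete, here is a small Python sketch that lists every name obtained by cutting a feature at each "|". It is a toy model of splitting on the delimiter, not LEfSe's actual implementation, and `implied_clades` is a hypothetical helper name.

```python
def implied_clades(features, sep="|"):
    """All names obtained by truncating each feature at every `sep`."""
    clades = set()
    for name in features:
        parts = name.split(sep)
        for depth in range(1, len(parts) + 1):
            clades.add(sep.join(parts[:depth]))
    return clades


piped = [
    "Bacteria|Bacteroidetes|Bacteroidia|Bacteroidales|Muribaculaceae",
    "Bacteria|Bacteroidetes|Bacteroidia|Bacteroidales|Muribaculaceae|some_genus",
]
# Same names with ";" in place of "|".
flat = [name.replace("|", ";") for name in piped]

print(len(implied_clades(piped)))  # 6: one name per rank from Bacteria down to some_genus
print(len(implied_clades(flat)))   # 2: only the two rows themselves
```

With "|" the two rows imply six nested names, the higher ranks being the ones that would get aggregated; with ";" only the two rows remain, which matches the "each feature on its own" behavior described above.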

Thanks,
Siyuan