I’m using LEfSe to analyze a dataset comparing two groups, and the Kingdom, Bacteria, was returned as significant. I think this is an error because the only Kingdom in the dataset is “Bacteria.” The feature plot (below) for that hit shows 8/10 samples with an abundance of 1, but when I add up the values for the columns in the table manually I get 1 for all 10 samples (as expected for a relative abundance plot). I do have one taxon that couldn’t be assigned past Bacteria (the rest are Bacteria|Phylum|…|family/genus), but the abundance of that one assignment is not 1 in any of my samples (though it is 0 in two control samples), so I’m not sure if that assignment is the issue.
Can you help me understand why this hit is coming back as significant and why it’s not counting the abundance appropriately? And if it is the lone Bacteria assignment could you explain why it’s not being included in the overall Kingdom calculation of LEfSe?
I’d guess your data is from 16S sequencing? The issue here is that a feature named “Bacteria” confuses LEfSe: it can’t tell whether the value is the abundance of an OTU classified only as far as that taxon or the total abundance of Bacteria. Relabeling the feature to “Bacteria|OTU1” solved the issue on my end.
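In case it helps, here is a minimal sketch of how that relabeling could be scripted before the table goes into LEfSe. It is only an illustration: the function name `label_unclassified` and the `OTU` suffix are my own placeholders, and it assumes the feature names are “|”-delimited lineages held in a plain Python list.

```python
def label_unclassified(features, sep="|", prefix="OTU"):
    """Append a placeholder leaf (e.g. "|OTU1") to every feature whose name
    is also the ancestor of another feature in the table."""
    names = set(features)
    relabeled = []
    counter = 0
    for name in features:
        # A feature is ambiguous if some other feature's lineage starts with it.
        is_ancestor = any(
            other.startswith(name + sep) for other in names if other != name
        )
        if is_ancestor:
            counter += 1
            relabeled.append(f"{name}{sep}{prefix}{counter}")
        else:
            relabeled.append(name)
    return relabeled


features = [
    "Bacteria",
    "Bacteria|Firmicutes|Bacilli",
    "Bacteria|Bacteroidetes|Bacteroidia",
]
print(label_unclassified(features))
# ['Bacteria|OTU1', 'Bacteria|Firmicutes|Bacilli', 'Bacteria|Bacteroidetes|Bacteroidia']
```

Only the bare “Bacteria” row is renamed; fully classified lineages are left untouched.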
Will LEfSe have a problem and count all three of these as Mollicutes_RF39 when it runs on Order? Should I be adding something to everything that terminates? I collapsed the table to the genus level before running LEfSe; would I need to put |OTU# after each genus as well?
I believe so. However, if I understood you correctly, the table collapsed at genus level should only have two types of features: a) genera and b) OTUs not classifiable to genus level and hence assigned to higher taxa. In this case, you can
a) assign OTU #s to those unclassified at genus, so that LEfSe doesn’t get confused with, say, family level abundances, or
b) simply replace taxonomy separators with something LEfSe won’t recognize. I believe “;” instead of “|” does the trick. In this way LEfSe will only test for these features, and not, say, family level features calculated from genus level abundances.
Hope this is helpful,
Siyuan
a) would be Bacteria|Bacteroidetes|Bacteroidia|Bacteroidales|Muribaculaceae|otu1
b) would be to leave it with the “;”? Bacteria|Bacteroidetes|Bacteroidia|Bacteroidales|Muribaculaceae;__
I’ve had problems with this where LEfSe thinks Muribaculaceae;__ is a different family from the other two Muribaculaceae and won’t include it in the family calculation for Muribaculaceae.
Could you please verify that I am correct with (a) and expand on (b)?
Thanks,
Samantha
b) I was suggesting something more like
Bacteria|Bacteroidetes|Bacteroidia|Bacteroidales|Muribaculaceae ->
Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Muribaculaceae
Because LEfSe doesn’t recognize “;” as taxonomy delimiter, it will think “Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Muribaculaceae” is its own feature, separate from something like, say, “Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Muribaculaceae;some_genus”.
So each feature will be tested on its own. LEfSe won’t know to aggregate abundances for higher taxonomy ranks though. For example, you would only get a result for “Bacteria” if you have an OTU that is unclassified below Bacteria, and that result would be for that OTU alone. You won’t get testing results for the kingdom Bacteria as a whole.
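To make the difference concrete, here is a rough sketch of option (b). It is not LEfSe’s own code; `flatten_names` and `implied_clades` are hypothetical helpers that just show which higher-rank names a tool splitting on “|” could derive from the feature names, and that none remain once the separator is swapped for “;”.

```python
def flatten_names(features, old_sep="|", new_sep=";"):
    """Replace the taxonomy separator so each name reads as one flat label."""
    return [name.replace(old_sep, new_sep) for name in features]


def implied_clades(features, sep="|"):
    """Collect every ancestor name that can be derived by splitting on `sep`."""
    clades = set()
    for name in features:
        parts = name.split(sep)
        # Every proper prefix of the lineage is a higher-rank clade.
        for depth in range(1, len(parts)):
            clades.add(sep.join(parts[:depth]))
    return clades


features = [
    "Bacteria|Bacteroidetes|Bacteroidia|Bacteroidales|Muribaculaceae",
    "Bacteria|Bacteroidetes|Bacteroidia|Bacteroidales|Muribaculaceae|some_genus",
]

# With "|" the names imply five higher-rank clades, from Bacteria down to Muribaculaceae.
print(sorted(implied_clades(features)))

# After switching to ";" nothing can be derived, so only the listed features remain.
flat = flatten_names(features)
print(flat)
print(implied_clades(flat))
```

The trade-off is the one described above: the flattened table avoids the ambiguity, but no family, order, or kingdom totals can be built from it.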