The bioBakery help forum

OTUs as a unique code or enter them as a taxonomic hierarchy

I have been using your excellent online LEfSe tool to perform LDA analyses in order to determine the OTUs that significantly discriminate biofilm compositions on different material surfaces. However, I have noticed that I get different results depending on whether I feed in the OTUs as a unique code or enter them as a taxonomic hierarchy, e.g. ‘ASV_0000000238’ vs. ‘Bacteria| Proteobacteria| Alphaproteobacteria| Caulobacterales| Hyphomonadaceae’.
I tend to get far more significantly discriminant results when I put them in as unique OTU codes, and sometimes they don’t match up consistently with the results I get when the hierarchy is used.

I was wondering whether there is something about the tool that intrinsically accounts for and adjusts for the taxonomic hierarchy in some way, and whether you would be able to advise which method is more accurate? This sort of analysis is new to me, so I apologise if I am missing something obvious.

Hi user,

LEfSe is designed to accept hierarchical data. You are right that anything of the form A|B|C will be interpreted as a hierarchy, with every level implicitly tested (e.g. A and A|B along with A|B|C), whereas simple IDs will be tested independently.
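
To make the "every level implicitly tested" point concrete, here is a minimal Python sketch of which feature names a single input name implies. It only illustrates the naming rule described above and is not LEfSe's own code; the function name `implied_levels` is made up for this example.

```python
def implied_levels(feature_name, sep="|"):
    """Return every clade name implied by a hierarchical feature name.

    A name without the separator is treated as one independent feature.
    Illustration only: this is not taken from the LEfSe source.
    """
    parts = [p.strip() for p in feature_name.split(sep)]
    # Build each prefix of the lineage: A, then A|B, then A|B|C.
    return [sep.join(parts[:depth]) for depth in range(1, len(parts) + 1)]


hierarchical = implied_levels("A|B|C")
flat = implied_levels("ASV_0000000238")
print(hierarchical)
print(flat)
```

So a hierarchical name contributes a feature at each rank of its lineage, while a plain ID contributes exactly one.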

The different results are most likely because, with hierarchical input, OTUs are grouped into genus-level (or species-level) data, and the OTUs within one group may have different abundance trends between classes.
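
As a toy illustration of that effect, the sketch below uses invented numbers for two OTUs assigned to the same genus. The class labels ("steel", "plastic"), the OTU names, and the abundances are all hypothetical, and the comparison is a plain difference of class means rather than the statistical tests LEfSe actually runs.

```python
from statistics import mean

# Hypothetical data: samples 0-2 are class "steel", samples 3-5 are "plastic".
classes = ["steel"] * 3 + ["plastic"] * 3
otus = {
    "Bacteria|GenusX|OTU_1": [10, 12, 11, 1, 2, 0],   # higher on steel
    "Bacteria|GenusX|OTU_2": [1, 0, 2, 11, 10, 12],   # higher on plastic
}


def class_means(values, labels):
    """Mean abundance per class label."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    return {label: mean(vals) for label, vals in groups.items()}


# Genus-level abundance per sample = sum of the OTUs in that genus.
genus_total = [sum(column) for column in zip(*otus.values())]

otu_means = {name: class_means(vals, classes) for name, vals in otus.items()}
genus_means = class_means(genus_total, classes)

for name, means in otu_means.items():
    print(name, means)
print("Bacteria|GenusX", genus_means)
```

Each OTU differs clearly between the two classes, but the opposite trends cancel once they are summed, so the genus-level row shows no difference. That is one way the per-ID and hierarchical runs can disagree.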


I would like to know how you converted a single code to a taxonomic hierarchy, for example C to A/B/C.

Usually, this would be done by mapping the ASV names to their taxonomy assignments. It depends on the software you used to process the sequencing data, but generally the taxonomy assignments are output in an A|B|C format (or similar, such as A;B;C). Can you give me a few more details about how you’re processing the data, and I might be able to help?
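
In the meantime, here is a rough Python sketch of that kind of mapping, assuming you already have a taxonomy string per ASV in an A;B;C style. The function name `to_lefse_name` and the `keep_id` option are made up for this example, and whether to append the ASV ID as the last level is a choice you would need to make for your own data.

```python
def to_lefse_name(asv_id, taxonomy, sep=";", keep_id=True):
    """Turn a taxonomy string such as "A; B; C" into "A|B|C".

    Assumption: ranks are separated by `sep`, with optional spaces.
    With keep_id=True the ASV ID is appended as the final level so that
    ASVs sharing a lineage remain separate features.
    """
    ranks = [rank.strip() for rank in taxonomy.split(sep)]
    ranks = [rank for rank in ranks if rank]  # drop empty trailing ranks
    if keep_id:
        ranks.append(asv_id)
    return "|".join(ranks)


lineage = "Bacteria; Proteobacteria; Alphaproteobacteria; Caulobacterales; Hyphomonadaceae"
with_id = to_lefse_name("ASV_0000000238", lineage)
without_id = to_lefse_name("ASV_0000000238", lineage, keep_id=False)
print(with_id)
print(without_id)
```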