Hello community, this is my first question regarding LEfSe.
I am currently analyzing amplicon sequencing data with LEfSe, and I want to compare the relative abundances of taxa between a disease state and a non-disease state, which I defined as the class (in this thread I call them disease1 and non-disease1). Each class is further subdivided into subclasses, which I call disease2 and non-disease2.
So the input data looks like:
Class: disease1 disease1 disease1 non-disease1 non-disease1 non-disease1
Subclass: disease2 disease2 non-disease2 disease2 disease2 non-disease2
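For concreteness, the six samples above can be tallied by (class, subclass) pair. This is only a minimal Python sketch of the layout; the variable names are chosen for illustration and are not part of LEfSe.

```python
from collections import Counter

# Class and subclass labels for the six samples, in column order.
classes = ["disease1", "disease1", "disease1",
           "non-disease1", "non-disease1", "non-disease1"]
subclasses = ["disease2", "disease2", "non-disease2",
              "disease2", "disease2", "non-disease2"]

# Count how many samples fall into each (class, subclass) cell.
cell_counts = Counter(zip(classes, subclasses))

for (cls, sub), n in sorted(cell_counts.items()):
    print(f"{cls}\t{sub}\t{n}")
```

The same subclass name (disease2) appears under both classes, which is the situation I am asking about.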
I read the thread below and found that in such a case we should not use disease2 as a subclass, because it is not a subdivision of a single class (the disease2 category spans both disease1 and non-disease1).
https://groups.google.com/forum/#!searchin/lefse-users/subclass|sort:date/lefse-users/wSnLGkaAa9I/He4SpNetBQAJ
My question is: when “disease2 under disease1” is considered a different condition from “disease2 under non-disease1”, is it still recommended not to use disease2 as a subclass?
I am a bit confused about this particular circumstance, and any help would be appreciated, for example a paper that uses subclasses that are not subdivisions of a single class. Thanks in advance.
Sincerely,
I’m not sure I completely agree with the discussion in the Google Groups post you linked. I believe it can be appropriate to use subclasses in your situation, depending on how many samples you have in each subclass. The original paper states several things that I think can be applied to your example.
If you used the input data you indicated, there would be two comparisons you could make.
- Are there features that are more abundant in disease1-disease2 samples compared to non-disease1-disease2 samples AND ALSO more abundant in disease1-non-disease2 samples compared to non-disease1-non-disease2 samples?
This essentially controls for subclass by doing pairwise comparisons only between the same subclasses and reporting features that were significant under both comparisons…
- Are there features that are more abundant in disease1-disease2 samples compared to both non-disease1-disease2 AND non-disease1-non-disease2 samples AND ALSO more abundant in disease1-non-disease2 samples than in both non-disease1-disease2 AND non-disease1-non-disease2 samples?
I can’t think of a situation where this comparison would be specifically informative, but it would provide more strength to the differences between the two classes.
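To make the two options concrete, here is a minimal Python sketch that only enumerates which subclass pairs each option would test. It does not run LEfSe or any statistics, and the dictionary and function names are my own illustrative assumptions, not part of the tool.

```python
from itertools import product

# Hypothetical layout from the question: both classes share the same subclass names.
subclasses_by_class = {
    "disease1": ["disease2", "non-disease2"],
    "non-disease1": ["disease2", "non-disease2"],
}

def pairwise_comparisons(class_a, class_b, same_name_only):
    """List the cross-class subclass pairs that would be compared.

    same_name_only=True keeps only pairs whose subclass names match
    (the first option above); False keeps every cross-class pair
    (the second option above).
    """
    pairs = product(subclasses_by_class[class_a], subclasses_by_class[class_b])
    if same_name_only:
        pairs = (p for p in pairs if p[0] == p[1])
    return [((class_a, a), (class_b, b)) for a, b in pairs]

matched = pairwise_comparisons("disease1", "non-disease1", same_name_only=True)
all_pairs = pairwise_comparisons("disease1", "non-disease1", same_name_only=False)

for label, comparisons in (("same-name only", matched), ("all pairs", all_pairs)):
    print(label)
    for left, right in comparisons:
        print("  ", "/".join(left), "vs", "/".join(right))
```

Matching on subclass name gives two comparisons, while dropping that restriction gives all four cross-class comparisons. Under either option, as I understand it, a feature would only be reported if it differs in the same direction in every listed pair.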
I say this with the disclaimer that I am not an expert on this tool and my attempt at an explanation comes solely from my experience and interpretation of how the tool can be used, but I would love to hear back from other community members.
Thank you very much for your answer, jmgreenb!
I agree with your reasoning on the first comparison.