Question regarding subclasses that span multiple classes

Hello community, this is my first question regarding LEfSe.

I am currently analyzing amplicon sequencing data with LEfSe, and I want to compare the relative abundances of taxa between a disease state and a non-disease state, which I defined as the class (in this thread I call them disease1 and non-disease1). Each class is further subdivided into groups, which I call disease2 and non-disease2.

So the input data looks like:

Class: disease1 disease1 disease1 non-disease1 non-disease1 non-disease1
Subclass: disease2 disease2 non-disease2 disease2 disease2 non-disease2

I read the thread below and found that in such a case we should not use disease2 as a subclass, because it is not a subdivision of a single class (the disease2 category spans both disease1 and non-disease1).

https://groups.google.com/forum/#!searchin/lefse-users/subclass|sort:date/lefse-users/wSnLGkaAa9I/He4SpNetBQAJ
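
To make the issue concrete, here is a minimal Python sketch that checks which subclass labels appear under more than one class. It uses the sample layout from the question; the function name `subclasses_spanning_classes` is hypothetical and is not part of LEfSe.

```python
from collections import defaultdict

# Sample labels copied from the layout in the question.
classes = ["disease1", "disease1", "disease1",
           "non-disease1", "non-disease1", "non-disease1"]
subclasses = ["disease2", "disease2", "non-disease2",
              "disease2", "disease2", "non-disease2"]


def subclasses_spanning_classes(classes, subclasses):
    """Map each subclass label seen under more than one class to those classes."""
    seen = defaultdict(set)
    for cls, sub in zip(classes, subclasses):
        seen[sub].add(cls)
    # Keep only the subclass labels that occur under two or more classes.
    return {sub: sorted(cs) for sub, cs in seen.items() if len(cs) > 1}


spanning = subclasses_spanning_classes(classes, subclasses)
print(spanning)
```

With this layout, both disease2 and non-disease2 are reported as spanning disease1 and non-disease1.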

My question is: when “disease2 under disease1” is considered a different condition from “disease2 under non-disease1”, is it still recommended not to use disease2 as a subclass?

I am a bit confused about this particular circumstance, and any help would be appreciated, for example a paper that uses subclasses that are not subdivisions of a single class. Thanks in advance.

Sincerely,

I’m not sure I completely agree with the discussion in the Google Groups post you linked. I believe it can be appropriate to use subclass in your situation, depending on how many samples you have in each subclass. The original paper states several things that I think can be applied to your example.

If you used the input data you indicated, there would be two comparisons you could make.

  1. Are there features that are more abundant in disease1-disease2 samples compared to non-disease1-disease2 samples AND ALSO more abundant in disease1-nondisease2 samples compared to non-disease1-nondisease2 samples?

This essentially controls for subclass by doing pairwise comparisons only between the same subclasses and reporting features that were significant under both comparisons…

  2. Are there features that are more abundant in disease1-disease2 samples compared to both non-disease1-disease2 AND non-disease1-nondisease2 samples AND ALSO more abundant in disease1-nondisease2 samples than in both non-disease1-disease2 AND non-disease1-nondisease2 samples?

I can’t think of a situation where this comparison would be specifically informative, but it would provide stronger evidence for the differences between the two classes.
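
As a rough illustration of the two comparisons above, this Python sketch lists the subclass pairs that each one would test. It only enumerates the group pairs; it does not run any statistical test and is not LEfSe's implementation. The function name `pairwise_comparisons` and the `same_name_only` parameter are hypothetical names chosen for this example.

```python
from itertools import product

# Subclasses available under each class, as in the question's layout.
subclasses_by_class = {
    "disease1": ["disease2", "non-disease2"],
    "non-disease1": ["disease2", "non-disease2"],
}


def pairwise_comparisons(subclasses_by_class, class_a, class_b, same_name_only):
    """List the (class, subclass) pairs that would be compared between two classes.

    same_name_only=True  -> only subclasses with the same name are paired.
    same_name_only=False -> every subclass of class_a is paired with every
                            subclass of class_b.
    """
    pairs = []
    for sub_a, sub_b in product(subclasses_by_class[class_a],
                                subclasses_by_class[class_b]):
        if same_name_only and sub_a != sub_b:
            continue
        pairs.append(((class_a, sub_a), (class_b, sub_b)))
    return pairs


matched = pairwise_comparisons(subclasses_by_class, "disease1", "non-disease1", True)
all_pairs = pairwise_comparisons(subclasses_by_class, "disease1", "non-disease1", False)

for label, pairs in (("same-name only", matched), ("all against all", all_pairs)):
    print(label)
    for left, right in pairs:
        print("  ", "-".join(left), "vs", "-".join(right))
```

Under either comparison, a feature would be reported only if it differs in the same direction in every listed pair, so the all-against-all version has four pairs to satisfy rather than two.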

I say this with the disclaimer that I am not an expert on this tool and my attempt at an explanation comes solely from my experience and interpretation of how the tool can be used, but I would love to hear back from other community members.

Thank you very much for your answer, jmgreenb!
I agree with your reasoning on comparison number one.