Hi there!!!
I used the same MetaPhlAn output as input for LEfSe in two formats. In the first case, I extracted all the OTUs down to species level, like:
type
clade_name
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae|g__Actinomyces|s__Actinomyces_odontolyticus
In the second case, I used only the entries down to genus level from the same MetaPhlAn profile output:
type
clade_name
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae|g__Actinomyces
However, the two inputs give different results: some genera were reported in the second case but not in the first, and vice versa. Why am I getting these differences in the output? Can anyone please help?
Could you check whether the result statistics (log LDA score, KW p-values) are the same between the two? If not, I'd be worried: it might mean a miscalculation occurs when LEfSe collapses species-level abundances into genus. If they are the same, then there may be filtering settings we could tweak to make the plots comparable.
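In case it helps with that check, here is a minimal sketch for comparing the statistics of two LEfSe runs feature by feature. It assumes each result file is tab-separated with five columns (feature name, log of the highest class mean, class, LDA score, p-value) and that non-significant features have an empty LDA field and "-" as the p-value; please verify this against your own files. The function names and the sample rows are made up for illustration.

```python
import csv
import io


def to_float(text):
    """Return text as a float, or None for blank/placeholder fields."""
    try:
        return float(text)
    except ValueError:
        return None


def read_lefse_res(handle):
    """Parse a LEfSe result table into {feature: (lda_score, p_value)}.

    Assumed column order: feature, log max mean, class, LDA score, p-value.
    """
    results = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 5:
            continue  # skip blank or malformed lines
        feature, _, _, lda, pval = row[:5]
        results[feature] = (to_float(lda), to_float(pval))
    return results


def same_stats(stats_a, stats_b, tol):
    """True if every statistic matches within tol (None only matches None)."""
    for a, b in zip(stats_a, stats_b):
        if a is None or b is None:
            if a is not b:
                return False
        elif abs(a - b) > tol:
            return False
    return True


def compare_runs(run_a, run_b, tol=1e-6):
    """Return (shared features whose stats differ, only in A, only in B)."""
    shared = sorted(set(run_a) & set(run_b))
    mismatches = [f for f in shared if not same_stats(run_a[f], run_b[f], tol)]
    only_a = sorted(set(run_a) - set(run_b))
    only_b = sorted(set(run_b) - set(run_a))
    return mismatches, only_a, only_b


if __name__ == "__main__":
    # Illustrative rows only, not real LEfSe output.
    species_run = io.StringIO("g__Actinomyces\t4.1\tcase\t3.2\t0.01\n")
    genus_run = io.StringIO("g__Actinomyces\t4.1\tcase\t3.5\t0.02\n")
    print(compare_runs(read_lefse_res(species_run), read_lefse_res(genus_run)))
```

With real data you would pass open file handles for the two result files instead of the in-memory strings. Feature names have to match exactly between the two runs, so the species-level run may need its names trimmed to genus level before comparing.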
I ran into a similar problem when using the web version of LEfSe. I got different results with two formats of taxonomy information: one with only the species level, and the other in 'genus|species' format. The former yielded more biomarkers than the latter, and biomarkers that appeared in both runs had different LDA scores and KW p-values.
Have you figured out the reason? And which format gives more accurate results?