The bioBakery help forum

Why are the "class" and "LDA effect size" columns empty for some taxa when the alpha value is set to 1.0 for both tests?

Hi @sma
Out of curiosity, I set the alpha value for both the Kruskal-Wallis and Wilcoxon tests to 1. I expected that this would not filter out any taxa and that every column would be populated for every taxon. Is that right?
However, in my case many cells in the 2nd and 3rd columns of the LEfSe result table remain empty, like this:

Enterorhabdus_caecimuris         1.97435819417                          0.811767501647
Gordonibacter_pamelaeae          3.10341285048  test     2.6615895877   0.151062463578
Eubacterium_sp_OM08_24           2.36933955614                          0.9156427106
Firmicutes_bacterium_CAG_534     3.58148983592  control  3.22137310817  0.537326329302
Streptococcus_anginosus_group    1.60994997838                          0.00868114589501
Pseudoflavonifractor_sp_An184    1.24886471058                          0.906421461969
Actinobaculum_sp_oral_taxon_183  1.63890933652                          0.665884752633
Collinsella_stercoris            2.68980647426                          0.435812723238

Can you please tell me why?

One more question: for the LEfSe analysis, should I remove very low-abundance data from the MetaPhlAn output (as possible sequencing artifacts)? Is it necessary?

Hi -

Those two columns, the class with the highest mean and the logarithmic LDA score, are only recorded if the feature passes the LDA score threshold. Setting that threshold to zero gave me fully populated columns.
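
To check which features were left without a class and LDA score, something like the following sketch may help. It assumes the result file is tab-separated with five fields per row (feature, log of the highest class mean, class, LDA score, p-value); the function name `split_lefse_rows` and the two inline sample rows (copied from the table above, with tabs restored by hand) are only for illustration.

```python
import csv
import io


def split_lefse_rows(handle):
    """Split result rows into those with class/LDA filled in and those without.

    Assumes tab-separated rows with five fields:
    feature, log highest class mean, class, LDA score, p-value.
    """
    reported, unreported = [], []
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        # Pad short rows so missing trailing fields read as empty strings.
        row = row + [""] * (5 - len(row))
        feature, log_max_mean, cls, lda, pval = row[:5]
        record = {"feature": feature, "class": cls, "lda": lda, "pval": pval}
        if cls.strip() and lda.strip():
            reported.append(record)
        else:
            unreported.append(record)
    return reported, unreported


# Two rows from the table in the question, with the empty cells kept as empty fields.
sample = (
    "Enterorhabdus_caecimuris\t1.97435819417\t\t\t0.811767501647\n"
    "Gordonibacter_pamelaeae\t3.10341285048\ttest\t2.6615895877\t0.151062463578\n"
)
reported, unreported = split_lefse_rows(io.StringIO(sample))
print([r["feature"] for r in reported])    # ['Gordonibacter_pamelaeae']
print([r["feature"] for r in unreported])  # ['Enterorhabdus_caecimuris']
```

For a real file, pass an open file handle instead of the `io.StringIO` wrapper.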

I don’t think removing low-abundance taxa is important for LEfSe. LEfSe relies on LDA score filtering, among other steps, rather than p-value adjustment to control for multiple comparisons, so including non-meaningful features shouldn’t reduce your power. That’s speaking purely from the method’s design. In general, though, it’s probably always good practice to QC your input.

Best,
Siyuan


One follow-up question. I have 70 samples in total (35 control + 35 test), and LEfSe still reports a result for a taxon that is present in only 3 samples. Should I consider that taxon in my study?

Hi,

For most of my studies, I set an abundance and a prevalence filter. As a starting point, I normally require that a feature is present in at least 10% of the samples. I hope this helps!
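
As a rough illustration of the prevalence part, here is a minimal sketch applied to the 70-sample case from the question. The function name `prevalence_filter`, its parameters, and the toy abundance values are all hypothetical; the 10% cutoff is the starting point mentioned above, and the abundance cutoff of 0 is just a placeholder to tune for your data.

```python
def prevalence_filter(abundance, min_prevalence=0.10, min_abundance=0.0):
    """Keep features present in at least `min_prevalence` of the samples.

    `abundance` maps feature name -> list of per-sample abundances.
    A feature counts as present in a sample when its value exceeds
    `min_abundance` (placeholder default: any value above zero).
    """
    kept = {}
    for feature, values in abundance.items():
        n_present = sum(1 for v in values if v > min_abundance)
        if n_present / len(values) >= min_prevalence:
            kept[feature] = values
    return kept


# Toy data: 70 samples, as in the question.
table = {
    "rare_taxon": [0.5] * 3 + [0.0] * 67,     # present in 3 of 70 samples (~4%)
    "common_taxon": [0.5] * 10 + [0.0] * 60,  # present in 10 of 70 samples (~14%)
}
kept = prevalence_filter(table)
print(sorted(kept))  # ['common_taxon']
```

With 70 samples, a 10% prevalence cutoff corresponds to 7 samples, so a taxon seen in only 3 samples would be dropped before running LEfSe.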

Best,
Kelsey
