Format_input.py parameters: how do they affect the pipeline and the plots?

PietroGx · May 6, 2020, 6:55pm

Hello everyone,

I am trying to analyse some data with LEfSe on anaconda (python).
I successfully executed the scripts with the example data.

Is the cladogram supposed to look like this? If so, how can I change the colors for the biomarkers? If not, I believe it is because no biomarker clade was found; I also could not generate the other picture (step 3, see below).

The code I run is the following:

bin/format_input.py tmp/sample.txt tmp/merged_abundance_table.lefse
bin/run_lefse.py tmp/merged_abundance_table.lefse tmp/merged_abundance_table.lefse.out -l 4
bin/plot_res.py --dpi 300 tmp/merged_abundance_table.lefse.out output_images/lefse_biomarkers.png
bin/plot_res.py --dpi 300 tmp/merged_abundance_table.lefse.out tmp/lefse_biomarkers.png

I am not sure what the format_input.py parameters (-c, -s, -u, -o) do. I could not find any information online, and the code where they are used is not commented, so before trying to reverse-engineer everything I thought I would ask.

Cheers,

Pietro

Kelsey_Thompson · May 14, 2020, 11:15am

Hi Pietro,

Thanks for the question! The cladogram you produced does look correct if there were no significant features to plot on it.

As for the parameters:
-c = row of the data to use as the class
-s = row that contains the subclass information
-u = row that contains the subject information
-o = the normalization value (for LEfSe the default is [1.0] or none)
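
To illustrate what "row to use as the class / subclass / subject" means, here is a minimal Python sketch. It is not LEfSe's own code: the toy table, the function name split_metadata, and its arguments are all hypothetical, and it only assumes a tab-delimited table with samples as columns and 1-indexed row numbers, as in the -c/-s/-u options.

```python
import csv
import io

# Hypothetical toy table (not real data): columns are samples,
# the first three rows hold metadata, the rest are features.
TOY_TABLE = (
    "oxygen\tHigh\tHigh\tLow\tLow\n"
    "body_site\tskin\tnasal\tgut\tgut\n"
    "subject_id\tS1\tS2\tS3\tS4\n"
    "Bacteria|Firmicutes\t10\t20\t70\t60\n"
    "Bacteria|Bacteroidetes\t90\t80\t30\t40\n"
)


def split_metadata(text, class_row=1, subclass_row=None, subject_row=None):
    """Separate metadata rows (1-indexed, like -c/-s/-u) from feature rows."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    picked = {"class": class_row, "subclass": subclass_row, "subject": subject_row}
    # Keep only the metadata rows that were actually requested.
    meta = {name: rows[idx - 1][1:] for name, idx in picked.items() if idx is not None}
    used = {idx - 1 for idx in picked.values() if idx is not None}
    # Every remaining row is treated as a feature with numeric abundances.
    features = {
        row[0]: [float(v) for v in row[1:]]
        for i, row in enumerate(rows)
        if i not in used
    }
    return meta, features


meta, features = split_metadata(TOY_TABLE, class_row=1, subclass_row=2, subject_row=3)
print(meta["class"])
print(sorted(features))
```

In this sketch, class_row=1, subclass_row=2 and subject_row=3 play the role of -c 1 -s 2 -u 3 on the command line.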

I have attached the help page for the LEfSe format step. I hope this helps. Let us know if we can do anything else.

Best,
Kelsey

[Attached image: screenshot of the format_input.py help page]

lixiaopi1985 · June 12, 2020, 5:31am

Hi,

I have just begun using this software. In this post, sma says the normalization is done internally to get relative abundances. Does that mean that if I don't set the -o parameter the program still normalizes automatically, or do I have to set it? An additional question: in the tutorial, -o is set to 1000000, and the Galaxy version explains it a bit ("Per-sample normalization of the sum of the values to 1M"), but I am doing amplicon research and I am not sure what that actually does to normalize the samples.

Thank you!

sma · June 12, 2020, 10:36pm

Hi -

I’m not sure what the default normalization factor is, so it’s probably safe to set your own.

Fortunately, I think that because of how logarithms work, LEfSe results shouldn't be sensitive to the choice of -o. The intuition is that log(a * 1000) - log(b * 1000) = log(a * 1e6) - log(b * 1e6), so normalizing to 1M should give you the same p-values as normalizing to 1000. There will be differences in how the effect sizes are interpreted, though (for example, it might be easier to think of things on the CPM scale, hence the 1M default). I'm not sure what a good choice is for amplicon data either.
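
The intuition can be checked numerically with a small Python sketch. This is not LEfSe code: the counts are made up, and the normalize helper is a hypothetical stand-in for a "per-sample normalization of the sum of the values" step, written only to show that the log difference does not depend on the chosen total.

```python
import math


def normalize(sample, total):
    """Scale one sample's values so that they sum to `total`."""
    sample_sum = sum(sample)
    return [value * total / sample_sum for value in sample]


# Hypothetical raw counts for two samples (three features each).
sample_a = [120.0, 30.0, 50.0]
sample_b = [10.0, 60.0, 30.0]


def log_difference(total, feature=0):
    """log10 difference of one feature between the two normalized samples."""
    a = normalize(sample_a, total)[feature]
    b = normalize(sample_b, total)[feature]
    return math.log10(a) - math.log10(b)


diff_1k = log_difference(1_000)
diff_1m = log_difference(1_000_000)

# The two differences agree up to floating-point rounding.
print(math.isclose(diff_1k, diff_1m))
```

Changing the total from 1000 to 1M multiplies every normalized value by the same constant, which cancels out in the difference of logs.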

The bottom line is, I’d suggest you try a few different -o values that make sense to you, and pick one that’s easier to interpret. I’d also see if they report very different p-values - if they do I’d be worried and please let us know.

Thanks,
Siyuan
