How can I get lefse output for a single taxonomy level (e.g. species) only?

Hi community!!! @sma @franzosa @sagunmaharjann
I am using MetaPhlAn output data as input for lefse in Galaxy. But, I am seeing that a single output graph shows various taxa level within it. This is my output. But I want only a single taxa level (e.g. only species) to be shown in a single plot. How can I get that? Am I doing anything wrong?

Thanks,
dc

Hi dc -

LEfSe automatically goes up the taxonomy to test for features at all levels. Given your input is MetaPhlAn (so should have features at all taxonomy levels to begin with?), here is what Iā€™d suggest:

  1. First, filter your MetaPhlAn table so that it only contains species level features.
  2. ā€œTrickā€ LEfSe to not test features at all taxonomy levels. You can achieve this by manipulating the species names. You can either remove all the other taxa levels preceding the species name, e.g, ā€œk__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Corynebacteriaceae|g__Corynebacterium|s__Corynebacterium_matruchotiiā€ -> ā€œs__Corynebacterium_matruchotiiā€. Alternatively, replace the taxon separator ā€œ|ā€ with something LEfSe wonā€™t recognize, such as ā€œ;ā€, e.g., -> ā€œk__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Corynebacteriaceae;g__Corynebacterium;s__Corynebacterium_matruchotiiā€. The latter might be simpler to do, but gives you less pretty feature names (the full name will appear instead of just s__Corynebacterium_matruchotii).

Let me know if youā€™d need help on any of this.
Best,
Siyuan

1 Like

Hi @sma I have already tested the first option. It did not work. I have seen that the first step (ā€œformat data for lefseā€) was successfully completed. But, the second step (ā€œLDA Effect sizeā€) was failed.
Thanks,
dc

Hi dc-
You can also run something like:
for MetaPhlAn3 output on the command line

$  grep -E "s__|clade" merged_abundance_table.txt | sed 's/^.*s__//g' | cut -f1,3-8 | sed -e 's/clade_name/body_site/g' > merged_abundance_table_species.txt

or
For MetaPhlAn2 output

$ grep -E "(s__)|(^ID)" merged_abundance_table.txt | grep -v "t__" | sed 's/^.*s__//g' > merged_abundance_table_species.txt

To produce feature tables that only have the species level results. Both commands are from the MetaPhlAn tutorials: https://github.com/biobakery/biobakery/wiki/metaphlan2; https://github.com/biobakery/biobakery/wiki/metaphlan3 - this will give you more information on what exactly those commands are doing.

1 Like

Hi @Kelsey_Thompson !!! This is the command I have used to generate species level merged abundance file:
grep -E ā€œ(s__)|(^clade_name)ā€ merged_abundance_table.txt | grep -v ā€œt__ā€ | sed ā€˜s/^.*s__//gā€™ > merge_abundance_species.txt

here is a reproducible part from my data:
|type|highbp|control|control|
|---|---|---|---|
|clade_name|ERR185652_profile|ERR298068_profile|ERR321205_profile|
|Actinobaculum_sp_oral_taxon_183|0|0|0|
|Actinomyces_graevenitzii|0|0|0|
|Actinomyces_naeslundii|0|0|0|
|Actinomyces_odontolyticus|0|0|0.00341|
|Actinomyces_oris|0|0|0|
|Actinomyces_sp_HMSC035G02|0|0|0|
|Actinomyces_sp_HPA0247|0|0|0|
|Actinomyces_sp_ICM47|0|0|0.0042|
|Actinomyces_sp_S6_Spd3|0|0|0|
|Actinomyces_sp_oral_taxon_181|0|0|0|
|Actinomyces_sp_oral_taxon_414|0|0|0|
|Actinomyces_turicensis|0|0|0|
|Varibaculum_cambriense|0|0|0|
|Aeriscardovia_aeriphila|0.00454|0|0|
|Alloscardovia_omnicolens|0|0|0|
|Bifidobacterium_adolescentis|0.26989|0.21046|0|
|Bifidobacterium_animalis|0|0.03512|0|
|Bifidobacterium_bifidum|0|0.07668|0|
|Bifidobacterium_breve|0|0|0|
|Bifidobacterium_catenulatum|0|0.06641|0|
|Bifidobacterium_dentium|0|0|0|
|Bifidobacterium_longum|2.46071|1.68152|0|
|Bifidobacterium_pseudocatenulatum|0|0|0|
|Bifidobacterium_pullorum|0|0|0|
|Gardnerella_vaginalis|0|0|0|

Thanks,
dc

Hi @sma I also have replaced ā€œ|ā€ with ā€œ;ā€. But, it didnā€™t help. Even the first step is failed in this case. Please tell me what to do now?

Thanks,
dc

Hi -

Would you mind sharing your input table with us? Itā€™s a bit difficult to see what the issue might be without the file. You can reach me at siyuanma@g.harvard.edu to share the file.

Best,
Siyuan

1 Like

Hi @sma I have sent the data to your mail. Please have a look at it. My mail Id is deepchanda771992@gmail.com

Thanks,
dc