I want to run StrainPhlAn 4 on my metagenomics data to get the strains of “s__Akkermansia_muciniphila”. However, according to the StrainPhlAn 4 tutorial, clades are specified in SGB format, so I cannot pass “s__Akkermansia_muciniphila” as the clade option at the extract_markers step:
extract_markers.py -c s__Akkermansia_muciniphila
or at the strainphlan step:
strainphlan.py -c s__Akkermansia_muciniphila
Instead of “s__Akkermansia_muciniphila”, I have to pass “SGB9228” or “SGB9228”, which I found in the mpa_vJan21_CHOCOPhlAnSGB_202103_species.txt.bz2 file as corresponding to the “s__Akkermansia_muciniphila” species. How am I supposed to use both of them in my extract_markers and strainphlan runs to get all the strains of “s__Akkermansia_muciniphila”?
Hi @raminka
As you can see in the StrainPhlAn tutorial (StrainPhlAn 4 · biobakery/MetaPhlAn Wiki · GitHub), in version 4 the clades (-c) should be specified at the SGB level, as some species might span multiple SGBs. In your case, you should pass -c t__SGB9228 in both the extract_markers and strainphlan scripts.
And how can I find out which SGB corresponds to each clade?
Hi @imontero
You can get them from the MetaPhlAn profile. You will have this file because you have to run MetaPhlAn in order to have the SAM files used as input for sample2markers in StrainPhlAn.
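As a rough sketch of that lookup, the Python below scans the lines of a MetaPhlAn 4 profile and collects the t__SGB entries listed under a given species. It assumes the usual tab-separated profile layout, where the first column is the pipe-delimited lineage and the SGB rows end in t__SGB…; the function name and the example rows (truncated lineage, made-up abundances) are only illustrative and do not come from a real run.

```python
def sgbs_for_species(profile_lines, species):
    """Collect the t__SGB clades found under `species` in a MetaPhlAn 4 profile.

    Assumes each data row starts with a pipe-delimited lineage followed by
    tab-separated columns; header/comment rows start with '#'.
    """
    clades = []
    for line in profile_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blanks
        lineage = line.split("\t", 1)[0].strip().split("|")
        # SGB-level rows end in t__SGB...; species-level rows end in s__...
        if species in lineage and lineage[-1].startswith("t__"):
            if lineage[-1] not in clades:
                clades.append(lineage[-1])
    return clades


# Illustrative rows only: lineage shortened, abundances invented.
example_profile = [
    "#clade_name\tNCBI_tax_id\trelative_abundance\tadditional_species",
    "k__Bacteria|g__Akkermansia|s__Akkermansia_muciniphila\t\t12.5\t",
    "k__Bacteria|g__Akkermansia|s__Akkermansia_muciniphila|t__SGB9228\t\t12.5\t",
]

print(sgbs_for_species(example_profile, "s__Akkermansia_muciniphila"))
```

With a real profile you would pass the open file instead of the list, e.g. `sgbs_for_species(open("sample_profile.txt"), "s__Akkermansia_muciniphila")`, where the file name is a placeholder. Keep in mind that a profile only lists the SGBs that were detected in that sample, and each returned value can be used as-is with -c.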
The MetaPhlAn profile file gives Kingdom, Phylum, … Family, … Species, but it does not give the SGB code for, for example, s__Bifidobacterium_longum.
I could find it by downloading a file from the database that lists all taxa and all SGB codes, but none of the MetaPhlAn output files helped.
Maybe you’re running MetaPhlAn 3? Make sure you use MetaPhlAn 4 with StrainPhlAn 4.