Strainphlan question on get the strain info

molly · March 5, 2021, 1:52am

Hi, I am really new to StrainPhlAn and MetaPhlAn. I have several general questions to help me understand the whole pipeline.

1) I have run MetaPhlAn successfully with the following command: metaphlan path/to/reads/s1.forward.fastq.gz,path/to/reads/s1.reverse_reads.fastq.gz -o s1.tsv --input_type fastq --bowtie2out s1.bowtie2.bz2 -s s1.sam.bz2

But when I read others' posts, I noticed that some are using databases different from the ChocoPhlAn database with --bowtie2db. I did not specify a database here; will MetaPhlAn use the default db?
2) After I get the species info from MetaPhlAn, I want to use StrainPhlAn to get strain info. I followed the StrainPhlAn3 · biobakery/biobakery Wiki · GitHub tutorial. In step 3:
```
extract_markers.py -c s__Eubacterium_rectale -o clade_markers
```

I am wondering whether -c s__Eubacterium_rectale means that we can only get strain info for this one species, s__Eubacterium_rectale. If so, how can I get strain info for all the species in my MetaPhlAn output file?

3) In the same step 3:
```
strainphlan -s consensus_markers/*.pkl -m clade_markers/s__Eubacterium_rectale.fna -r reference_genomes/*.fna -o output -n 8 -c s__Eubacterium_rectale --phylophlan_mode fast --nproc 4
```

It also specifies -c s__Eubacterium_rectale and needs -r reference_genomes/*.fna. Do I need to curate my own reference genomes, or can I use the ChocoPhlAn database?

Thank you so much

CK_zhu · March 6, 2021, 3:39am

I have the same question. I have a new species (uncultured) that is not in the mpa database. How can I extract the markers of my strain?

aitor.blancomiguez · March 8, 2021, 8:19am

Hi @molly
Thanks for getting in contact.
When MetaPhlAn runs without a database being specified, it uses the latest version by default, i.e. mpa_v30_CHOCOPhlAn_201901 (this v30 database is described in our latest preprint: https://www.biorxiv.org/content/10.1101/2020.11.19.388223v1).
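If you prefer to make the database choice explicit instead of relying on the default, a minimal sketch could look like the following. It assumes the `--index` option of MetaPhlAn 3 for naming the database version, and `path/to/metaphlan_db` is only a placeholder for wherever the database files are stored on your system. The wrapper name `profile_sample` is made up for this example.

```shell
# Index name quoted from the reply above.
DB_INDEX="mpa_v30_CHOCOPhlAn_201901"
# Hypothetical database folder; replace with your own location.
DB_DIR="${DB_DIR:-path/to/metaphlan_db}"

# Illustrative wrapper so the database settings live in one place.
# $1 = forward reads, $2 = reverse reads, $3 = sample name
profile_sample() {
  metaphlan "$1,$2" \
    --input_type fastq \
    --bowtie2db "$DB_DIR" \
    --index "$DB_INDEX" \
    --bowtie2out "$3.bowtie2.bz2" \
    -s "$3.sam.bz2" \
    -o "$3.tsv"
}

# Example call (same files as in the question):
#   profile_sample path/to/reads/s1.forward.fastq.gz path/to/reads/s1.reverse_reads.fastq.gz s1
```

Pinning the index this way keeps results comparable if a newer database becomes the default later on.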
For running StrainPhlAn, adding reference genomes is not mandatory; you can run it directly on the PKL files generated by the sample2markers.py script.
For step 3 (extract_markers.py -c s__Eubacterium_rectale -o clade_markers), s__Eubacterium_rectale is just an example; you can run it on any species present in your samples. To know which species StrainPhlAn is able to profile in your samples, you can use the --print_clades_only parameter.
E.g. strainphlan -s consensus_markers/*.pkl -o output --print_clades_only
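To cover several species, the two commands from step 3 can be repeated once per clade. The sketch below is one possible way to do that, not an official workflow. It assumes you have copied the clade names you are interested in into a plain text file, one name per line (the file name `clades.txt`, the helper names `run` and `profile_clades`, and the per-clade `output/<clade>` folders are all choices made for this example). Reference genomes (`-r`) are left out, since they are not mandatory.

```shell
# Run a command, or only print it when DRY_RUN=1 (handy for a first review).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# $1 = hypothetical text file with one clade name per line,
#      e.g. s__Eubacterium_rectale
profile_clades() {
  # The list is read on file descriptor 3 so the tools cannot consume it.
  while IFS= read -r clade <&3 || [ -n "$clade" ]; do
    # Skip blank lines in the list.
    [ -n "$clade" ] || continue
    # Create a per-clade output folder up front (layout assumed here).
    run mkdir -p "output/$clade"
    # Same two commands as in the thread, with the clade name substituted.
    run extract_markers.py -c "$clade" -o clade_markers
    run strainphlan -s consensus_markers/*.pkl \
      -m "clade_markers/$clade.fna" \
      -o "output/$clade" -n 8 -c "$clade"
  done 3< "$1"
}
```

Setting `DRY_RUN=1` before calling `profile_clades clades.txt` prints the commands without running them, which makes it easy to check the clade names first.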

aitor.blancomiguez · March 8, 2021, 8:21am

Hi @CK_zhu
If you have a new species that is not present in the mpa database, neither MetaPhlAn nor StrainPhlAn will be able to profile it.
