StrainPhlAn question on getting strain info

Hi, I am new to StrainPhlAn and MetaPhlAn, and I have several general questions to help me understand the whole pipeline.

  1. I have run MetaPhlAn successfully with the following command: metaphlan path/to/reads/s1.forward.fastq.gz,path/to/reads/s1.reverse_reads.fastq.gz -o s1.tsv --input_type fastq --bowtie2out s1.bowtie2.bz2 -s s1.sam.bz2

When I read others' posts, I noticed that some use --bowtie2db to specify a database different from the ChocoPhlAn database. I did not specify a database here; will MetaPhlAn use the default one?
2. After getting the species information from MetaPhlAn, I want to use StrainPhlAn to get strain information. I followed the "StrainPhlAn3 · biobakery/biobakery Wiki · GitHub" tutorial. In step 3:
```
extract_markers.py -c s__Eubacterium_rectale -o clade_markers
```

I am wondering whether -c s__Eubacterium_rectale means that we only get strain information for this one species, s__Eubacterium_rectale. If so, how can I get strain information for all the species in my MetaPhlAn output?

3. In the same step 3:
```
strainphlan -s consensus_markers/*.pkl -m clade_markers/s__Eubacterium_rectale.fna -r reference_genomes/*.fna -o output -n 8 -c s__Eubacterium_rectale --phylophlan_mode fast --nproc 4
```

This command also specifies -c s__Eubacterium_rectale, and it needs -r reference_genomes/*.fna. Do I need to curate my own reference genomes, or can I use the ChocoPhlAn database?

Thank you so much


I have the same question. I have a new species (uncultured)
that is not in the mpa database. How can I extract the markers of my strain?

Hi @molly
Thanks for getting in contact.
When MetaPhlAn runs without a database being specified, it uses the latest version by default, i.e. mpa_v30_CHOCOPhlAn_201901 (this v30 database is described in our latest preprint: https://www.biorxiv.org/content/10.1101/2020.11.19.388223v1).
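As a rough sketch (not part of the original reply), the database can also be pinned explicitly instead of relying on the default. This assumes MetaPhlAn 3's --bowtie2db (database folder) and --index (database version) options; the folder path in DB_DIR is a hypothetical placeholder, and the helper only prints the command so it can be reviewed before running it.

```shell
# Hypothetical database folder; replace with wherever your databases are installed.
DB_DIR="${DB_DIR:-/path/to/metaphlan_databases}"
# Database version named in the reply above.
DB_INDEX="mpa_v30_CHOCOPhlAn_201901"

# Print (do not run) a MetaPhlAn command with the database pinned explicitly.
# $1 = forward reads, $2 = reverse reads, $3 = sample name
metaphlan_cmd() {
  echo metaphlan "$1,$2" \
    --input_type fastq \
    --bowtie2db "$DB_DIR" \
    --index "$DB_INDEX" \
    --bowtie2out "$3.bowtie2.bz2" \
    -s "$3.sam.bz2" \
    -o "$3.tsv"
}

metaphlan_cmd path/to/reads/s1.forward.fastq.gz path/to/reads/s1.reverse_reads.fastq.gz s1
```

Pinning the version this way should help keep profiles comparable across samples if a newer default database is released later.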
Reference genomes are not mandatory for running StrainPhlAn; you can run it directly on the PKL files generated by the sample2markers.py script.
For step 3 (extract_markers.py -c s__Eubacterium_rectale -o clade_markers), s__Eubacterium_rectale is just an example; you can run it on any species present in your samples. To find out which species StrainPhlAn is able to profile in your samples, you can use the --print_clades_only parameter.
E.g. strainphlan -s consensus_markers/*.pkl -o output --print_clades_only
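To profile every clade rather than a single example species, the two per-clade commands can be wrapped in a loop. The sketch below is an illustration, not the maintainers' method: it assumes a hypothetical file (here clades.txt) with one clade name per line, written by hand from the --print_clades_only output, and it omits -r because reference genomes are optional. By default it only prints the commands; set DRY_RUN=0 to execute them.

```shell
# Print a command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    # Detach stdin so the command cannot consume the clade list being read.
    "$@" < /dev/null
  fi
}

# $1 = hypothetical text file with one clade name per line,
#      e.g. s__Eubacterium_rectale
profile_clades() {
  clade_list="$1"
  while IFS= read -r clade || [ -n "$clade" ]; do
    # Skip blank lines.
    if [ -z "$clade" ]; then continue; fi
    run mkdir -p clade_markers "output/${clade}"
    # Extract the marker sequences for this clade from the database.
    run extract_markers.py -c "$clade" -o clade_markers
    # Strain-level profiling for this clade, without reference genomes.
    run strainphlan -s consensus_markers/*.pkl \
      -m "clade_markers/${clade}.fna" \
      -o "output/${clade}" -n 8 -c "$clade"
  done < "$clade_list"
}

# Usage: profile_clades clades.txt
```

Each clade gets its own output folder so that results from different species do not overwrite each other.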

Hi @CK_zhu
If you have a new species that is not present in the mpa database, neither MetaPhlAn nor StrainPhlAn will be able to profile it.