Phylophlan - creating database of markers

Ana · August 9, 2021, 4:02pm

Hello. I am trying to create a set of core proteins to use as a database instead of UniRef90, because the species I am studying is not there. I’ve already used the default phylophlan database, but I want to make my own. How can I do this?

f.asnicar · August 10, 2021, 12:50pm

Hello and thanks for using PhyloPhlAn!

To build your own database you can try following the instructions available here: PhyloPhlAn wiki - Database setup.
Basically, you’ll still need to use the phylophlan_setup_database script, but providing your own file (or folder with the gene files) instead of the automatic download of UniRef90.

Please let me know if something is not clear.

Many thanks,
Francesco

Ana · August 10, 2021, 2:11pm

Hi! I meant how can I get a set of genes that are markers to put in the database.

f.asnicar · August 12, 2021, 2:21pm

Hi, to do that you need to use tools like prokka and roary, where the first annotates your genomes and the second computes the set of core proteins from the gene annotations. Then you can build a custom db for PhyloPhlAn using the core genes identified by Roary.

I hope this helps, thanks,
Francesco

Ana · August 15, 2021, 12:08am

Hello Francesco. Yes that helps, I just read this in the paper too.

Right now I am using the default phylophlan database, but would you agree it would make a “better” tree to make a custom db of markers, if looking at a single species?

f.asnicar · August 16, 2021, 8:54am

Hello Ana. Yes, the phylophlan database is a set of 400 universal proteins, so they might not be specific enough to accurately resolve closely related genomes, as in your case.

I don’t know what species you’re studying, but as an alternative to the “prokka+roary” pipeline, one thing you could try is to download the UniRef90 of the species in the same genus as yours, make a db for PhyloPhlAn from it, and then set the --min_num_entries param in PhyloPhlAn to use only those markers that are found in “enough” genomes (basically this will be a coreness threshold for the markers in the db).

Ana · November 10, 2021, 4:04am

Hi,

I’m trying to use Roary to make a database of markers. It looks like Roary will make a multi-fasta alignment. Can this be used as a database? Do you know how I can get just the sequences and not the alignment? Thanks in advance.

f.asnicar · November 15, 2021, 9:36am

Hi Ana,
from Roary you should also have a folder with all genes identified in the pangenome. What you can do is to get only those that are “core” (and here you can decide which % threshold to use) and put them in a separate folder. At that point, you can run phylophlan_setup_database on that folder to build a database formatted for PhyloPhlAn and then you can run phylophlan specifying your custom database of core genes.
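As a rough sketch of the "decide which % threshold to use" step: Roary writes a gene_presence_absence.csv table, and the snippet below assumes it has a "Gene" column and a "No. isolates" column. The function name core_gene_names, the n_isolates argument, and the example rows are made up for illustration; check the column names against your own Roary output before relying on it.

```python
import csv
import io


def core_gene_names(csv_text, n_isolates, threshold=0.99):
    """Return the names of gene clusters found in at least `threshold`
    (a fraction between 0 and 1) of the `n_isolates` genomes.

    Assumes the table has "Gene" and "No. isolates" columns, as in
    Roary's gene_presence_absence.csv.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    core = []
    for row in reader:
        fraction = int(row["No. isolates"]) / n_isolates
        if fraction >= threshold:
            core.append(row["Gene"])
    return core


# Tiny made-up table standing in for gene_presence_absence.csv
example = (
    '"Gene","Annotation","No. isolates"\n'
    '"geneA","hypothetical protein","10"\n'
    '"geneB","hypothetical protein","9"\n'
    '"geneC","hypothetical protein","5"\n'
)
selected = core_gene_names(example, n_isolates=10, threshold=0.85)
print(selected)
```

With a real run you would read the file contents with open(...).read(), pass the number of genomes you gave to Roary, and then copy the sequence files of the selected clusters into the separate folder.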

Alternatively, you can take the core_gene_alignment.aln, remove all the gaps (-) added by the MSA and run phylophlan_setup_database on the unaligned multi-fasta file.
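Removing the gaps can be done with a few lines of plain Python. This is only a minimal sketch: the function name strip_gaps and the example records are hypothetical, and it assumes a standard FASTA layout where header lines start with ">" and gaps are written as "-".

```python
def strip_gaps(aligned_fasta_text):
    """Return the FASTA text with every '-' removed from sequence lines.

    Header lines (starting with '>') are kept unchanged. Sequence lines
    that consist only of gaps are dropped.
    """
    out = []
    for line in aligned_fasta_text.splitlines():
        if line.startswith(">"):
            out.append(line)
        else:
            seq = line.strip().replace("-", "")
            if seq:
                out.append(seq)
    return "\n".join(out) + "\n"


# Made-up two-record alignment for illustration
aligned = ">genome1\nATG--CCA\n---TTA\n>genome2\nATGAACCA\nGGGTTA\n"
unaligned = strip_gaps(aligned)
print(unaligned)
```

For the real file you would read core_gene_alignment.aln, pass its contents to strip_gaps, and write the result to a new multi-fasta file.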

I hope this helps and let me know if something doesn’t work.

Many thanks,
Francesco

Ana · November 17, 2021, 11:47pm

Yes, that helps. Thank you.

I have a big problem now. Since I installed Prokka and Roary, Phylophlan is no longer working on my Mac (it was working before I installed these programs). I get an error at the mapping stage with diamond; it said something about the database and the diamond version not being compatible.

[e] Command '['/Users/MyComputer/miniconda3/envs/phylophlan/bin/diamond', 'blastx', '--quiet', '--threads', '1', '--outfmt', '6', '--more-sensitive', '--id', '50', '--max-hsps', '35', '-k', '0', '--query', 'phylophlan_output/tmp/clean_dna/VICT1.fasta', '--db', 'phylophlan_databases/phylophlan/phylophlan.dmnd', '--out', 'phylophlan_output/tmp/map_dna/VICT1.b6o.bkp']' returned non-zero exit status 1.

[e] gene_markers_identification crashed

Thank you,
Ana

f.asnicar · November 26, 2021, 1:25pm

Likely the diamond version changed when you installed the new tools. What you can do is remove the diamond indexed database (the file with the .dmnd extension) from the PhyloPhlAn databases folder and re-run PhyloPhlAn. At that point, PhyloPhlAn will re-index the database using the new version you installed and everything should work.
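If you prefer to script the cleanup, something like the sketch below would list (and optionally delete) the .dmnd files. The helper name remove_diamond_indexes and its dry_run switch are hypothetical, and the demo runs on a throwaway temporary folder that only mimics the phylophlan_databases/phylophlan layout from the error message; phylophlan.faa is a placeholder filename.

```python
import tempfile
from pathlib import Path


def remove_diamond_indexes(db_root, dry_run=True):
    """Find every *.dmnd file under `db_root` and return the list of paths.

    With dry_run=True nothing is deleted, so the list can be checked first.
    """
    found = sorted(Path(db_root).rglob("*.dmnd"))
    if not dry_run:
        for path in found:
            path.unlink()
    return found


# Demo on a temporary folder, not on a real PhyloPhlAn installation
with tempfile.TemporaryDirectory() as tmp:
    db_dir = Path(tmp) / "phylophlan_databases" / "phylophlan"
    db_dir.mkdir(parents=True)
    (db_dir / "phylophlan.dmnd").touch()  # stand-in for the diamond index
    (db_dir / "phylophlan.faa").touch()   # placeholder for the other db files

    listed = remove_diamond_indexes(tmp, dry_run=True)   # only lists
    deleted = remove_diamond_indexes(tmp, dry_run=False)  # actually deletes
    remaining = sorted(p.name for p in db_dir.iterdir())

print([p.name for p in listed], remaining)
```

On a real setup you would point db_root at your own databases folder, look at the dry-run list first, and only then call it again with dry_run=False.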

Many thanks, Francesco
