Pre-downloading PhyloPhlAn databases?

Hello

I know that running phylophlan with the “-d phylophlan” option makes it download that database automatically (see Home · biobakery/phylophlan Wiki · GitHub), but I am on a cluster and want to submit lots of jobs, and I don’t want them all attempting to download it simultaneously.

So is there a way to pre-download the phylophlan marker database?

I have read the database setup section (Home · biobakery/phylophlan Wiki · GitHub) and can’t really figure out how to do it…

Cheers
Mick

Hello Mick,

If you have already downloaded the phylophlan_databases.txt file (otherwise you can get it from here: http://cmprod1.cibio.unitn.it/databases/PhyloPhlAn/phylophlan_databases.txt), you can open it and see that there are two URLs for the phylophlan database:

http://cmprod1.cibio.unitn.it/databases/PhyloPhlAn/phylophlan.tar
http://cmprod1.cibio.unitn.it/databases/PhyloPhlAn/phylophlan.md5

You should download them both and save them in, let’s say, /databases (a folder that is also accessible from the nodes of your cluster).
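
In case it helps, here is a minimal Python sketch of that download step. It fetches the two files listed above into /databases and compares the MD5 of the tar file against the .md5 file. The function names are just illustrative, and I am assuming the .md5 file starts with the hex digest (optionally followed by a file name), so please check that against the file you actually get.

```python
import hashlib
import urllib.request
from pathlib import Path

BASE_URL = "http://cmprod1.cibio.unitn.it/databases/PhyloPhlAn"
DB_FOLDER = Path("/databases")  # shared folder used in the example above


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_md5(md5_path):
    """Read the expected digest from a .md5 file.

    Assumption: the first whitespace-separated token is the hex digest.
    """
    return Path(md5_path).read_text().split()[0].lower()


def download_database(folder=DB_FOLDER, name="phylophlan"):
    """Download <name>.tar and <name>.md5 into folder and verify the checksum."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    for ext in ("tar", "md5"):
        target = folder / f"{name}.{ext}"
        if not target.exists():  # skip files that are already present
            urllib.request.urlretrieve(f"{BASE_URL}/{name}.{ext}", str(target))
    tar_path = folder / f"{name}.tar"
    if md5_of_file(tar_path) != expected_md5(folder / f"{name}.md5"):
        raise RuntimeError(f"MD5 mismatch for {tar_path}; delete it and retry")
    return tar_path


if __name__ == "__main__":
    print(download_database())
```

Run it once from a node with internet access before submitting anything to the queue.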

Now, I suggest you run one PhyloPhlAn job first, which will take care of decompressing and indexing the database according to the configuration file (I’m assuming all jobs will use the same configuration for the sections: [db_aa], [map_dna], and [map_aa]). Once that’s done, you can submit all your other jobs, which will simply find the database already available and ready to use.

To do this you should just specify the database location with the --databases_folder parameter (in the above case: --databases_folder /databases).
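
As a rough illustration, the small Python helper below assembles such a command line so that every job points at the same folder. The helper name and the input folder names are hypothetical, I am assuming the input is passed with -i, and any other options your run needs (for example the configuration file) would go in extra_args.

```python
import shlex


def build_phylophlan_command(input_folder, databases_folder="/databases", extra_args=()):
    """Build a phylophlan command that uses a pre-downloaded database folder.

    Assumption: the input folder is given with -i; extra_args carries any
    other options your own runs require.
    """
    return [
        "phylophlan",
        "-i", str(input_folder),
        "-d", "phylophlan",
        "--databases_folder", str(databases_folder),
        *extra_args,
    ]


if __name__ == "__main__":
    # Hypothetical input folders: run the first job alone so it prepares the
    # database, then submit the remaining ones.
    for batch in ("genomes_batch_01", "genomes_batch_02"):
        print(shlex.join(build_phylophlan_command(batch)))
```

The printed lines can be pasted into your job scripts; just make sure the first job has finished before the others start.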

Thanks also for checking the wiki. We’ll update it to include this case, so that it will be useful for others in the future as well.

Many thanks,
Francesco