Unable to run example 1

Hello!

I’ve installed PhyloPhlAn by running the command: conda create -n "phylophlan" -c bioconda phylophlan=3.0

I then ran: ‘conda activate phylophlan’ and ‘phylophlan_write_default_configs.sh [output_folder]’.

This all works and when I test the installation I get: PhyloPhlAn version 3.0.60 (27 November 2020)

Then, I tried running the first example of the manual by just running ‘sh run_01.sh’.

This works up until Step 4 where I get a lot of errors that I don’t understand well.

First, I get: [e] “/opt/anaconda3/envs/phylophlan/lib/python3.10/site-packages/phylophlan/phylophlan_configs/” folder does not exists

and then: [e] program not installed or not present in the system path
command_line: FastTreeMP

I checked the forum and saw that other people also got the first error, but I understood that this should have been corrected in the latest conda version?

I also tried running just the piece of code that is in the Step 4 of the tutorial. Then, I get the same first error but then I get a different second error:

[e] Command '['/opt/anaconda3/envs/phylophlan/bin/diamond', 'makedb', '--threads', '1', '--in', '/quispe/Downloads/phylophlan-master/phylophlan/examples/01_saureus/s__Staphylococcus_aureus/s__Staphylococcus_aureus.faa', '--db', '/quispe/Downloads/phylophlan-master/phylophlan/examples/01_saureus/s__Staphylococcus_aureus/s__Staphylococcus_aureus']' returned non-zero exit status 1.

I tried adding an empty ‘phylophlan_configs’ folder to ‘…/site-packages/phylophlan’ and the first error message goes away but I’m still stuck with the other problems.

I’ve spent all day trying to make this work so that I can try it on my data, so any input you could give me would be much appreciated!

Thanks
Paula

Dear Paula,

Many thanks for trying PhyloPhlAn and reporting this.

As you have already read, the message [e] “/opt/anaconda3/envs/phylophlan/lib/python3.10/site-packages/phylophlan/phylophlan_configs/” folder does not exists is not actually an error but more of a warning. The message will be updated in the next release to be less confusing.

The second error you got:

[e] program not installed or not present in the system path
command_line: FastTreeMP

is strange, because the FastTreeMP executable was found when writing the config file, yet PhyloPhlAn appears unable to find it.
I think this is an issue with the conda environment: when you execute sh run_01.sh, the phylophlan conda env you have activated is not accessible to the run_01.sh script. If that’s the case, you should add two lines at the beginning of the run_01.sh script, one to load your conda installation and one to activate the env you need. Something like:

. "/opt/anaconda3/etc/profile.d/conda.sh"
conda activate phylophlan;
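
To confirm that the environment is really visible from inside the script, a small check like the one below could go right after the activation lines. This is only a sketch: check_tools is a made-up helper name (not part of PhyloPhlAn), and the tool list in the example call simply mirrors the programs mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of PhyloPhlAn): report which of the given
# programs cannot be resolved on the current PATH.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    # Return 0 when everything was found, 1 otherwise.
    return "$missing"
}

# Example call, assuming these are the executables your config relies on:
# check_tools diamond mafft trimal FastTreeMP || exit 1
```

If the call prints nothing, every listed program was found on the PATH seen by the script.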

Now, to better understand your problem and whether the above suggestion can help: by “Step 4”, do you mean the command that starts at line 48, which builds the phylogeny of the 1,135 S. aureus genomes? If so, I’m wondering how building the phylogeny of just the S. aureus isolates was able to run.

Please let me know if something is not clear.

Many thanks,
Francesco

Dear Francesco,

Thanks so much for the prompt answer!

I’ve tried activating the environment in the bash script using ‘source /opt/anaconda3/bin/activate phylophlan’. Then, I get:

[e] Command '['/opt/anaconda3/envs/phylophlan/bin/diamond', 'makedb', '--threads', '1', '--in', '/quispe/Downloads/phylophlan-master/phylophlan/examples/01_saureus/s__Staphylococcus_aureus/s__Staphylococcus_aureus.faa', '--db', '/quispe/Downloads/phylophlan-master/phylophlan/examples/01_saureus/s__Staphylococcus_aureus/s__Staphylococcus_aureus']' returned non-zero exit status 1.

[e] cannot execute command
command_line: /opt/anaconda3/bin/diamond makedb --threads 1 --in /quispe/Downloads/phylophlan-master/phylophlan/examples/01_saureus/s__Staphylococcus_aureus/s__Staphylococcus_aureus.faa --db /quispe/Downloads/phylophlan-master/phylophlan/examples/01_saureus/s__Staphylococcus_aureus/s__Staphylococcus_aureus

Sorry if it wasn’t clear from the previous message but I get these errors when the program tries to build the very first phylogeny. So, I’m able to download all the isolate genomes and create the database and config file but when it gets to actually creating the phylogeny (which is named Step 4 in the tutorial) it fails. In the run_01.sh script this corresponds to line 24.

The other error ‘[e] program not installed or not present in the system path
command_line: FastTreeMP’ appears when trying to build the phylogeny for the 1,135 isolates (which is line 48).

As I said I also tried running this directly in the command line (where I can simply activate the phylophlan environment) and when trying to build the first phylogeny got the same two error messages I described above.

Once more thanks for your help!
Paula

Hi Paula,

The diamond error you have now reported comes from a different diamond executable (/opt/anaconda3/bin/diamond instead of /opt/anaconda3/envs/phylophlan/bin/diamond from your previous message).

Can you check the content of the database folder /quispe/Downloads/phylophlan-master/phylophlan/examples/01_saureus/s__Staphylococcus_aureus/? The error could be due to different versions of diamond being used for indexing the database and then running the mapping.
Inside the database folder, you should have a file named s__Staphylococcus_aureus.dmnd; if it is there, you can remove it. If you don’t find it, you can try manually running the diamond command (using diamond from the phylophlan conda env):

/opt/anaconda3/envs/phylophlan/bin/diamond makedb --threads 1 --in /quispe/Downloads/phylophlan-master/phylophlan/examples/01_saureus/s__Staphylococcus_aureus/s__Staphylococcus_aureus.faa --db /quispe/Downloads/phylophlan-master/phylophlan/examples/01_saureus/s__Staphylococcus_aureus/s__Staphylococcus_aureus
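
It might also help to confirm whether the two diamond executables really differ. A rough sketch, assuming both paths from your messages exist and that calling diamond with the version subcommand prints its version string; same_version is a hypothetical helper name, not something PhyloPhlAn provides:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if two executables report the same
# version string when called with the "version" subcommand.
# Returns 2 if either executable cannot be run.
same_version() {
    v1=$("$1" version 2>/dev/null) || return 2
    v2=$("$2" version 2>/dev/null) || return 2
    [ "$v1" = "$v2" ]
}

# Paths taken from the error messages in this thread:
# same_version /opt/anaconda3/bin/diamond \
#              /opt/anaconda3/envs/phylophlan/bin/diamond \
#     || echo "diamond versions differ (or one could not be run)"
```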

Also, can you check the content of your config file? (Feel free to share it here as well if you would like me to take a look.)

Many thanks,
Francesco

Hi Francesco,

I didn’t have that file ‘s__Staphylococcus_aureus.dmnd’ but after running the command you suggested the file was created.

I then tried running the code for creating the first phylogeny again (line 24), but now I get this error:

[e] Command '['/opt/anaconda3/bin/diamond', 'blastx', '--quiet', '--threads', '1', '--outfmt', '6', '--more-sensitive', '--id', '50', '--max-hsps', '35', '-k', '0', '--query-gencode', '11', '--query', 'output_isolates/tmp/clean_dna/GCA_003240325.fna', '--db', '/quispe/Downloads/phylophlan-master/phylophlan/examples/01_saureus/s__Staphylococcus_aureus/s__Staphylococcus_aureus.dmnd', '--out', 'output_isolates/tmp/map_dna/GCA_003240325.b6o.bkp']' returned non-zero exit status 1.

This appears to happen for multiple GCA files and then I also get several: [e] error while mapping until it says [e] gene_markers_identification crashed.

This is the content of my config file:
[db_aa]
program_name = /opt/anaconda3/bin/diamond
params = makedb
threads = --threads
input = --in
output = --db
version = version
command_line = #program_name# #params# #threads# #input# #output#

[map_dna]
program_name = /opt/anaconda3/bin/diamond
params = blastx --quiet --threads 1 --outfmt 6 --more-sensitive --id 50 --max-hsps 35 -k 0 --query-gencode 11
input = --query
database = --db
output = --out
version = version
command_line = #program_name# #params# #input# #database# #output#

[map_aa]
program_name = /opt/anaconda3/bin/diamond
params = blastp --quiet --threads 1 --outfmt 6 --more-sensitive --id 50 --max-hsps 35 -k 0
input = --query
database = --db
output = --out
version = version
command_line = #program_name# #params# #input# #database# #output#

[msa]
program_name = /opt/anaconda3/bin/mafft
params = --quiet --anysymbol --thread 1 --auto
version = --version
command_line = #program_name# #params# #input# > #output#

[trim]
program_name = /opt/anaconda3/bin/trimal
params = -gappyout
input = -in
output = -out
version = --version
command_line = #program_name# #params# #input# #output#

[tree1]
program_name = /opt/anaconda3/bin/fasttree
params = -quiet -pseudo -spr 4 -mlacc 2 -slownni -fastest -no2nd -mlnni 4 -gtr -nt
output = -out
command_line = #program_name# #params# #output# #input#

[tree2]
program_name = /opt/anaconda3/bin/raxmlHPC-PTHREADS-SSE3
params = -p 1989 -m GTRCAT
database = -t
input = -s
output_path = -w
output = -n
version = -v
command_line = #program_name# #params# #threads# #database# #output_path# #input# #output#
threads = -T

Thanks!
Paula

Thanks Paula!

The error could be due to different diamond versions being present in the base and phylophlan conda envs: you have now created the s__Staphylococcus_aureus.dmnd file using diamond from the phylophlan conda env, but in the config file the program_name entries all point to the executables in /opt/anaconda3/bin and not to those in the phylophlan env (/opt/anaconda3/envs/phylophlan/bin).
So, I would suggest you re-create the config file with the phylophlan env active when you run the phylophlan_write_config_file.py script.

The command to re-create the config file is:

phylophlan_write_config_file -o supermatrix_aa.cfg \
    -d a \
    --db_aa diamond \
    --map_dna diamond \
    --map_aa diamond \
    --msa mafft \
    --trim trimal \
    --tree1 fasttree \
    --tree2 raxml \
    --overwrite \
    --verbose
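
Once the file is re-created, a quick way to check it would be to list any program_name entries that still point outside the env. A minimal sketch, assuming the "key = value" layout of the config you pasted; outside_prefix is a hypothetical helper, not a PhyloPhlAn command:

```shell
#!/bin/sh
# Hypothetical helper: print every program_name value in a PhyloPhlAn-style
# config file whose path does not start with the given prefix.
outside_prefix() {
    cfg=$1
    prefix=$2
    # Split each line on "=" (with optional surrounding spaces) and keep
    # only program_name entries that do not begin with the prefix.
    awk -F ' *= *' -v p="$prefix" '
        $1 == "program_name" && index($2, p) != 1 { print $2 }
    ' "$cfg"
}

# Example, using the config name and env path from this thread:
# outside_prefix supermatrix_aa.cfg /opt/anaconda3/envs/phylophlan/bin/
```

If this prints nothing, all executables in the config come from the phylophlan env.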

Please, let me know if you can get it working.

Many thanks,
Francesco