Various PhyloPhlAn 3.0 Errors for Nucleotide MAGs and Genomes

I am trying to generate a phylogenetic tree using 20 MAGs and 153 genomes, all located in a directory named “all_fna_files” (all files have the .fna extension).

I have downloaded PhyloPhlAn version 3.0.60 (27 November 2020).

I am having trouble generating the tree. The amphora2 database was not being generated, so I downloaded the database files into phylophlan_databases and prepared them manually:

cd /Users/brandifeehan/Documents/KSU_PhD/Lee_Lab/KSU/Stages_MAGs/MAGs/Archaea/PhyloPhlAn/phylophlan_databases
tar -xf amphora2.tar
bzcat amphora2/*.bz2 > amphora2/amphora2.faa
tar -xf phylophlan.tar
bunzip2 -k phylophlan/phylophlan.faa.bz2
diamond makedb --in amphora2/amphora2.faa --db amphora2/amphora2
diamond makedb --in phylophlan/phylophlan.faa --db phylophlan/phylophlan

I then ran the following command and received the error below. I am not sure how to proceed.

(phylophlan) brandifeehan@ip-10-150-7-20 PhyloPhlAn % phylophlan -i all_fna_files/ -d amphora2 --diversity low -f supermatrix_nt.cfg -t n

Generating "db_dna" indexed database "amphora2"

[e] Command '['/Users/brandifeehan/anaconda3_new/bin/makeblastdb', '-parse_seqids', '-dbtype', 'nucl', '-in', '/Users/brandifeehan/anaconda3_new/envs/phylophlan/lib/python3.9/site-packages/phylophlan/phylophlan_databases/amphora2/amphora2.fna', '-out', '/Users/brandifeehan/anaconda3_new/envs/phylophlan/lib/python3.9/site-packages/phylophlan/phylophlan_databases/amphora2/amphora2']' returned non-zero exit status 1.

[e] cannot execute command
command_line: /Users/brandifeehan/anaconda3_new/bin/makeblastdb -parse_seqids -dbtype nucl -in /Users/brandifeehan/anaconda3_new/envs/phylophlan/lib/python3.9/site-packages/phylophlan/phylophlan_databases/amphora2/amphora2.fna -out /Users/brandifeehan/anaconda3_new/envs/phylophlan/lib/python3.9/site-packages/phylophlan/phylophlan_databases/amphora2/amphora2
stdin: None
stdout: None
env: {'TERM_PROGRAM': '', 'TERM': 'xterm-256color', 'SHELL': '/bin/zsh', 'TMPDIR': '/var/folders/gm/dckmrbwj3hj27cj2jwnh7q8w0000gn/T/', 'CONDA_SHLVL': '2', 'CONDA_PROMPT_MODIFIER': '(phylophlan) ', 'TERM_PROGRAM_VERSION': '3.4.15', 'TERM_SESSION_ID': 'w0t4p0:0A549096-86FC-46EE-B034-8DDA632F4F06', 'USER': 'brandifeehan', 'COMMAND_MODE': 'unix2003', 'CONDA_EXE': '/Users/brandifeehan/anaconda3_new/bin/conda', 'SSH_AUTH_SOCK': '/private/tmp/', '__CF_USER_TEXT_ENCODING': '0x1F5:0x0:0x0', '_CE_CONDA': '', 'CONDA_PREFIX_1': '/Users/brandifeehan/anaconda3_new', 'PATH': '/Users/brandifeehan/anaconda3_new/envs/phylophlan/bin:/Users/brandifeehan/anaconda3_new/condabin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/opt/X11/bin', '_': '/Users/brandifeehan/anaconda3_new/envs/phylophlan/bin/phylophlan', 'CONDA_PREFIX': '/Users/brandifeehan/anaconda3_new/envs/phylophlan', '__CFBundleIdentifier': 'com.googlecode.iterm2', 'PWD': '/Users/brandifeehan/Documents/KSU_PhD/Lee_Lab/KSU/Stages_MAGs/MAGs/Archaea/PhyloPhlAn', 'LANG': 'en_US.UTF-8', 'ITERM_PROFILE': 'Default', 'XPC_FLAGS': '0x0', '_CE_M': '', 'XPC_SERVICE_NAME': '0', 'SHLVL': '1', 'HOME': '/Users/brandifeehan', 'COLORFGBG': '7;0', 'LC_TERMINAL_VERSION': '3.4.15', 'ITERM_SESSION_ID': 'w0t4p0:0A549096-86FC-46EE-B034-8DDA632F4F06', 'CONDA_PYTHON_EXE': '/Users/brandifeehan/anaconda3_new/bin/python', 'LOGNAME': 'brandifeehan', 'CONDA_DEFAULT_ENV': 'phylophlan', 'LC_TERMINAL': 'iTerm2', 'DISPLAY': '/private/tmp/', 'COLORTERM': 'truecolor'}

Hi, first of all, thanks for trying PhyloPhlAn.

The error you reported is caused by the wrong configuration file specified in the PhyloPhlAn command: -f supermatrix_nt.cfg. The supermatrix_nt.cfg file is designed to work with a database of genes (hence the _nt suffix), while the phylophlan and amphora2 databases are both protein databases. I think changing the configuration file to supermatrix_aa.cfg should fix it.

The manual database extraction you did should not actually be needed, as PhyloPhlAn takes care of it. Are you sure PhyloPhlAn is looking at the correct databases folder? You can provide the path /Users/brandifeehan/Documents/KSU_PhD/Lee_Lab/KSU/Stages_MAGs/MAGs/Archaea/PhyloPhlAn/phylophlan_databases with the --databases_folder param.
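A quick way to confirm the databases folder is to check that the extracted protein file sits where PhyloPhlAn would look for it before launching the run. The helper `has_protein_db` is hypothetical, and the expected layout `<databases_folder>/amphora2/amphora2.faa` is an assumption modeled on the `amphora2/amphora2.fna` path in the error log. The `phylophlan` call only uses parameters mentioned in this thread; `-t` is left out so that PhyloPhlAn detects the database type itself.

```shell
#!/bin/sh
# Hypothetical helper: succeed if <folder>/<db>/<db>.faa exists.
# The layout is an assumption based on the amphora2.fna path in the error log.
has_protein_db() {
  [ -f "$1/$2/$2.faa" ]
}

# Databases folder taken from the question above
DB_FOLDER="/Users/brandifeehan/Documents/KSU_PhD/Lee_Lab/KSU/Stages_MAGs/MAGs/Archaea/PhyloPhlAn/phylophlan_databases"

# Launch only if the protein file is found and PhyloPhlAn is on PATH
if has_protein_db "$DB_FOLDER" amphora2 && command -v phylophlan >/dev/null 2>&1; then
  phylophlan -i all_fna_files/ \
    -d amphora2 \
    -f supermatrix_aa.cfg \
    --diversity low \
    --databases_folder "$DB_FOLDER"
fi
```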

Many thanks,


Thanks for your help!

I was under the impression that the database would be generated when running the command, but this is the error I was receiving before (and still receive now). This is why I was trying to download the databases manually.

phylophlan -i all_fna_files/ -d amphora2 -f supermatrix_aa.cfg -t n --diversity low

[w] cannot create database "amphora2", section "db_dna" not present in configurations
[e] both db_dna and db_aa are None!


Dear Brandi,

Sorry, I forgot to mention that the -t n param you specify is also not correct. That param skips the automatic detection of the database type, and with n you are declaring the database to contain genes. Since phylophlan and amphora2 are protein databases, you should specify -t a instead.
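Because `-t` and the configuration file must both describe the same kind of database, it can help to derive both from one place. The function `db_settings` below is a hypothetical illustration, not part of PhyloPhlAn: it assumes the `.faa` (proteins) and `.fna` (genes) extension convention used by the files in this thread, and it only prints the values rather than running anything.

```shell
#!/bin/sh
# Hypothetical helper: print the -t value and the matching default config
# for a database FASTA, based on its extension (.faa = proteins, .fna = genes).
db_settings() {
  case "$1" in
    *.faa) echo "a supermatrix_aa.cfg" ;;
    *.fna) echo "n supermatrix_nt.cfg" ;;
    *) echo "unknown database type: $1" >&2; return 1 ;;
  esac
}

db_settings amphora2.faa   # prints: a supermatrix_aa.cfg
```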


Please check your configuration file and make sure that the db_aa section is present (that's what you need for indexing a protein database such as phylophlan or amphora2).
Then make sure you also have the map_dna section, since your inputs are genomes and MAGs and not proteomes, if I understood correctly (with proteomes as inputs you would also need the map_aa section).
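The section check can be done with `grep`. The sketch below assumes the configuration file uses INI-style `[section]` headers (as the `db_dna`/`db_aa` section names in the error messages suggest); `check_sections` is a hypothetical helper, and the config path is a placeholder for whatever file you pass with `-f`.

```shell
#!/bin/sh
# Hypothetical helper: report whether each required "[section]" header
# appears in a PhyloPhlAn configuration file; return 1 if any is missing.
check_sections() {
  file=$1; shift
  missing=0
  for section in "$@"; do
    if grep -q "^\[$section\]" "$file"; then
      echo "ok: [$section]"
    else
      echo "missing: [$section]"
      missing=1
    fi
  done
  return "$missing"
}

# Genomes/MAGs against a protein database need both db_aa and map_dna
CFG=supermatrix_aa.cfg   # placeholder: path of the file passed with -f
if [ -f "$CFG" ]; then
  check_sections "$CFG" db_aa map_dna
fi
```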

Finally, if you generate the configuration file with the --force_nucleotides param, make sure to specify it in the PhyloPhlAn command as well.
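Putting the advice together, a custom configuration for a protein database with genome/MAG inputs might be generated and used as sketched below. This is a hedged example, not a verified recipe: the tool choices (diamond, mafft, trimal, fasttree, raxml) and the file name `custom_aa_force_nt.cfg` are assumptions, so check `phylophlan_write_config_file -h` for the options available in your version.

```shell
#!/bin/sh
# Sketch only: tool choices and the config file name are assumptions.
CFG=custom_aa_force_nt.cfg

if command -v phylophlan_write_config_file >/dev/null 2>&1; then
  # Protein database (-d a) indexed with diamond (db_aa), genomes/MAGs mapped
  # with diamond (map_dna); --force_nucleotides set at config-generation time
  phylophlan_write_config_file -o "$CFG" -d a --force_nucleotides \
    --db_aa diamond --map_dna diamond \
    --msa mafft --trim trimal --tree1 fasttree --tree2 raxml --overwrite

  # The same --force_nucleotides flag is passed to PhyloPhlAn as well
  phylophlan -i all_fna_files/ -d amphora2 -f "$CFG" -t a \
    --diversity low --force_nucleotides
fi
```

No map_aa section is requested here because the inputs are genomes and MAGs rather than proteomes.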

Please, let me know if something is not clear or not working.

Thanks, Francesco