The bioBakery help forum

Unable to create the tree of life: IsADirectoryError: [Errno 21] Is a directory: '/home/ubuntu/phylophlan/PhyloPhlAn.egg-info'


I have been trying to follow the instructions for recreating the ToL on the biobakery GitHub wiki page "PhyloPhlAn 3.0: Example 02: Tree of life".

I am running PhyloPhlAn from a conda install, carried out yesterday.

This is what I ran:

phylophlan_get_reference -g all -n 1 -o genbank_species

This produced a directory called “genbank_species”, which contains 17,509 .fna.gz files.

phylophlan_write_config_file \
    -d a \
    -o 02_tol.cfg \
    --db_aa diamond \
    --map_dna diamond \
    --map_aa diamond \
    --msa mafft \
    --trim trimal \
    --tree1 iqtree


phylophlan \
    -i genbank_species \
    -d phylophlan \
    -f 02_tol.cfg \
    --diversity high \
    --fast \
    -o output_tol \
    --nproc 16

Error is:

PhyloPhlAn version 3.0.60 (27 November 2020)

Command line: /home/ubuntu/miniconda3/envs/phylophlan/bin/phylophlan -i genbank_species -d phylophlan -f 02_tol.cfg --diversity high --fast -o output_tol --nproc 4 --verbose

Automatically setting "database=phylophlan" and "databases_folder=/home/ubuntu"
Automatically setting "input=genbank_species" and "input_folder=/home/ubuntu"
"high-fast" preset
Setting "sort=True" because "database=phylophlan"
Setting "min_num_markers=100" since no value has been specified and the "database=phylophlan"
Arguments: {'input': 'genbank_species', 'clean': None, 'output': 'output_tol', 'database': 'phylophlan', 'db_type': None, 'config_file': '02_tol.cfg', 'diversity': 'high', 'accurate': False, 'fast': True, 'clean_all': False, 'database_list': False, 'submat': 'pfasum60', 'submat_list': False, 'submod_list': False, 'nproc': 4, 'min_num_proteins': 1, 'min_len_protein': 50, 'min_num_markers': 100, 'trim': 'greedy', 'gap_perc_threshold': 0.67, 'not_variant_threshold': 0.9, 'subsample': <function phylophlan at 0x7f074cdc2710>, 'unknown_fraction': 0.3, 'scoring_function': <function trident at 0x7f074cdc2d40>, 'sort': True, 'remove_fragmentary_entries': False, 'fragmentary_threshold': 0.67, 'min_num_entries': 4, 'maas': None, 'remove_only_gaps_entries': False, 'mutation_rates': False, 'force_nucleotides': False, 'input_folder': '/home/ubuntu/genbank_species', 'data_folder': 'output_tol/tmp', 'databases_folder': '/home/ubuntu', 'submat_folder': '/home/ubuntu/miniconda3/envs/phylophlan/lib/python3.7/site-packages/phylophlan/phylophlan_substitution_matrices/', 'submod_folder': '/home/ubuntu/miniconda3/envs/phylophlan/lib/python3.7/site-packages/phylophlan/phylophlan_substitution_models/', 'configs_folder': 'phylophlan_configs/', 'output_folder': '', 'genome_extension': '.fna', 'proteome_extension': '.faa', 'update': False, 'verbose': True}
Loading configuration file "02_tol.cfg"
Checking configuration file
Checking "/home/ubuntu/miniconda3/envs/phylophlan/bin/diamond"
Checking "/home/ubuntu/miniconda3/envs/phylophlan/bin/mafft"
Checking "/home/ubuntu/miniconda3/envs/phylophlan/bin/trimal"
Checking "/home/ubuntu/miniconda3/envs/phylophlan/bin/iqtree"
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/phylophlan/bin/phylophlan", line 10, in <module>
  File "/home/ubuntu/miniconda3/envs/phylophlan/lib/python3.7/site-packages/phylophlan/", line 3227, in phylophlan_main
  File "/home/ubuntu/miniconda3/envs/phylophlan/lib/python3.7/site-packages/phylophlan/", line 818, in init_database
    for f in glob.iglob(os.path.join(folder, '*'))
  File "/home/ubuntu/miniconda3/envs/phylophlan/lib/python3.7/site-packages/phylophlan/", line 819, in <listcomp>
    for _, seq in SimpleFastaParser(, 'rt') if f.endswith('.bz2') else open(f))])
IsADirectoryError: [Errno 21] Is a directory: '/home/ubuntu/phylophlan/PhyloPhlAn.egg-info'

I note that there are lots of “”, but the files that were downloaded are .gz files. Could this be the issue?

I am not even sure where to start debugging this - any help appreciated!


I am now struggling to reproduce this error. I got it several times over a few hours this morning, but now it seems to have gone away.

Intermittent gremlins?

Hi @BioMickWatson, thanks for reporting this.

So, the error seems to come from the init_database() function. This means the  you see is not a sign that the wrong function is being used to read the inputs: it is used to read the database file(s), so all good there.
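To illustrate the failure mode, here is a minimal Python sketch (not PhyloPhlAn's actual code) that mimics the list comprehension in the traceback: it globs every entry in a folder and opens each one as a FASTA file, using bz2.open for names ending in .bz2 and plain open otherwise. Any subdirectory in that folder, such as PhyloPhlAn.egg-info, makes open() raise IsADirectoryError on Linux. The helpers read_markers and read_markers_files_only are hypothetical, and a tiny FASTA reader stands in for Biopython's SimpleFastaParser so the example needs only the standard library.

```python
import bz2
import glob
import os
import tempfile


def iter_fasta_seqs(handle):
    """Yield sequences from a FASTA text handle (minimal stand-in for SimpleFastaParser)."""
    seq = []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if seq:
                yield "".join(seq)
            seq = []
        elif line:
            seq.append(line)
    if seq:
        yield "".join(seq)


def read_markers(folder):
    """Hypothetical: open every entry in folder, like the comprehension in the traceback."""
    seqs = []
    for f in glob.iglob(os.path.join(folder, "*")):
        # bz2.open for .bz2 names, plain open for everything else (directories included)
        with (bz2.open(f, "rt") if f.endswith(".bz2") else open(f)) as handle:
            seqs.extend(iter_fasta_seqs(handle))
    return seqs


def read_markers_files_only(folder):
    """Hypothetical defensive variant: skip anything that is not a regular file."""
    seqs = []
    for f in sorted(glob.iglob(os.path.join(folder, "*"))):
        if not os.path.isfile(f):
            continue  # e.g. PhyloPhlAn.egg-info from a source checkout
        with (bz2.open(f, "rt") if f.endswith(".bz2") else open(f)) as handle:
            seqs.extend(iter_fasta_seqs(handle))
    return seqs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        with bz2.open(os.path.join(folder, "markers.bz2"), "wt") as out:
            out.write(">m1\nMKV\n>m2\nMLA\n")
        os.mkdir(os.path.join(folder, "PhyloPhlAn.egg-info"))
        try:
            read_markers(folder)
        except IsADirectoryError as err:
            print("IsADirectoryError:", err)
        print(read_markers_files_only(folder))
```

The point of the comparison is that the .bz2/plain-open choice never matters here: the crash happens because a directory ends up in the list of paths to open.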

The only thing I can think of is the location of the phylophlan database. From the arguments, it seems that the database folder is detected to be:

Arguments: { ..., 'database': 'phylophlan', 'databases_folder': '/home/ubuntu', ...}

but from the error message:

IsADirectoryError: [Errno 21] Is a directory: '/home/ubuntu/phylophlan/PhyloPhlAn.egg-info'

it seems that /home/ubuntu/phylophlan doesn’t contain the database. Maybe you’re not able to re-create the error because you’re executing PhyloPhlAn from a different path and the databases folder is now correctly set?
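A quick way to test this explanation is to inspect the folder PhyloPhlAn would treat as the database. The sketch below assumes, as the log suggests (both "input_folder" and "databases_folder" were set to /home/ubuntu), that the databases folder was resolved from the directory PhyloPhlAn was launched in. It joins that folder with the database name and lists any entries that are not regular files. The function check_database_folder is hypothetical; it does not validate the database contents, it only flags subdirectories and a missing folder.

```python
import os
import sys


def check_database_folder(databases_folder, database="phylophlan"):
    """Hypothetical check: return (path, non-file entries) for databases_folder/database."""
    path = os.path.join(os.path.abspath(databases_folder), database)
    if not os.path.isdir(path):
        raise FileNotFoundError(f"no database folder at {path}")
    # Directories (e.g. PhyloPhlAn.egg-info from a git checkout of PhyloPhlAn) would
    # break any loop that opens every entry as a sequence file.
    non_files = sorted(
        name for name in os.listdir(path)
        if not os.path.isfile(os.path.join(path, name))
    )
    return path, non_files


if __name__ == "__main__":
    # Default to the current working directory, mirroring the assumption above.
    folder = sys.argv[1] if len(sys.argv) > 1 else os.getcwd()
    try:
        path, non_files = check_database_folder(folder)
    except FileNotFoundError as err:
        print(err)
    else:
        if non_files:
            print(f"{path} contains non-file entries: {non_files}")
        else:
            print(f"{path} contains only regular files")
```

Running it from /home/ubuntu and from another directory would show whether the launch directory changes which "phylophlan" folder gets picked up.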

Many thanks,