Unable to create the tree of life: IsADirectoryError: [Errno 21] Is a directory: '/home/ubuntu/phylophlan/PhyloPhlAn.egg-info'

Hello

I have been trying to follow the instructions for recreating the tree of life (ToL) on the biobakery/biobakery GitHub wiki page "PhyloPhlAn 3.0: Example 02: Tree of life".

I am running PhyloPhlAn from a conda installation that I set up yesterday.

This is what I ran:

phylophlan_get_reference -g all -n 1 -o genbank_species

This produced a directory called “genbank_species” containing 17,509 .fna.gz files.

phylophlan_write_config_file \
    -d a \
    -o 02_tol.cfg \
    --db_aa diamond \
    --map_dna diamond \
    --map_aa diamond \
    --msa mafft \
    --trim trimal \
    --tree1 iqtree

Then:

phylophlan \
    -i genbank_species \
    -d phylophlan \
    -f 02_tol.cfg \
    --diversity high \
    --fast \
    -o output_tol \
    --nproc 16

The error is:

PhyloPhlAn version 3.0.60 (27 November 2020)

Command line: /home/ubuntu/miniconda3/envs/phylophlan/bin/phylophlan -i genbank_species -d phylophlan -f 02_tol.cfg --diversity high --fast -o output_tol --nproc 4 --verbose

Automatically setting "database=phylophlan" and "databases_folder=/home/ubuntu"
Automatically setting "input=genbank_species" and "input_folder=/home/ubuntu"
"high-fast" preset
Setting "sort=True" because "database=phylophlan"
Setting "min_num_markers=100" since no value has been specified and the "database=phylophlan"
Arguments: {'input': 'genbank_species', 'clean': None, 'output': 'output_tol', 'database': 'phylophlan', 'db_type': None, 'config_file': '02_tol.cfg', 'diversity': 'high', 'accurate': False, 'fast': True, 'clean_all': False, 'database_list': False, 'submat': 'pfasum60', 'submat_list': False, 'submod_list': False, 'nproc': 4, 'min_num_proteins': 1, 'min_len_protein': 50, 'min_num_markers': 100, 'trim': 'greedy', 'gap_perc_threshold': 0.67, 'not_variant_threshold': 0.9, 'subsample': <function phylophlan at 0x7f074cdc2710>, 'unknown_fraction': 0.3, 'scoring_function': <function trident at 0x7f074cdc2d40>, 'sort': True, 'remove_fragmentary_entries': False, 'fragmentary_threshold': 0.67, 'min_num_entries': 4, 'maas': None, 'remove_only_gaps_entries': False, 'mutation_rates': False, 'force_nucleotides': False, 'input_folder': '/home/ubuntu/genbank_species', 'data_folder': 'output_tol/tmp', 'databases_folder': '/home/ubuntu', 'submat_folder': '/home/ubuntu/miniconda3/envs/phylophlan/lib/python3.7/site-packages/phylophlan/phylophlan_substitution_matrices/', 'submod_folder': '/home/ubuntu/miniconda3/envs/phylophlan/lib/python3.7/site-packages/phylophlan/phylophlan_substitution_models/', 'configs_folder': 'phylophlan_configs/', 'output_folder': '', 'genome_extension': '.fna', 'proteome_extension': '.faa', 'update': False, 'verbose': True}
Loading configuration file "02_tol.cfg"
Checking configuration file
Checking "/home/ubuntu/miniconda3/envs/phylophlan/bin/diamond"
Checking "/home/ubuntu/miniconda3/envs/phylophlan/bin/mafft"
Checking "/home/ubuntu/miniconda3/envs/phylophlan/bin/trimal"
Checking "/home/ubuntu/miniconda3/envs/phylophlan/bin/iqtree"
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/phylophlan/bin/phylophlan", line 10, in <module>
    sys.exit(phylophlan_main())
  File "/home/ubuntu/miniconda3/envs/phylophlan/lib/python3.7/site-packages/phylophlan/phylophlan.py", line 3227, in phylophlan_main
    verbose=args.verbose)
  File "/home/ubuntu/miniconda3/envs/phylophlan/lib/python3.7/site-packages/phylophlan/phylophlan.py", line 818, in init_database
    for f in glob.iglob(os.path.join(folder, '*'))
  File "/home/ubuntu/miniconda3/envs/phylophlan/lib/python3.7/site-packages/phylophlan/phylophlan.py", line 819, in <listcomp>
    for _, seq in SimpleFastaParser(bz2.open(f, 'rt') if f.endswith('.bz2') else open(f))])
IsADirectoryError: [Errno 21] Is a directory: '/home/ubuntu/phylophlan/PhyloPhlAn.egg-info'

I note that “bz2.open” is used a lot, but the downloaded files are .gz files. Could this be the issue?

I am not even sure where to start debugging this - any help appreciated!

Thanks
Mick

I am now struggling to recreate this error. I got it several times over a few hours this morning, but now it seems to have gone.

Intermittent gremlins?

Hi @BioMickWatson, thanks for reporting this.

The error comes from the init_database() function, so the bz2.open() you see is not being used to read your inputs (the .fna.gz genomes); it is used to read the database file(s). So nothing is wrong there.
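To make that concrete, here is a minimal Python sketch (not PhyloPhlAn's own code) of the conditional at line 819 of phylophlan.py in the traceback. The helper name open_entry and the temporary files are hypothetical. Only names ending in .bz2 go through bz2.open(); everything else, including a directory, falls through to the built-in open(), which is what raises IsADirectoryError.

```python
import bz2
import os
import tempfile


def open_entry(path):
    # Same dispatch as the traceback: bz2.open for ".bz2" files,
    # the built-in open() for everything else (including directories).
    return bz2.open(path, "rt") if path.endswith(".bz2") else open(path)


with tempfile.TemporaryDirectory() as folder:
    # A bz2-compressed FASTA file is decompressed transparently.
    fasta = os.path.join(folder, "markers.faa.bz2")
    with bz2.open(fasta, "wt") as fh:
        fh.write(">m1\nMKV\n")
    with open_entry(fasta) as fh:
        print(fh.readline().strip())  # >m1

    # A subdirectory falls through to open(), which fails on Linux.
    subdir = os.path.join(folder, "PhyloPhlAn.egg-info")
    os.mkdir(subdir)
    try:
        open_entry(subdir)
    except IsADirectoryError as err:
        print(err.errno)  # 21 on Linux
```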

The only thing I can think of is the location of the phylophlan database. From the arguments, it seems that the database folder is detected to be:

Arguments: { ..., 'database': 'phylophlan', 'databases_folder': '/home/ubuntu', ...}

but from the error message:

IsADirectoryError: [Errno 21] Is a directory: '/home/ubuntu/phylophlan/PhyloPhlAn.egg-info'

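it seems that /home/ubuntu/phylophlan does not contain the database. init_database() tries to open every entry it finds in that folder as a (possibly bz2-compressed) FASTA file, so as soon as it reaches a subdirectory such as PhyloPhlAn.egg-info, open() raises IsADirectoryError. Maybe you are not able to re-create the error because you are now executing PhyloPhlAn from a different path and the databases folder is correctly set?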
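A quick way to check a suspect folder before running is sketched below. This is a hypothetical diagnostic, not part of PhyloPhlAn: resolve_database_folder and non_file_entries are made-up names, and it assumes the database name is resolved relative to the current working directory, which is consistent with the "databases_folder=/home/ubuntu" log line but is not taken from PhyloPhlAn's source. It lists the entries matched by folder/* that are not regular files, i.e. the ones a loop calling open() on every match would trip over.

```python
import glob
import os
import tempfile


def resolve_database_folder(database, cwd):
    # Hypothetical: treat the database name as a path relative to cwd.
    return os.path.abspath(os.path.join(cwd, database))


def non_file_entries(folder):
    # Entries matched by folder/* that are not regular files; a loop that
    # calls open() on every match fails on these. Note that "*" does not
    # match dotfiles, so e.g. a ".git" directory is not listed.
    return sorted(
        os.path.basename(path)
        for path in glob.iglob(os.path.join(folder, "*"))
        if not os.path.isfile(path)
    )


with tempfile.TemporaryDirectory() as home:
    # Simulate a "phylophlan" source checkout sitting in the working directory.
    checkout = os.path.join(home, "phylophlan")
    os.makedirs(os.path.join(checkout, "PhyloPhlAn.egg-info"))
    open(os.path.join(checkout, "setup.py"), "w").close()

    folder = resolve_database_folder("phylophlan", home)
    print(non_file_entries(folder))  # ['PhyloPhlAn.egg-info']
```

An empty list means nothing matched by folder/* would make the open() call fail with IsADirectoryError; a non-existent folder also yields an empty list, so check that the folder exists separately.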

Many thanks,
Francesco