Hello
I have been trying to follow the instructions for recreating the tree of life (ToL) in “PhyloPhlAn 3.0: Example 02: Tree of life” on the biobakery/biobakery GitHub wiki.
I am running PhyloPhlAn from a conda install that I set up yesterday.
This is what I ran:
phylophlan_get_reference -g all -n 1 -o genbank_species
This produced a directory called “genbank_species”, which contains 17,509 .fna.gz files.
phylophlan_write_config_file \
-d a \
-o 02_tol.cfg \
--db_aa diamond \
--map_dna diamond \
--map_aa diamond \
--msa mafft \
--trim trimal \
--tree1 iqtree
Then:
phylophlan \
-i genbank_species \
-d phylophlan \
-f 02_tol.cfg \
--diversity high \
--fast \
-o output_tol \
--nproc 16
The error is:
PhyloPhlAn version 3.0.60 (27 November 2020)
Command line: /home/ubuntu/miniconda3/envs/phylophlan/bin/phylophlan -i genbank_species -d phylophlan -f 02_tol.cfg --diversity high --fast -o output_tol --nproc 4 --verbose
Automatically setting "database=phylophlan" and "databases_folder=/home/ubuntu"
Automatically setting "input=genbank_species" and "input_folder=/home/ubuntu"
"high-fast" preset
Setting "sort=True" because "database=phylophlan"
Setting "min_num_markers=100" since no value has been specified and the "database=phylophlan"
Arguments: {'input': 'genbank_species', 'clean': None, 'output': 'output_tol', 'database': 'phylophlan', 'db_type': None, 'config_file': '02_tol.cfg', 'diversity': 'high', 'accurate': False, 'fast': True, 'clean_all': False, 'database_list': False, 'submat': 'pfasum60', 'submat_list': False, 'submod_list': False, 'nproc': 4, 'min_num_proteins': 1, 'min_len_protein': 50, 'min_num_markers': 100, 'trim': 'greedy', 'gap_perc_threshold': 0.67, 'not_variant_threshold': 0.9, 'subsample': <function phylophlan at 0x7f074cdc2710>, 'unknown_fraction': 0.3, 'scoring_function': <function trident at 0x7f074cdc2d40>, 'sort': True, 'remove_fragmentary_entries': False, 'fragmentary_threshold': 0.67, 'min_num_entries': 4, 'maas': None, 'remove_only_gaps_entries': False, 'mutation_rates': False, 'force_nucleotides': False, 'input_folder': '/home/ubuntu/genbank_species', 'data_folder': 'output_tol/tmp', 'databases_folder': '/home/ubuntu', 'submat_folder': '/home/ubuntu/miniconda3/envs/phylophlan/lib/python3.7/site-packages/phylophlan/phylophlan_substitution_matrices/', 'submod_folder': '/home/ubuntu/miniconda3/envs/phylophlan/lib/python3.7/site-packages/phylophlan/phylophlan_substitution_models/', 'configs_folder': 'phylophlan_configs/', 'output_folder': '', 'genome_extension': '.fna', 'proteome_extension': '.faa', 'update': False, 'verbose': True}
Loading configuration file "02_tol.cfg"
Checking configuration file
Checking "/home/ubuntu/miniconda3/envs/phylophlan/bin/diamond"
Checking "/home/ubuntu/miniconda3/envs/phylophlan/bin/mafft"
Checking "/home/ubuntu/miniconda3/envs/phylophlan/bin/trimal"
Checking "/home/ubuntu/miniconda3/envs/phylophlan/bin/iqtree"
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/phylophlan/bin/phylophlan", line 10, in <module>
    sys.exit(phylophlan_main())
  File "/home/ubuntu/miniconda3/envs/phylophlan/lib/python3.7/site-packages/phylophlan/phylophlan.py", line 3227, in phylophlan_main
    verbose=args.verbose)
  File "/home/ubuntu/miniconda3/envs/phylophlan/lib/python3.7/site-packages/phylophlan/phylophlan.py", line 818, in init_database
    for f in glob.iglob(os.path.join(folder, '*'))
  File "/home/ubuntu/miniconda3/envs/phylophlan/lib/python3.7/site-packages/phylophlan/phylophlan.py", line 819, in <listcomp>
    for _, seq in SimpleFastaParser(bz2.open(f, 'rt') if f.endswith('.bz2') else open(f))])
IsADirectoryError: [Errno 21] Is a directory: '/home/ubuntu/phylophlan/PhyloPhlAn.egg-info'
I notice that the failing line calls “bz2.open” for .bz2 files, but the files I downloaded are .gz files. Could this be the issue?
I am not sure where to start debugging this, so any help would be appreciated!
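One thing I can check myself: going by the traceback, init_database seems to open every entry matched by glob in the database folder, and the entry it fails on is a directory rather than a file. Below is a minimal sketch for listing the directories in that folder. The helper name find_directories is my own (hypothetical), the path is copied from the error message, and I am only assuming this mirrors what PhyloPhlAn does internally.

```python
import glob
import os


def find_directories(folder):
    """Return the sorted entries directly inside `folder` that are directories."""
    # Same glob pattern as in the traceback: os.path.join(folder, '*').
    # Anything returned here would raise IsADirectoryError if passed to open().
    return sorted(
        path
        for path in glob.iglob(os.path.join(folder, '*'))
        if os.path.isdir(path)
    )


if __name__ == "__main__":
    # Path taken from the IsADirectoryError message; adjust for your setup.
    for path in find_directories('/home/ubuntu/phylophlan'):
        print(path)
```

If this prints anything, the folder being treated as the “phylophlan” database holds more than plain sequence files, which might mean the database location is being picked up from the wrong place.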
Thanks
Mick