The bioBakery help forum

PhyloPhlAn and binned FASTA

Is it possible to use PhyloPhlAn 3 with MetaBAT 2 binned FASTA files from a de novo bacterial genome assembly? Is there a specific command or tutorial for this kind of situation?

Hi there, please have a look at the tutorials here: PhyloPhlAn3 · biobakery/biobakery Wiki · GitHub and in particular, you might be interested in:

If you should need more details about configurations and parameters you can refer to the PhyloPhlAn wiki here: Home · biobakery/phylophlan Wiki · GitHub

I hope these are of help.

Many thanks,

Thanks a lot for your support.
I am trying to follow the tutorial on the metagenomic application.
I placed my bin.fa in a folder called input_metagenomic, but when I run this command

phylophlan_metagenomic \
    -i input_metagenomic \
    -o output_metagenomic \
    --nproc 4 \
    -n 1 \
    -d SGB.Jan19 \
    --verbose 2>&1 | tee phylophlan_metagenomic.log

I get the following error:

Traceback (most recent call last):
  File "/opt/miniconda3/envs/phylophlan-2020.5/bin/phylophlan_metagenomic", line 7, in <module>
    from phylophlan.phylophlan_metagenomic import phylophlan_metagenomic
  File "/opt/miniconda3/envs/phylophlan-2020.5/lib/python3.8/site-packages/phylophlan/", line 29, in <module>
    import pandas as pd
  File "/opt/miniconda3/envs/phylophlan-2020.5/lib/python3.8/site-packages/pandas/", line 29, in <module>
    from pandas._libs import hashtable as _hashtable, lib as _lib, tslib as _tslib
  File "/opt/miniconda3/envs/phylophlan-2020.5/lib/python3.8/site-packages/pandas/_libs/", line 13, in <module>
    from pandas._libs.interval import Interval
  File "pandas/_libs/interval.pyx", line 1, in init pandas._libs.interval
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

Thanks for reporting this. I don’t think it is related to PhyloPhlAn, and from a quick search, it seems it could be related to the numpy library. Can you try re-installing numpy in the phylophlan-2020.5 env?
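For anyone hitting the same ValueError: it usually means pandas was built against a different numpy version than the one installed. Below is a minimal sketch of one way to reinstall numpy inside the environment. The environment name is taken from the paths in the traceback. The `run` and `reinstall_numpy` helpers are hypothetical names for this example, and using pip (rather than conda) for the reinstall is an assumption about how the environment was set up.

```shell
#!/usr/bin/env bash
# Sketch: reinstall numpy inside a conda environment so it matches the pandas build.
# The helper names below are illustrative, not part of PhyloPhlAn or conda.

run() {
    # Print the command when DRY_RUN=1, otherwise execute it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reinstall_numpy() {
    # Environment name as seen in the traceback paths; pass another name to override.
    local env_name="${1:-phylophlan-2020.5}"
    # Show the numpy version currently installed (ignore a failure here).
    run conda run -n "$env_name" python -c 'import numpy; print(numpy.__version__)' || true
    # Force a fresh numpy install into the environment (assumes pip is acceptable there).
    run conda run -n "$env_name" python -m pip install --upgrade --force-reinstall numpy
    # Check that pandas now imports without the binary-incompatibility error.
    run conda run -n "$env_name" python -c 'import pandas; print(pandas.__version__)'
}
```

Calling `DRY_RUN=1 reinstall_numpy` first only prints the three commands, so they can be reviewed before running `reinstall_numpy` for real.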

Thanks, Francesco

Thanks a lot for your support, Francesco. Now it is working, but I am getting another error, maybe because I have only one species as a reference:

  File "/opt/miniconda3/envs/phylophlan-2020.5/bin/phylophlan", line 10, in <module>
  File "/opt/miniconda3/envs/phylophlan-2020.5/lib/python3.8/site-packages/phylophlan/", line 3200, in phylophlan_main
    standard_phylogeny_reconstruction(project_name, configs, args, db_dna, db_aa)
  File "/opt/miniconda3/envs/phylophlan-2020.5/lib/python3.8/site-packages/phylophlan/", line 3005, in standard_phylogeny_reconstruction
    all_inputs = (os.path.splitext(os.path.basename(i))[0] for i in input_faa_clean)
UnboundLocalError: local variable 'input_faa_clean' referenced before assignment

Hi, the error you posted:

is not from phylophlan_metagenomic, though. I think it is coming from phylophlan, correct?

Can you please provide the correct command line and the full output using the --verbose option? Also having the content of the input folder would be helpful.

Many thanks,

Yes, I was also following the S. aureus tutorial because I wanted to produce a GraPhlAn input.

This is the command line I used:

phylophlan \
    -i input_bins \
    -o output_isolates \
    -d s__Desulfomicrobium_orale \
    --trim greedy \
    --not_variant_threshold 0.99 \
    --remove_fragmentary_entries \
    --fragmentary_threshold 0.67 \
    --min_num_entries 135 \
    -t a \
    -f isolates_config.cfg \
    --diversity low \
    --force_nucleotides \
    --nproc 4 \
    --verbose 2>&1 | tee phylophlan__output_isolates.log

I think the problem is the --min_num_entries value.

Great, thanks!

Can you please provide the full output of the command above? (the phylophlan__output_isolates.log would work as well)
Also, can you provide the content of the input folder input_bins? Does it contain the set of 135 genomes as described in the tutorial?
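A quick way to answer the question about the input folder is to count the genome files in it and compare that number with the value passed to --min_num_entries. The following is only a sketch: `count_genomes` and `check_min_entries` are hypothetical helper names, and the file extension is a parameter because binners often write `.fa` files while the tool may be looking for a different extension.

```shell
#!/usr/bin/env bash
# Sketch: count genome files in an input folder and compare with a threshold.
# Helper names are illustrative, not part of PhyloPhlAn.

count_genomes() {
    # $1 = input folder, $2 = file extension to count (defaults to .fna here as an assumption)
    local dir="$1" ext="${2:-.fna}"
    find "$dir" -maxdepth 1 -type f -name "*${ext}" | wc -l | tr -d ' '
}

check_min_entries() {
    # $1 = input folder, $2 = extension, $3 = threshold (e.g. the --min_num_entries value)
    local dir="$1" ext="$2" min="$3" n
    n="$(count_genomes "$dir" "$ext")"
    echo "found $n '*$ext' files in $dir (threshold: $min)"
    # Succeeds only when there are at least as many genomes as the threshold.
    [ "$n" -ge "$min" ]
}
```

For the command in this thread, `check_min_entries input_bins .fa 135` would report how many `.fa` files are present and return a non-zero status if there are fewer than 135.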

Thanks, Francesco

Yes, of course, I attach it here.
I set it to 2 because I only had two genomes.
phylophlan__output_isolates.tsv (8.0 KB)

Thanks for sending the log file.

So, the error could be due to the fact that MAFFT needs more than just 2 sequences to do the multiple sequence alignment.
You can verify what the problem is by running the command from the log file (I only removed the --quiet parameter):

/opt/miniconda3/envs/phylophlan-2020.5/bin/mafft --anysymbol --thread 1 --auto output_isolates/tmp/markers/UniRef90-A0A109W5J4.fna

(if you want to post the full output here, I'll be happy to take a look)
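Before running MAFFT by hand, it may also help to see how many sequences each marker file actually contains, since a file with very few or no records would be a likely suspect. This is a rough sketch that counts FASTA header lines. The directory comes from the path in the command above, and `count_fasta_records` and `report_marker_counts` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: count FASTA records per marker file by counting header lines.
# Helper names are illustrative, not part of PhyloPhlAn or MAFFT.

count_fasta_records() {
    # grep -c prints 0 and exits non-zero when nothing matches, so mask the exit status.
    grep -c '^>' "$1" || true
}

report_marker_counts() {
    # $1 = folder with per-marker .fna files (e.g. output_isolates/tmp/markers)
    local dir="$1" f
    for f in "$dir"/*.fna; do
        # Skip the unexpanded pattern when the folder has no .fna files.
        [ -e "$f" ] || continue
        printf '%s\t%s\n' "$(count_fasta_records "$f")" "$(basename "$f")"
    done
}
```

For example, `report_marker_counts output_isolates/tmp/markers | sort -n | head` lists the markers with the fewest sequences first.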

Many thanks,