raxmlHPC failed: [e] refine_gene_tree crashed

Hi

I am running PhyloPhlAn with 869 input proteomes and the phylophlan database:

phylophlan -i pi -o phylophlan_output/ -d phylophlan -t a -f phylophlan_configs/supertree_aa.cfg --nproc 8 --diversity low --fast --verbose --maas /exports/cmvm/eddie/eb/groups/watson_grp/software/mickpython/phylophlan/lib/python3.9/site-packages/phylophlan/phylophlan_substitution_models/phylophlan.tsv

It crashes at the gene-tree refinement step (refine_gene_tree). The output is:

 [e] Command '['/exports/cmvm/eddie/eb/groups/watson_grp/software/mickpython/phylophlan/bin/raxmlHPC', '-m', 'PROTCATLG', '-p', '1989', '-t', 'phylophlan_output/tmp/gene_tree1_polytomies/p0307.tre', '-w', '/exports/cmvm/eddie/eb/groups/watson_grp/11690_Watson_Mick/MAGS/phylophlan_output/tmp/gene_tree2', '-s', 'phylophlan_output/tmp/sub/p0307.aln', '-n', 'p0307.tre']' returned non-zero exit status 255.
 [e] Command '['/exports/cmvm/eddie/eb/groups/watson_grp/software/mickpython/phylophlan/bin/raxmlHPC', '-m', 'PROTCATRTREV', '-p', '1989', '-t', 'phylophlan_output/tmp/gene_tree1_polytomies/p0353.tre', '-w', '/exports/cmvm/eddie/eb/groups/watson_grp/11690_Watson_Mick/MAGS/phylophlan_output/tmp/gene_tree2', '-s', 'phylophlan_output/tmp/sub/p0353.aln', '-n', 'p0353.tre']' returned non-zero exit status 255.
 
 
 [e] error while executing
     command_line: /exports/cmvm/eddie/eb/groups/watson_grp/software/mickpython/phylophlan/bin/raxmlHPC -m PROTCATRTREV -p 1989 -t phylophlan_output/tmp/gene_tree1_polytomies/p0353.tre -w /exports/cmvm/eddie/eb/groups/watson_grp/11690_Watson_Mick/MAGS/phylophlan_output/tmp/gene_tree2 -s phylophlan_output/tmp/sub/p0353.aln -n p0353.tre
            stdin: None
           stdout: None
              env: {chopped}
 [e] error while executing
     command_line: /exports/cmvm/eddie/eb/groups/watson_grp/software/mickpython/phylophlan/bin/raxmlHPC -m PROTCATLG -p 1989 -t phylophlan_output/tmp/gene_tree1_polytomies/p0307.tre -w /exports/cmvm/eddie/eb/groups/watson_grp/11690_Watson_Mick/MAGS/phylophlan_output/tmp/gene_tree2 -s phylophlan_output/tmp/sub/p0307.aln -n p0307.tre
            stdin: None
           stdout: None
              env: {chopped}
 
 [e] Command '['/exports/cmvm/eddie/eb/groups/watson_grp/software/mickpython/phylophlan/bin/raxmlHPC', '-m', 'PROTCATRTREV', '-p', '1989', '-t', 'phylophlan_output/tmp/gene_tree1_polytomies/p0353.tre', '-w', '/exports/cmvm/eddie/eb/groups/watson_grp/11690_Watson_Mick/MAGS/phylophlan_output/tmp/gene_tree2', '-s', 'phylophlan_output/tmp/sub/p0353.aln', '-n', 'p0353.tre']' returned non-zero exit status 255.
 
 [e] Command '['/exports/cmvm/eddie/eb/groups/watson_grp/software/mickpython/phylophlan/bin/raxmlHPC', '-m', 'PROTCATLG', '-p', '1989', '-t', 'phylophlan_output/tmp/gene_tree1_polytomies/p0307.tre', '-w', '/exports/cmvm/eddie/eb/groups/watson_grp/11690_Watson_Mick/MAGS/phylophlan_output/tmp/gene_tree2', '-s', 'phylophlan_output/tmp/sub/p0307.aln', '-n', 'p0307.tre']' returned non-zero exit status 255.
 
 [e] error while refining gene tree
     {'program_name': '/exports/cmvm/eddie/eb/groups/watson_grp/software/mickpython/phylophlan/bin/raxmlHPC', 'params': '-p 1989', 'database': '-t', 'input': '-s', 'output_path': '-w', 'output': '-n', 'version': '-v', 'model': '-m', 'command_line': '#program_name# #model# #params# #database# #output_path# #input# #output#'}
     PROTCATRTREV
     phylophlan_output/tmp/sub/p0353.aln
     phylophlan_output/tmp/gene_tree1_polytomies/p0353.tre
     /exports/cmvm/eddie/eb/groups/watson_grp/11690_Watson_Mick/MAGS/phylophlan_output/tmp/gene_tree2
     p0353.tre
 
 [e] error while refining gene tree
     {'program_name': '/exports/cmvm/eddie/eb/groups/watson_grp/software/mickpython/phylophlan/bin/raxmlHPC', 'params': '-p 1989', 'database': '-t', 'input': '-s', 'output_path': '-w', 'output': '-n', 'version': '-v', 'model': '-m', 'command_line': '#program_name# #model# #params# #database# #output_path# #input# #output#'}
     PROTCATLG
     phylophlan_output/tmp/sub/p0307.aln
     phylophlan_output/tmp/gene_tree1_polytomies/p0307.tre
     /exports/cmvm/eddie/eb/groups/watson_grp/11690_Watson_Mick/MAGS/phylophlan_output/tmp/gene_tree2
     p0307.tre
 
 [e] Command '['/exports/cmvm/eddie/eb/groups/watson_grp/software/mickpython/phylophlan/bin/raxmlHPC', '-m', 'PROTCATRTREV', '-p', '1989', '-t', 'phylophlan_output/tmp/gene_tree1_polytomies/p0353.tre', '-w', '/exports/cmvm/eddie/eb/groups/watson_grp/11690_Watson_Mick/MAGS/phylophlan_output/tmp/gene_tree2', '-s', 'phylophlan_output/tmp/sub/p0353.aln', '-n', 'p0353.tre']' returned non-zero exit status 255.
 
 [e] refine_gene_tree crashed
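The two "[e] error while refining gene tree" entries in the log show how the failing command is assembled: a command_line template of #key# placeholders, a dictionary mapping each key to a flag, and the run-specific values listed underneath. The sketch below rebuilds the command from those pieces so it can be rerun by hand outside PhyloPhlAn. build_command is a hypothetical helper, not PhyloPhlAn's own code, and it assumes that program_name and params are inserted verbatim while every other placeholder becomes its flag followed by the matching value. Only the raxmlHPC path is shortened (assumed to be on PATH).

```python
import re
import shlex


def build_command(cfg, values):
    """Hypothetical helper: expand a '#key#' command_line template into an argv list.

    Assumption: program_name and params are copied verbatim; every other key
    contributes its flag from cfg followed by the run-specific value.
    """
    argv = []
    for key in re.findall(r"#(\w+)#", cfg["command_line"]):
        if key in ("program_name", "params"):
            argv.extend(shlex.split(cfg[key]))
        elif key in values:
            argv.extend([cfg[key], values[key]])
    return argv


# Flags copied from the logged dictionary (program path shortened)
cfg = {
    "program_name": "raxmlHPC",
    "params": "-p 1989",
    "database": "-t",
    "input": "-s",
    "output_path": "-w",
    "output": "-n",
    "version": "-v",
    "model": "-m",
    "command_line": "#program_name# #model# #params# #database# #output_path# #input# #output#",
}

# Values logged for the failing p0353 gene tree
values = {
    "model": "PROTCATRTREV",
    "input": "phylophlan_output/tmp/sub/p0353.aln",
    "database": "phylophlan_output/tmp/gene_tree1_polytomies/p0353.tre",
    "output_path": "/exports/cmvm/eddie/eb/groups/watson_grp/11690_Watson_Mick/MAGS/phylophlan_output/tmp/gene_tree2",
    "output": "p0353.tre",
}

cmd = build_command(cfg, values)
print(shlex.join(cmd))
# To rerun it and read RAxML's own error message, pass cmd to subprocess.run(cmd).
```

The printed line has the same argument order as the logged command_line for p0353. The -w directory is kept absolute as in the log, since RAxML expects a full path there.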

PhyloPhlAn was installed using bioconda, and raxmlHPC is version 8.2.12.

Any help much appreciated!

Thanks
Mick

Digging further into this: when I run one of the failing commands on its own, I get:

RAxML can't, parse the alignment file as phylip file
it will now try to parse it as FASTA file

 ERROR: Sequence 105265R.23.clean consists entirely of undetermined values which will be treated as missing data
 ERROR: Found 1 sequences that consist entirely of undetermined values, exiting...

In fact, raxmlHPC is entirely correct: that sequence is nothing but gaps.

 >105265C.64.clean
 -VSQVNAYVKQLLEALNDITISGEISGFKRHSSGHVYFSLKDESATIRCAFFKPHSLKIG
 FEPKDAKVLARGRITLYERDGQYQLNVFELLEDGVDLFAQFLQMKEREKELFDNQFKKPI
 PKYVKKIGVATSPTGAVIQIKNVAFRRCPNVSLVLAPVAVDDAPRSICLGLELLDKDDVD
 VIILASMEDWCNSEEVARAIFKCKKVISAVTFTIAFVADLRAPSAAELAVFDYFEQLQAL
 >105265R.23.clean
 ------------------------------------------------------------
 ------------------------------------------------------------
 ------------------------------------------------------------
 ------------------------------------------------------------
 >105244_C.49.clean
 SVSQINAYIRRMFYLLHSVLVRGEVSNCKYHASGHIYFTLKDASGTLSCVMFAGRRRGLS
 FHMQNDQVIAAGSVDVYAKTGSYQLYASQIIRDGVALAERFEQLKKKLEQMFDASYKQPI
 PSYVRRLGVVTAATGAAVRIIQIAKRRNPYIEIILYPAIVDAAPDSIIRGIQALDREGVD
 VIIIGSLEDWANEERVAEAVFDCATVISAVTTVITFVADLRAPSAAELAVFDLCRYDGDL

So the question is: which part of PhyloPhlAn is producing an alignment that raxmlHPC then rejects?
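To find every alignment with this problem at once, rather than one failing tree at a time, a small scan of the subsampled alignments can list the all-gap entries per file. This is a minimal sketch that assumes the files in phylophlan_output/tmp/sub/ are FASTA (as RAxML's fallback message suggests). The set of "undetermined" characters is also an assumption: only "-" appears in the failing files, while "?" and "X" are included as a precaution.

```python
import sys
from pathlib import Path

# Assumption: characters counted as undetermined in protein alignments.
UNDETERMINED = frozenset("-?X")


def parse_fasta(text):
    """Return {sequence name: sequence} for a (possibly line-wrapped) FASTA string."""
    seqs, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].strip()
            seqs[name] = ""
        elif line and name is not None:
            seqs[name] += line
    return seqs


def all_gap_entries(seqs):
    """Names of sequences made only of undetermined characters (or empty)."""
    return [name for name, seq in seqs.items() if set(seq.upper()) <= UNDETERMINED]


def scan_directory(directory):
    """Map each *.aln file in directory to its all-gap entries, skipping clean files."""
    report = {}
    directory = Path(directory)
    if not directory.is_dir():
        return report
    for path in sorted(directory.glob("*.aln")):
        bad = all_gap_entries(parse_fasta(path.read_text()))
        if bad:
            report[path.name] = bad
    return report


if __name__ == "__main__":
    # Default directory taken from the log; pass another one as the first argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "phylophlan_output/tmp/sub"
    for filename, names in scan_directory(target).items():
        print(filename, ", ".join(names))
```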

Cheers
Mick

OK, currently retrying with

--remove_only_gaps_entries --remove_fragmentary_entries --fragmentary_threshold 0.85

OK, now it just fails on a different tree:

[e] Command '['/exports/cmvm/eddie/eb/groups/watson_grp/software/mickpython/phylophlan/bin/raxmlHPC', '-m', 'PROTCATLG', '-p', '1989', '-t', 'phylophlan_output_test/tmp/gene_tree1_polytomies/p0270.tre', '-w', '/exports/cmvm/eddie/eb/groups/watson_grp/11690_Watson_Mick/MAGS/phylophlan_output_test/tmp/gene_tree2', '-s', 'phylophlan_output_test/tmp/sub/p0270.aln', '-n', 'p0270.tre']' returned non-zero exit status 255.

[e] error while executing
    command_line: /exports/cmvm/eddie/eb/groups/watson_grp/software/mickpython/phylophlan/bin/raxmlHPC -m PROTCATLG -p 1989 -t phylophlan_output_test/tmp/gene_tree1_polytomies/p0270.tre -w /exports/cmvm/eddie/eb/groups/watson_grp/11690_Watson_Mick/MAGS/phylophlan_output_test/tmp/gene_tree2 -s phylophlan_output_test/tmp/sub/p0270.aln -n p0270.tre

When I run it outside PhyloPhlAn:

RAxML can't, parse the alignment file as phylip file
it will now try to parse it as FASTA file

ERROR: Sequence 105265C.103.clean consists entirely of undetermined values which will be treated as missing data
ERROR: Found 1 sequences that consist entirely of undetermined values, exiting...

And again, raxmlHPC is correct:

>105244_R.116.clean
LVKKYKQRTVVKGVSIEVNQEIVGLLPNATFYMIVGLIKPFSHVYL
>105265C.64.clean
LTKKYKDVIAVDNLSLTINKELFSLLVNATIKMLSCLTKPTSDAFL
>105244_C.49.clean
LTKKFGTFTAVDHVDLTIKDEFFGLLPNATISMLSTVLLPTERILL
>105265C.103.clean
----------------------------------------------
>105265R.28.clean
LVKRYGKRTVVNHVSFDVRQEIVGLLPNASFYMTTGLITPNEHIYL
>105244_R.118.clean
PMKIWTKIESVKDLSFSMEQEIIGFVHNATIKMIMGFIKPTSEI--
>105265C.27.clean
LSKTYGKRMVIKDISLEARQESVGLLPNAAFYCITGQLAATGQIFL
>105265R.13.clean
LVKKYGVRTVVKGVSMEVEQEIVGFLPNASFYMITGQIVPNDRVFL
>105244_R.106.clean
LVKKYGKRTVVKGVSIEVEQEIVGLLPNASFYMITGLIKPNARIFL
>105265C.109.clean
-ADSAGKRPILSDVSLSVPDDFLVVALTS-----------------

How is this possible? Please help! :smiley:

Just to add: I tested this on a set of public Eubacterium proteomes and got the same error, again caused by an alignment containing a sequence made entirely of “-”:

>Eubacterium_limosum_strain_SA11
AVDGVSFTLNRSTLGIESCSTMGRSVLIEPTEQIYEEIMT
>Eubacterium_ruminantium_strain_ATCC_17233
VLKNINLKIREGMLGIRSASTLVNLISYDVNESLIDNVKD
>Eubacterium_sp_AB3007
----------------------------------------
>Eubacterium_oxidoreducens_strain_DSM_3217
ILKDVSFTIEPGKVALVNTSSLLR----------------
>Eubacterium_ruminantium_strain_HUN269
VLKNINLKIREGMLGIRSASTLVNLISYDVNESLIDNVKD
>Eubacterium_callanderi_strain_NLAE_zl_G225
AVDGVSFTLNRSTLGIESCSTMGRSVLIEPTEQIYEEIMT
>Eubacterium_ruminantium_strain_2388
VLKNINLKIREGMLGIRSASTLVNLISYDVNESLIDNVKD
>Eubacterium_uniforme_strain_ATCC_35992
AVKNVSFSVKKGTLGLESCTTVGKCVLYQITHKVFAEVYE

I just want to highlight that I am running this with

--remove_fragmentary_entries --fragmentary_threshold 0.85
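One possible way for an all-gap entry to survive --remove_fragmentary_entries (a hypothesis, not checked against PhyloPhlAn's source): if the gap filter looks at the full alignment and columns are subsampled afterwards (the failing files live in tmp/sub/), a sequence whose residues all fall in dropped columns ends up as nothing but gaps. The functions below are hypothetical, and reading --fragmentary_threshold 0.85 as "drop entries with more than 85% gaps" is an assumption.

```python
def gap_fraction(seq):
    """Fraction of '-' characters in seq (1.0 for an empty sequence)."""
    return seq.count("-") / len(seq) if seq else 1.0


def filter_fragmentary(seqs, threshold=0.85):
    """Hypothetical filter: keep entries whose gap fraction is at most threshold."""
    return {name: s for name, s in seqs.items() if gap_fraction(s) <= threshold}


def subsample_columns(seqs, columns):
    """Keep only the given column indices in every sequence."""
    return {name: "".join(s[i] for i in columns) for name, s in seqs.items()}


full = {
    "a": "MKVLAERTQW",
    "b": "MKV-------",  # 70% gaps: passes a 0.85 threshold
    "c": "-KVLAE--QW",
}

kept = filter_fragmentary(full)                # filter applied to the full alignment
sub = subsample_columns(kept, [3, 4, 5, 6])   # columns dropped afterwards
all_gaps = [name for name, s in sub.items() if set(s) == {"-"}]
print(sorted(kept), all_gaps)
```

Under this hypothesis, the all-gap check only helps if it runs after the last step that drops columns.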

Hello @BioMickWatson, and thank you for all your tests! (I was going to ask you to run them to nail down the problem :sweat_smile:)

Indeed, that should not happen, and it is strange. I believe the PhyloPhlAn conda package may not be up to date with the latest version in the GitHub repository. If your version is not 3.0.60, can you please get PhyloPhlAn directly from the repository and re-run your analysis?
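When checking whether an installed version is at least 3.0.60, it is safer to compare numeric tuples than raw strings: as strings, "3.0.9" sorts after "3.0.10". This sketch assumes only that the output of phylophlan --version contains an X.Y.Z number somewhere; the exact wording of that output is not assumed.

```python
import re


def parse_version(text):
    """Return the first X.Y.Z version found in text as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no X.Y.Z version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def is_at_least(text, minimum=(3, 0, 60)):
    """True if the version found in text is >= minimum (default: 3.0.60)."""
    return parse_version(text) >= minimum


print(is_at_least("3.0.58"))  # False
print(is_at_least("3.0.60"))  # True
```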

Many thanks,
Francesco

Thanks Francesco, indeed the version I have is 3.0.58 :slight_smile:

I will try the github install :slight_smile:

OK, I am really struggling with the github install.

I do not have root on my system, and when I use --prefix it says that it cannot find any of the installed libraries :smiley:

I have become too fat and happy with conda.

How do I get the latest version installed?

E.g. if I run:

git clone https://github.com/biobakery/phylophlan
cd phylophlan
python setup.py install

Then try

./bin/phylophlan

I get

Traceback (most recent call last):
  File "./bin/phylophlan", line 6, in <module>
    from pkg_resources import load_entry_point
  File "/home/ubuntu/miniconda3/lib/python3.8/site-packages/pkg_resources/__init__.py", line 3253, in <module>
    def _initialize_master_working_set():
  File "/home/ubuntu/miniconda3/lib/python3.8/site-packages/pkg_resources/__init__.py", line 3236, in _call_aside
    f(*args, **kwargs)
  File "/home/ubuntu/miniconda3/lib/python3.8/site-packages/pkg_resources/__init__.py", line 3265, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "/home/ubuntu/miniconda3/lib/python3.8/site-packages/pkg_resources/__init__.py", line 584, in _build_master
    ws.require(__requires__)
  File "/home/ubuntu/miniconda3/lib/python3.8/site-packages/pkg_resources/__init__.py", line 901, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/home/ubuntu/miniconda3/lib/python3.8/site-packages/pkg_resources/__init__.py", line 787, in resolve
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'PhyloPhlAn==3.0.1' distribution was not found and is required by the application

Ah, sorry about that. No need to run setup.py install.
Assuming you installed PhyloPhlAn from conda in an env named phylophlan, it should be enough to:

git clone https://github.com/biobakery/phylophlan  # you probably don't need this as you already cloned it I guess
cd phylophlan
conda deactivate && conda activate phylophlan
phylophlan/phylophlan.py --version

Then just use the phylophlan/phylophlan.py script instead of the phylophlan command from the env.

Great, that solved that issue, and the (test) set that I have now gets past that point :smiley:

However, it now fails at another point:

[e] error while executing
    command_line: java -jar astral-4.11.1/astral.4.11.1.jar -i phylophlan_output_test2/tmp/gene_trees.tre -o /exports/cmvm/eddie/eb/groups/watson_grp/11690_Watson_Mick/MAGS/phylophlan_output_test2/phylotest2.tre
           stdin: None
          stdout: None

I cannot find astral*.jar anywhere in the conda environment. Is that a new dependency in 3.0.60?

Where should astral.4.11.1.jar be?

Thank you!

Great news, thanks! (this also means that I have to update the PhyloPhlAn package in conda :sweat_smile:)

Unfortunately, ASTRAL is not available in conda, so it is not installed automatically with the PhyloPhlAn package. ASTRAL can be retrieved from its GitHub repository (smirarab/ASTRAL, "Accurate Species TRee ALgorithm") and has to be installed manually. I put a very brief note about this on the Home page of the biobakery/phylophlan wiki on GitHub, but I should probably print a warning when generating the configuration file, because the ASTRAL details cannot be filled in automatically and have to be added manually. Sorry about this.
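Once ASTRAL is downloaded, a quick sanity check is to confirm that every -jar path mentioned in the configuration file points to an existing file. This sketch makes a few assumptions. The configuration is assumed to be an INI-style file that Python's configparser can read, and ASTRAL is assumed to be invoked through a -jar <path> token, as in the failing java -jar astral-4.11.1/astral.4.11.1.jar command. The [tree1] section and key names in the example are hypothetical. A relative jar path is resolved by java against its working directory (presumably where PhyloPhlAn was launched), so the check resolves it against a base directory that defaults to the current one.

```python
import configparser
import shlex
from pathlib import Path


def missing_jars(cfg_text, base_dir="."):
    """Return (section, key, path) for every '-jar <path>' that is not an existing file."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    missing = []
    for section in parser.sections():
        for key, value in parser[section].items():
            tokens = shlex.split(value)
            # Look at consecutive token pairs for "-jar <path>"
            for flag, arg in zip(tokens, tokens[1:]):
                if flag == "-jar":
                    jar = Path(arg)
                    if not jar.is_absolute():
                        jar = Path(base_dir) / jar
                    if not jar.is_file():
                        missing.append((section, key, str(jar)))
    return missing


# Hypothetical fragment mirroring the failing ASTRAL command line
EXAMPLE_CFG = """
[tree1]
program_name = java
params = -jar astral-4.11.1/astral.4.11.1.jar
input = -i
output = -o
"""

for section, key, jar in missing_jars(EXAMPLE_CFG):
    print(f"[{section}] {key}: {jar} not found")
```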