The bioBakery help forum

Question about StrainPhlAn calling PhyloPhlAn

Hi,
I’ve been working through your tutorial recently:

https://github.com/biobakery/MetaPhlAn/wiki/StrainPhlAn-3.0

StrainPhlAn: metagenomic strain-level population genomics

When I run step 5, the tutorial says StrainPhlAn will call PhyloPhlAn to produce a multiple sequence alignment (MSA) and then build the phylogenetic tree. At that point I get an error:

strainphlan -s consensus_markers/*.pkl -m db_markers/s__Bacteroides_caccae.fna -r reference_genomes/G000273725.fna.bz2 -o output -n 8 -c s__Bacteroides_caccae --phylophlan_mode accurate --mutation_rates 

Tue May 12 12:37:58 2020: Start StrainPhlAn 3.0 execution
Tue May 12 12:37:58 2020: Creating temporary directory...
Tue May 12 12:37:58 2020: Done.
Tue May 12 12:37:58 2020: Getting markers from main sample files...
Tue May 12 12:38:00 2020: Done.
Tue May 12 12:38:00 2020: Getting markers from main reference files...Warning: [blastn] Examining 5 or more matches is recommended

Tue May 12 12:38:17 2020: Done.
Tue May 12 12:38:17 2020: Removing bad markers / samples...
Tue May 12 12:38:17 2020: Done.
Tue May 12 12:38:17 2020: Writing samples as markers' FASTA files...
Tue May 12 12:38:18 2020: Done.
Tue May 12 12:38:18 2020: Writing filtered clade markers as FASTA file...
Tue May 12 12:38:18 2020: Done.
Tue May 12 12:38:18 2020: Calculating polymorphic rates...
Tue May 12 12:38:19 2020: Done.
Tue May 12 12:38:19 2020: Executing PhyloPhlAn 3.0...
Tue May 12 12:38:19 2020: 	Creating PhyloPhlAn 3.0 database...
Tue May 12 12:38:23 2020: 	Done.
Tue May 12 12:38:23 2020: 	Generating PhyloPhlAn 3.0 configuration file...
Tue May 12 12:38:24 2020: 	Done.
Tue May 12 12:38:24 2020: 	Processing samples...[e] unable to download "https://www.dropbox.com/s/x7cvma5bjzlllbt/phylophlan_databases.txt?dl=1"

[e] An error was ocurred executing a external tool, exiting...
Tue May 12 12:38:56 2020: Stop StrainPhlAn 3.0 execution.

I can’t download the database with wget, so I downloaded it through Chrome instead. So my question is:

Which folder should I put this database in?

My MetaPhlAn 3 version is:
MetaPhlAn version 3.0 (20 Mar 2020)

Hi, thanks for reporting this error.
Could you please send me the contents of the “tmp” folder created in the output directory? PhyloPhlAn should detect a custom database inside this folder.

Do you mean I should put the PhyloPhlAn databases in output/tmp?

  • There is no tmp folder at the beginning; the program generates it automatically.
  • If I put the data into tmp, I get another error (shown below).
  • I have to delete this folder before I can rerun the command.
  • So the custom data can’t be put into that tmp folder.
$ strainphlan -s consensus_markers/*.pkl -m db_markers/s__Bacteroides_caccae.fna -r reference_genomes/G000273725.fna.bz2 -o output -n 8 -c s__Bacteroides_caccae --phylophlan_mode accurate --mutation_rates
Tue May 12 20:50:12 2020: Start StrainPhlAn 3.0 execution
Tue May 12 20:50:12 2020: Creating temporary directory...
[e] Folder "output/tmp/" already exists!
[Errno 17] File exists: 'output/tmp/'
Tue May 12 20:50:12 2020: Stop StrainPhlAn 3.0 execution.
  • The output/tmp/ folder looks like this:
$ tree output/tmp/
output/tmp/
├── blastn
│   ├── G000273725.blastn
│   ├── G000273725.fna
│   ├── G000273725.nhr
│   ├── G000273725.nin
│   ├── G000273725.nog
│   ├── G000273725.nsd
│   ├── G000273725.nsi
│   └── G000273725.nsq
├── phylophlan.cfg
├── s__Bacteroides_caccae
│   ├── 100962142444.fna
│   ├── 10707964291.fna
│   ├── 107409344468.fna
│   ├── 125002894666.fna
│   ├── 126949713805.fna
│   ├── 134423094594.fna
│   ├── 139818877581.fna
│   ├── 142490419495.fna
│   ├── 154192683906.fna
│   ├── 165490003419.fna
│   ├── 17054122781.fna
│   ├── 204727649438.fna
│   ├── 209552228620.fna
│   ├── 216660259738.fna
│   ├── 226544463668.fna
│   ├── 243228557043.fna
│   ├── 267226127229.fna
│   ├── 269799316554.fna
│   ├── 270971955966.fna
│   ├── 299024586049.fna
│   ├── 319082699479.fna
│   ├── 321963182198.fna
│   ├── 32219072772.fna
│   ├── 324010482962.fna
│   ├── 331378372858.fna
│   ├── 337407733310.fna
│   ├── 33889281312.fna
│   ├── 340178813911.fna
│   ├── 343360013413.fna
│   ├── 351598256290.fna
│   ├── 365428329947.fna
│   ├── 369554957130.fna
│   ├── 388399941456.fna
│   ├── 388931405085.fna
│   ├── 390792119347.fna
│   ├── 40215400440.fna
│   ├── 41491627369.fna
│   ├── 422860375518.fna
│   ├── 430704850529.fna
│   ├── 448954489341.fna
│   ├── 462658618828.fna
│   ├── 466238389470.fna
│   ├── 49334083016.fna
│   ├── 493851244497.fna
│   ├── 497016645827.fna
│   ├── 512708660491.fna
│   ├── 516119017819.fna
│   ├── 516757498386.fna
│   ├── 516976570508.fna
│   ├── 520399289967.fna
│   ├── 533303354136.fna
│   ├── 536813624407.fna
│   ├── 543832402416.fna
│   ├── 55606866191.fna
│   ├── 560487834466.fna
│   ├── 569205066036.fna
│   ├── 571470053895.fna
│   ├── 587718496639.fna
│   ├── 591769795473.fna
│   ├── 596440963618.fna
│   ├── 600783555514.fna
│   ├── 60129824842.fna
│   ├── 603216648309.fna
│   ├── 609210216059.fna
│   ├── 637681421627.fna
│   ├── 641396155201.fna
│   ├── 654759781895.fna
│   ├── 667621796877.fna
│   ├── 674106037769.fna
│   ├── 675423990412.fna
│   ├── 687297436824.fna
│   ├── 691455569293.fna
│   ├── 691549877339.fna
│   ├── 703994262970.fna
│   ├── 707420675883.fna
│   ├── 707734980982.fna
│   ├── 708121471729.fna
│   ├── 713575532389.fna
│   ├── 723537137250.fna
│   ├── 739166189642.fna
│   ├── 751490616559.fna
│   ├── 753126535642.fna
│   ├── 768748674440.fna
│   ├── 78366373021.fna
│   ├── 785777880550.fna
│   ├── 793664157014.fna
│   ├── 800748324758.fna
│   ├── 809325639489.fna
│   ├── 814249854614.fna
│   ├── 817380599198.fna
│   ├── 82187262853.fna
│   ├── 825849995750.fna
│   ├── 828211249414.fna
│   ├── 833253584348.fna
│   ├── 833560885404.fna
│   ├── 838134468860.fna
│   ├── 838609864064.fna
│   ├── 853116330460.fna
│   ├── 859568187822.fna
│   ├── 867560420461.fna
│   ├── 894650175884.fna
│   ├── 901796891883.fna
│   ├── 937703943815.fna
│   ├── 949642764355.fna
│   ├── 95379619952.fna
│   ├── 957930510568.fna
│   ├── 960791833330.fna
│   ├── 970772430942.fna
│   ├── 979828342697.fna
│   ├── 989413483308.fna
│   └── s__Bacteroides_caccae.fna
└── s__Bacteroides_caccae.StrainPhlAn3
    ├── G000273725.fna
    ├── SRS013951.fastq.fna
    ├── SRS014613.fastq.fna
    ├── SRS019161.fastq.fna
    ├── SRS022137.fastq.fna
    ├── SRS055982.fastq.fna
    └── SRS064276.fastq.fna
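
The "already exists" failure reported above can be avoided by clearing the leftover folder before each rerun. A minimal sketch, assuming the output directory is named output as in the command above; the helper name clear_stale_tmp is hypothetical and not part of StrainPhlAn:

```python
import shutil
from pathlib import Path


def clear_stale_tmp(output_dir):
    """Remove <output_dir>/tmp if a previous run left it behind.

    Returns True if a folder was removed, False if there was nothing to remove.
    """
    tmp_dir = Path(output_dir) / "tmp"
    if tmp_dir.is_dir():
        # Deletes the folder and everything inside it, so only point this
        # at a StrainPhlAn output directory you are about to regenerate.
        shutil.rmtree(tmp_dir)
        return True
    return False
```

Calling clear_stale_tmp("output") before rerunning strainphlan removes output/tmp/ so the "Creating temporary directory..." step can start from a clean state.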

No, the problem you are experiencing is that PhyloPhlAn is not detecting the “output/tmp/s__Bacteroides_caccae” folder, so it tries to download the default PhyloPhlAn database. This looks like a permissions problem.
Using the filtered markers from the first part of the processing, StrainPhlAn creates that folder inside tmp before PhyloPhlAn is called. Could you please send me the full path to the “output/tmp/s__Bacteroides_caccae” folder and check its permissions? Are you using a virtual machine or a network shared folder?
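
One way to run the permission check requested above, besides ls -l, is a short Python sketch like the following. It only reports what the current user can do with a path; the function name describe_access is hypothetical, and the check says nothing about how PhyloPhlAn itself looks for the folder:

```python
import os


def describe_access(path):
    """Report whether the current user can read, write, and enter `path`."""
    return {
        "exists": os.path.exists(path),
        "is_dir": os.path.isdir(path),
        "readable": os.access(path, os.R_OK),
        "writable": os.access(path, os.W_OK),
        # For a directory, execute permission means it can be entered.
        "searchable": os.access(path, os.X_OK),
    }
```

For example, print(describe_access("output/tmp/s__Bacteroides_caccae")) run from the working directory shows at a glance whether the database folder exists and is accessible.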

I run my program on a server.
The full path is:
/public/home/sample_lib/ckzhu/software/StrainPhlAn2/strainphlan3/ckzhu_example/output

(metaphlan3) [ckzhu@vm-login02 output]$ ls -lhrt tmp/
total 17K
drwxr-xr-x 2 ckzhu sample_lib 4.0K May 12 22:27 blastn
drwxr-xr-x 2 ckzhu sample_lib 4.0K May 12 22:27 s__Bacteroides_caccae.StrainPhlAn3
drwxr-xr-x 2 ckzhu sample_lib 8.0K May 12 22:27 s__Bacteroides_caccae
-rw-r--r-- 1 ckzhu sample_lib  791 May 12 22:27 phylophlan.cfg
(metaphlan3) [ckzhu@vm-login02 s__Bacteroides_caccae]$ ls -lhrt
total 285K
-rw-r--r-- 1 ckzhu sample_lib 1.8K May 12 22:27 989413483308.fna
-rw-r--r-- 1 ckzhu sample_lib  606 May 12 22:27 979828342697.fna
-rw-r--r-- 1 ckzhu sample_lib 1.0K May 12 22:27 970772430942.fna
...
-rw-r--r-- 1 ckzhu sample_lib 1.5K May 12 22:27 142490419495.fna
-rw-r--r-- 1 ckzhu sample_lib 1.4K May 12 22:27 139818877581.fna
-rw-r--r-- 1 ckzhu sample_lib 1.3K May 12 22:27 134423094594.fna
-rw-r--r-- 1 ckzhu sample_lib  658 May 12 22:27 126949713805.fna
-rw-r--r-- 1 ckzhu sample_lib  481 May 12 22:27 125002894666.fna
-rw-r--r-- 1 ckzhu sample_lib  716 May 12 22:27 107409344468.fna
-rw-r--r-- 1 ckzhu sample_lib 1.3K May 12 22:27 10707964291.fna
-rw-r--r-- 1 ckzhu sample_lib 1.7K May 12 22:27 100962142444.fna
-rw-r--r-- 1 ckzhu sample_lib 158K May 12 22:27 s__Bacteroides_caccae.fna

Could you please check your phylophlan version?
$ phylophlan -v

(metaphlan3) [ckzhu@vm-login02 s__Bacteroides_caccae]$ metaphlan -v
MetaPhlAn version 3.0 (20 Mar 2020)
(metaphlan3) [ckzhu@vm-login02 s__Bacteroides_caccae]$ phylophlan -v
PhyloPhlAn version 0.43 (2 March 2020)

Could you upgrade PhyloPhlAn to v3.0.51?
$ conda install -c bioconda phylophlan

In version 0.43, PhyloPhlAn tries to download the “phylophlan_databases.txt” file even if you specify a custom database. This step cannot be avoided without modifying the code, and your server seems to have problems downloading from Dropbox, so both strainphlan and phylophlan will fail every time. However, the latest version (3.0.51) first checks whether you specified a database path and only downloads the txt file if you did not, so I think this will solve your problem.
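
To check from a script whether an installation is older than the version recommended above, the banner printed by phylophlan -v can be parsed. This is a rough sketch that assumes the banner has the form shown earlier in the thread ("PhyloPhlAn version 0.43 (2 March 2020)"); the names parse_version and needs_upgrade are hypothetical:

```python
import re

FIXED_VERSION = (3, 0, 51)  # version recommended in the reply above


def parse_version(banner):
    """Extract the numeric version from a banner such as
    'PhyloPhlAn version 0.43 (2 March 2020)' as a tuple of ints."""
    match = re.search(r"version\s+(\d+(?:\.\d+)*)", banner)
    if match is None:
        raise ValueError(f"no version number found in: {banner!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def needs_upgrade(banner, minimum=FIXED_VERSION):
    """Return True when the reported version is older than `minimum`."""
    return parse_version(banner) < minimum
```

Comparing tuples of integers avoids the trap of comparing version strings as text, where "0.43" and "3.0.51" would be ordered character by character.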

Hi Aitor,
It works, but I get another error:

(metaphlan3) [ckzhu@vm-login02 ckzhu_example]$ strainphlan -s consensus_markers/*.pkl -m db_markers/s__Bacteroides_caccae.fna -r reference_genomes/G000273725.fna.bz2 -o output -n 8 -c s__Bacteroides_caccae --phylophlan_mode accurate --mutation_rates
Wed May 13 02:22:41 2020: Start StrainPhlAn 3.0 execution
Wed May 13 02:22:41 2020: Creating temporary directory...
Wed May 13 02:22:41 2020: Done.
Wed May 13 02:22:41 2020: Getting markers from main sample files...
Wed May 13 02:22:41 2020: Done.
Wed May 13 02:22:41 2020: Getting markers from main reference files...Warning: [blastn] Examining 5 or more matches is recommended

Wed May 13 02:22:44 2020: Done.
Wed May 13 02:22:44 2020: Removing bad markers / samples...
Wed May 13 02:22:44 2020: Done.
Wed May 13 02:22:44 2020: Writing samples as markers' FASTA files...
Wed May 13 02:22:45 2020: Done.
Wed May 13 02:22:45 2020: Writing filtered clade markers as FASTA file...
Wed May 13 02:22:45 2020: Done.
Wed May 13 02:22:45 2020: Calculating polymorphic rates...
Wed May 13 02:22:45 2020: Done.
Wed May 13 02:22:45 2020: Executing PhyloPhlAn 3.0...
Wed May 13 02:22:45 2020: 	Creating PhyloPhlAn 3.0 database...
Wed May 13 02:22:45 2020: 	Done.
Wed May 13 02:22:45 2020: 	Generating PhyloPhlAn 3.0 configuration file...
Wed May 13 02:22:45 2020: 	Done.
Wed May 13 02:22:45 2020: 	Processing samples...
Wed May 13 02:23:09 2020: 	Done.
Wed May 13 02:23:09 2020: Done.
Wed May 13 02:23:09 2020: Writing information file...Traceback (most recent call last):
  File "/public/home/sample_lib/ckzhu/miniconda3/envs/metaphlan3/bin/strainphlan", line 10, in <module>
    sys.exit(main())
  File "/public/home/sample_lib/ckzhu/miniconda3/envs/metaphlan3/lib/python3.7/site-packages/metaphlan/strainphlan.py", line 830, in main
    args.mutation_rates, args.print_clades_only, args.nprocs)
  File "/public/home/sample_lib/ckzhu/miniconda3/envs/metaphlan3/lib/python3.7/site-packages/metaphlan/strainphlan.py", line 790, in strainphlan
    phylophlan_mode, nprocs)
  File "/public/home/sample_lib/ckzhu/miniconda3/envs/metaphlan3/lib/python3.7/site-packages/metaphlan/strainphlan.py", line 655, in write_info
    "\nNumber of processes used: "+ str(nprocs)) + "\n" 
TypeError: unsupported operand type(s) for +: 'int' and 'str'

That is great!
Yes, that error has already been reported and we will update the conda package ASAP. The main problem you will have without fixing the error is that the tmp folder will not be deleted at the end of the execution.
However, if you want to fix it manually in your copy of strainphlan.py (line 655 in your traceback), you can change the line to this:

       "\nNumber of processes used: "+ str(nprocs) + "\n" )

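For readers wondering why moving one parenthesis matters: the traceback shows only the last line of the failing statement, but the 'int' and 'str' message is consistent with a file write() call whose closing parenthesis comes before the "\n". The sketch below reproduces that shape under this assumption, using an in-memory buffer in place of the real information file:

```python
import io

nprocs = 8
buffer = io.StringIO()  # stands in for the information file (assumption)

# Buggy shape: the parenthesis closes write() too early, so "\n" is added
# to the integer that write() returns (the number of characters written).
try:
    buffer.write("\nNumber of processes used: " + str(nprocs)) + "\n"
except TypeError as error:
    message = str(error)

# Fixed shape: the newline is concatenated inside the call.
fixed = io.StringIO()
fixed.write("\nNumber of processes used: " + str(nprocs) + "\n")
```

Note that in the buggy form the text is still written before the exception is raised, which matches the log above: the run gets as far as "Writing information file..." and then stops before cleaning up the tmp folder.
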
Hi Aitor,
it worked!!
Thanks very much for your help!!!
hahaha~