StrainPhlAn: metagenomic strain-level population genomics
When I run step 5, it says StrainPhlAn will call PhyloPhlAn to produce a multiple sequence alignment (MSA) and then build the phylogenetic tree. Then I get an error:
strainphlan -s consensus_markers/*.pkl -m db_markers/s__Bacteroides_caccae.fna -r reference_genomes/G000273725.fna.bz2 -o output -n 8 -c s__Bacteroides_caccae --phylophlan_mode accurate --mutation_rates
Tue May 12 12:37:58 2020: Start StrainPhlAn 3.0 execution
Tue May 12 12:37:58 2020: Creating temporary directory...
Tue May 12 12:37:58 2020: Done.
Tue May 12 12:37:58 2020: Getting markers from main sample files...
Tue May 12 12:38:00 2020: Done.
Tue May 12 12:38:00 2020: Getting markers from main reference files...Warning: [blastn] Examining 5 or more matches is recommended
Tue May 12 12:38:17 2020: Done.
Tue May 12 12:38:17 2020: Removing bad markers / samples...
Tue May 12 12:38:17 2020: Done.
Tue May 12 12:38:17 2020: Writing samples as markers' FASTA files...
Tue May 12 12:38:18 2020: Done.
Tue May 12 12:38:18 2020: Writing filtered clade markers as FASTA file...
Tue May 12 12:38:18 2020: Done.
Tue May 12 12:38:18 2020: Calculating polymorphic rates...
Tue May 12 12:38:19 2020: Done.
Tue May 12 12:38:19 2020: Executing PhyloPhlAn 3.0...
Tue May 12 12:38:19 2020: Creating PhyloPhlAn 3.0 database...
Tue May 12 12:38:23 2020: Done.
Tue May 12 12:38:23 2020: Generating PhyloPhlAn 3.0 configuration file...
Tue May 12 12:38:24 2020: Done.
Tue May 12 12:38:24 2020: Processing samples...[e] unable to download "https://www.dropbox.com/s/x7cvma5bjzlllbt/phylophlan_databases.txt?dl=1"
[e] An error was ocurred executing a external tool, exiting...
Tue May 12 12:38:56 2020: Stop StrainPhlAn 3.0 execution.
I can’t download the database with wget, so I downloaded it with Chrome instead. So my question is:
Which folder should I put this database in?
My MetaPhlAn version is MetaPhlAn version 3.0 (20 Mar 2020).
Hi, thanks for reporting this error.
Could you please send me the contents of the “tmp” folder created in the output directory? PhyloPhlAn should detect a custom database inside this folder.
No, the problem you are experiencing is that PhyloPhlAn is not detecting the “output/tmp/s__Bacteroides_caccae” folder, so it tries to download the default PhyloPhlAn DB. This looks like a permissions problem.
Using the filtered markers from the first part of the processing, StrainPhlAn creates that folder inside tmp before PhyloPhlAn is called. Could you please send me the full path to the “output/tmp/s__Bacteroides_caccae” folder and check the permissions of that folder? Are you using a virtual machine or a network shared folder?
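For anyone who wants to run that check themselves, here is a minimal sketch. It assumes the output directory is `output` and the clade is `s__Bacteroides_caccae`, as in the command at the top of the thread; `check_dir` is a hypothetical helper written for this post, not part of StrainPhlAn or PhyloPhlAn.

```shell
# Hypothetical helper: reports whether a directory exists and whether the
# current user can read, write, and enter it.
check_dir() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    if [ -r "$dir" ] && [ -w "$dir" ] && [ -x "$dir" ]; then
        echo "ok: $dir"
    else
        echo "no-access: $dir"
        return 1
    fi
}

# Path assumed from "-o output" and "-c s__Bacteroides_caccae".
target="output/tmp/s__Bacteroides_caccae"
check_dir "$target" || true

if [ -d "$target" ]; then
    ls -ld "$target"              # owner, group, and permission bits
    df -P "$target" | tail -n 1   # first column shows the backing filesystem
fi
```

If the first column of the `df -P` line looks like `server:/export`, the folder probably lives on a network share, which is worth mentioning in the reply.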
Could you upgrade your PhyloPhlAn version to v3.0.51?
$ conda install -c bioconda phylophlan
In version 0.43, PhyloPhlAn will try to download the “phylophlan_databases.txt” file even if you specify a custom database. This step cannot be avoided without modifying the code, and your server seems to have problems downloading from Dropbox, so both StrainPhlAn and PhyloPhlAn will fail every time. However, the latest version (3.0.51) first checks whether you specified a database path and downloads the txt file only if you did not, so I think this will solve your problem.
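To confirm whether the installed version is at least 3.0.51, a small comparison like the one below may help. This is only a sketch: `version_ge` is a hypothetical helper that relies on `sort -V` (version sort, available in GNU coreutils), and the `installed` value is a placeholder to be replaced with whatever `conda list phylophlan` reports on your machine.

```shell
# Hypothetical helper: succeeds when version $1 is greater than or equal to $2.
# Assumes a `sort` that supports -V (version sort), as in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# The installed version can be looked up with:
#   conda list phylophlan
installed="3.0.0"   # placeholder, replace with the version conda reports

if version_ge "$installed" "3.0.51"; then
    echo "new enough"
else
    echo "upgrade needed"
fi
```

A plain string comparison would not work here, because "3.0.9" sorts after "3.0.51" alphabetically even though it is the older release; version sort compares the numeric fields instead.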
That is great!
Yes, that error was already reported and we will update the conda package ASAP. The main problem you will have without fixing the error is that the tmp folder will not be deleted at the end of the execution.
However, if you want to fix it manually in your script, you can change the line to this:
"\nNumber of processes used: "+ str(nprocs) + "\n" )