I have a problem using extract_markers.py. Because the MetaPhlAn database was not installed in the default directory, I used the “-d” option to specify the database path.
First, I tried “extract_markers.py -d /gpfs/data/lilab/home/zhoub03/software/metaphlan3_database/mpa_v30_CHOCOPhlAn_201901/mpa_v30_CHOCOPhlAn_201901.fna -c s__Bifidobacterium_breve -o clade_markers”. The error was:
Sun Sep 6 16:01:52 2020: Start extract markers execution
Sun Sep 6 16:01:52 2020: Generating DB markers FASTA…
Sun Sep 6 16:03:46 2020: Done.
Sun Sep 6 16:03:46 2020: Loading MetaPhlan 3.0 database…
Traceback (most recent call last):
File "/gpfs/share/apps/python/cpu/3.6.5/bin/extract_markers.py", line 8, in <module>
sys.exit(main())
File "/gpfs/share/apps/python/cpu/3.6.5/lib/python3.6/site-packages/metaphlan/utils/extract_markers.py", line 135, in main
extract_markers(args.database, args.clade, args.output_dir)
File "/gpfs/share/apps/python/cpu/3.6.5/lib/python3.6/site-packages/metaphlan/utils/extract_markers.py", line 98, in extract_markers
db = pickle.load(bz2.BZ2File(database))
File "/gpfs/share/apps/python/cpu/3.6.5/lib/python3.6/bz2.py", line 172, in peek
return self._buffer.peek(n)
File "/gpfs/share/apps/python/cpu/3.6.5/lib/python3.6/_compression.py", line 68, in readinto
data = self.read(len(byte_view))
File "/gpfs/share/apps/python/cpu/3.6.5/lib/python3.6/_compression.py", line 103, in read
data = self._decompressor.decompress(rawblock, size)
OSError: Invalid data stream
This seemed to indicate that I should use the compressed file, so I tried “extract_markers.py -d /gpfs/data/lilab/home/zhoub03/software/metaphlan3_database/mpa_v30_CHOCOPhlAn_201901/mpa_v30_CHOCOPhlAn_201901.fna.bz2 -c s__Bifidobacterium_breve -o clade_markers”. It reported:
Sun Sep 6 16:04:19 2020: Start extract markers execution
Sun Sep 6 16:04:19 2020: Generating DB markers FASTA…
Could not locate a Bowtie index corresponding to basename "/gpfs/data/lilab/home/zhoub03/software/metaphlan3_database/mpa_v30_CHOCOPhlAn_201901/mpa_v30_CHOCOPhlAn_201901.fna"
Error: Encountered internal Bowtie 2 exception (#1)
Command: /gpfs/share/apps/bowtie2/2.3.5.1/bin/bowtie2-inspect-s --wrapper basic-0 /gpfs/data/lilab/home/zhoub03/software/metaphlan3_database/mpa_v30_CHOCOPhlAn_201901/mpa_v30_CHOCOPhlAn_201901.fna
[e] An error was ocurred executing a external tool, exiting…
Sun Sep 6 16:04:19 2020: Stop StrainPhlAn 3.0 execution.
However, the Bowtie 2 index has already been built in that directory, and all six index files are present. I would appreciate it if you could tell me which command I should use.
Thank you so much!!!
Hi @Boyan
Thanks for getting in contact. The -d parameter of the StrainPhlAn scripts expects the path of the MetaPhlAn database PKL file rather than the FASTA file. In your case: extract_markers.py -d /gpfs/data/lilab/home/zhoub03/software/metaphlan3_database/mpa_v30_CHOCOPhlAn_201901/mpa_v30_CHOCOPhlAn_201901.pkl -c s__Bifidobacterium_breve -o clade_markers
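For anyone hitting the same `OSError: Invalid data stream`: the traceback shows that extract_markers.py opens the -d file with `pickle.load(bz2.BZ2File(database))`, so the file must be a bz2-compressed pickle. A plain .fna fails at the bz2 step, and a .fna.bz2 decompresses to FASTA text rather than a pickle. The sketch below is a quick pre-flight check for this. The helper name `looks_like_bz2_pickle` is hypothetical, and it assumes the pickle was written with protocol 2 or later (so the first decompressed byte is the 0x80 PROTO opcode). It does not validate the database contents.

```python
import bz2


def looks_like_bz2_pickle(path):
    """Return True if `path` is a bz2 stream whose payload starts like a pickle.

    Assumption: the pickle was written with protocol 2+, so its first
    decompressed byte is the PROTO opcode 0x80. A FASTA payload starts with '>'.
    """
    try:
        with bz2.open(path, "rb") as handle:
            head = handle.read(1)
    except (OSError, EOFError):
        # OSError: not bz2 data at all (e.g. a plain .fna), or a missing file
        # (FileNotFoundError is an OSError). EOFError: truncated bz2 stream.
        return False
    return head == b"\x80"
```

Under that assumption, the call should return True for the .pkl file and False for both the .fna and the .fna.bz2 paths tried above.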
Thank you so much for your prompt reply! It solved my problem.
However, I have a new question about the next step. I followed the tutorial at “https://github.com/biobakery/biobakery/wiki/strainphlan3”.
The command “strainphlan -s consensus_markers/*.pkl -m clade_markers/s__Eubacterium_rectale.fna -r reference_genomes/*.fna -o output -n 8 -c s__Eubacterium_rectale --phylophlan_mode fast --nproc 4” requires reference genomes.
In my case (Bifidobacterium breve), do I need to collect B. breve reference genomes myself? If so, how many should I collect?
Hi @Boyan,
StrainPhlAn does not require any reference genomes; you can run it using only the markers reconstructed from your metagenomic samples. Optionally, you can add reference genomes to compare them against the strains reconstructed from your samples.
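As a sketch of what this means on the command line, the hypothetical helper below assembles the strainphlan arguments from the flags used in the tutorial command, expanding the `*.pkl` and `*.fna` globs the way the shell would and adding -r only when reference genomes are supplied. It only builds the argument list; it does not run anything.

```python
import glob


def build_strainphlan_cmd(sample_glob, clade_markers, clade, output_dir,
                          reference_glob=None, mode="fast"):
    """Assemble a strainphlan argv list (hypothetical helper).

    Uses only flags from the tutorial command. -r is omitted unless
    reference genomes are given, since StrainPhlAn can run on sample
    markers alone.
    """
    samples = sorted(glob.glob(sample_glob))
    if not samples:
        raise FileNotFoundError(f"no sample marker files match {sample_glob!r}")
    cmd = ["strainphlan", "-s", *samples, "-m", clade_markers]
    if reference_glob is not None:
        references = sorted(glob.glob(reference_glob))
        if not references:
            raise FileNotFoundError(f"no reference genomes match {reference_glob!r}")
        cmd += ["-r", *references]
    cmd += ["-o", output_dir, "-c", clade, "--phylophlan_mode", mode]
    return cmd
```

Passing the result to `subprocess.run(cmd, check=True)` would then launch the tool, assuming `strainphlan` is on PATH.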
Best,
Aitor
I tried “strainphlan -s consensus_markers/*.pkl -m clade_markers/s__Bifidobacterium_breve.fna -r /gpfs/data/lilab/home/zhoub03/software/my_strain2/Bifidobacterium_breve/Bifidobacterium_breve.fas -o output -n 2 -c s__Bifidobacterium_breve --phylophlan_mode fast”.
It reported:
[e] The database does not exist
Wed Sep 9 12:47:42 2020: Stop StrainPhlAn 3.0 execution.
However, I have confirmed that these files exist.
I also tried “strainphlan -s consensus_markers/*.pkl -m clade_markers/s__Bifidobacterium_breve.fna -o output -n 2 -c s__Bifidobacterium_breve --phylophlan_mode fast”, which does not include any reference genomes, but it still reported the same error.
I would appreciate it if you could help me figure out where the problem is. Thank you!
Hi @Boyan
The problem is related to the fact that your MetaPhlAn database was not installed in the default directory. Try running strainphlan with the parameter -d /gpfs/data/lilab/home/zhoub03/software/metaphlan3_database/mpa_v30_CHOCOPhlAn_201901/mpa_v30_CHOCOPhlAn_201901.pkl
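A minimal sketch of the same fix applied to an argument list, assuming a list shaped like the strainphlan commands in this thread. The helper name `add_database_arg` is hypothetical; it appends -d only after checking that the path ends in .pkl (the MetaPhlAn database file, not the .fna FASTA) and that the file exists.

```python
import os


def add_database_arg(cmd, database_pkl):
    """Return a copy of a strainphlan argv list with -d <database_pkl> appended.

    Hypothetical helper: a database outside the default MetaPhlAn location
    must be passed explicitly, and -d expects the .pkl file.
    """
    if not database_pkl.endswith(".pkl"):
        raise ValueError(f"-d expects the MetaPhlAn .pkl file, got {database_pkl!r}")
    if not os.path.isfile(database_pkl):
        raise FileNotFoundError(f"database file not found: {database_pkl}")
    return [*cmd, "-d", database_pkl]
```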
Hi @aitor.blancomiguez,
Thank you!
This solved the previous problem, but I got a new error when I ran
“strainphlan -s consensus_markers/*.pkl -m clade_markers/s__Bifidobacterium_breve.fna -d /gpfs/data/lilab/home/zhoub03/software/metaphlan3_database/mpa_v30_CHOCOPhlAn_201901/mpa_v30_CHOCOPhlAn_201901.pkl -r /gpfs/data/lilab/home/zhoub03/software/my_strain2/Bifidobacterium_breve/Bifidobacterium_breve.fas -o output -c s__Bifidobacterium_breve --phylophlan_mode accurate”.
I have installed almost all the required external tools. It reported:
Generating PhyloPhlAn 3.0 configuration file…
[e] could not find "raxml" ("None") executable in your PATH environment variable
However, I have confirmed that RAxML is executable by running “raxmlHPC -h”. Why does it still report this error? The RAxML version is 8.2.12, and I installed the SSE3 build.
Thank you so much!!!
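One way to see exactly which RAxML binaries are visible on PATH is Python's `shutil.which`, which searches the same PATH variable the error message mentions. The candidate names in the usage line below are common RAxML 8 build names and are an assumption: this thread does not say which executable name PhyloPhlAn looks for, so a working `raxmlHPC -h` does not by itself show that the name it expects is on PATH. The helper name `find_on_path` is hypothetical.

```python
import shutil


def find_on_path(names, path=None):
    """Map each candidate executable name to its resolved path, or None.

    `path` defaults to the PATH environment variable (shutil.which's default).
    """
    return {name: shutil.which(name, path=path) for name in names}


if __name__ == "__main__":
    # Assumed candidate names; adjust to the builds installed on your system.
    print(find_on_path(["raxmlHPC", "raxmlHPC-SSE3", "raxmlHPC-PTHREADS-SSE3"]))
```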
Nothing happened after I ran this command. Does that mean I need to install another module or a different version? Could you give me a link or tell me which command I should use to install it?
Thank you! I have now installed RAxML properly, but there is a new error:
Thu Oct 1 13:41:01 2020: Executing PhyloPhlAn 3.0…
Thu Oct 1 13:41:01 2020: Creating PhyloPhlAn 3.0 database…
Thu Oct 1 13:41:02 2020: Done.
Thu Oct 1 13:41:02 2020: Generating PhyloPhlAn 3.0 configuration file…
Thu Oct 1 13:41:02 2020: Done.
Thu Oct 1 13:41:02 2020: Processing samples…
[e] "/gpfs/share/apps/python/cpu/3.6.5/lib/python3.6/site-packages/phylophlan/phylophlan_configs/" folder does not exists
But I have installed PhyloPhlAn 3 and all its prerequisites. The folder “/gpfs/share/apps/python/cpu/3.6.5/lib/python3.6/site-packages/phylophlan/” exists, but there is no “phylophlan_configs” folder inside it. Is this an installation problem, or should something be placed in that folder?
Hi @Boyan
Just creating the folder /gpfs/share/apps/python/cpu/3.6.5/lib/python3.6/site-packages/phylophlan/phylophlan_configs/ will solve the problem.
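A sketch of the same fix in Python, assuming the phylophlan package is importable by the interpreter you run it with. The helper name is hypothetical. `os.makedirs(..., exist_ok=True)` makes it safe to run repeatedly. On a shared cluster install such as /gpfs/share/apps, writing into site-packages may require administrator rights.

```python
import importlib.util
import os


def ensure_phylophlan_configs(package_dir=None):
    """Create <phylophlan package>/phylophlan_configs if it is missing.

    If package_dir is None, locate the installed phylophlan package via
    importlib (assumes phylophlan is importable by this interpreter).
    """
    if package_dir is None:
        spec = importlib.util.find_spec("phylophlan")
        if spec is None or not spec.submodule_search_locations:
            raise ModuleNotFoundError("phylophlan is not importable")
        package_dir = list(spec.submodule_search_locations)[0]
    configs_dir = os.path.join(package_dir, "phylophlan_configs")
    os.makedirs(configs_dir, exist_ok=True)
    return configs_dir
```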
Thank you! That problem is solved, but there are still some errors. Here is the error log:
Mon Oct 5 13:25:32 2020: Start StrainPhlAn 3.0 execution
Mon Oct 5 13:25:32 2020: Creating temporary directory…
Mon Oct 5 13:25:32 2020: Done.
Mon Oct 5 13:25:32 2020: Getting markers from main sample files…
Mon Oct 5 13:25:32 2020: Done.
Mon Oct 5 13:25:32 2020: Getting markers from main reference files…
Mon Oct 5 13:25:33 2020: Done.
Mon Oct 5 13:25:33 2020: Removing bad markers / samples…
Mon Oct 5 13:25:33 2020: Done.
Mon Oct 5 13:25:33 2020: Writing samples as markers’ FASTA files…
Mon Oct 5 13:25:33 2020: Done.
Mon Oct 5 13:25:33 2020: Writing filtered clade markers as FASTA file…
Mon Oct 5 13:25:33 2020: Done.
Mon Oct 5 13:25:33 2020: Calculating polymorphic rates…
Mon Oct 5 13:25:33 2020: Done.
Mon Oct 5 13:25:33 2020: Executing PhyloPhlAn 3.0…
Mon Oct 5 13:25:33 2020: Creating PhyloPhlAn 3.0 database…
Mon Oct 5 13:25:34 2020: Done.
Mon Oct 5 13:25:34 2020: Generating PhyloPhlAn 3.0 configuration file…
Mon Oct 5 13:25:34 2020: Done.
Mon Oct 5 13:25:34 2020: Processing samples…
[e] No alignments found to concatenate
[e] An error was ocurred executing a external tool, exiting…
The following directories and files were generated under “tmp”: blastn, clean_dna, map_dna, markers, markers_dna, msas, phylophlan.cfg, s__Bifidobacterium_breve, s__Bifidobacterium_breve.StrainPhlAn3, and trim_not_variant. However, markers, msas, and trim_not_variant are empty. Do you know what the problem might be?
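When PhyloPhlAn reports "No alignments found to concatenate", a quick way to see which intermediate steps produced nothing is to list the empty folders under tmp. A minimal sketch, assuming `tmp_dir` is the StrainPhlAn temporary directory described above; the function name is hypothetical, and plain files such as phylophlan.cfg are skipped.

```python
import os


def empty_subdirectories(tmp_dir):
    """Return the sorted names of immediate subdirectories of tmp_dir that are empty."""
    return sorted(
        entry.name
        for entry in os.scandir(tmp_dir)
        if entry.is_dir() and not os.listdir(entry.path)
    )
```

For the run above, this would be expected to list markers, msas, and trim_not_variant.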
Hi @Boyan
Over the last month we implemented a few changes in both tools. It seems that your PhyloPhlAn installation is up to date, but your MetaPhlAn installation (which contains StrainPhlAn) is a little old. Could you please install the latest MetaPhlAn version (3.0.4) and try again?
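Before re-running, a quick check of whether an installed MetaPhlAn is older than 3.0.4 can be sketched as below. It compares versions as integer tuples, which handles plain numeric releases like 3.0.4 but not pre-release tags; the function names are hypothetical.

```python
def parse_version(text):
    """Turn a dotted numeric release like '3.0.4' into a comparable tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def needs_update(installed, minimum="3.0.4"):
    """True if the installed version string is older than `minimum`."""
    return parse_version(installed) < parse_version(minimum)
```

To get the installed version, `importlib.metadata.version("metaphlan")` works on Python 3.8+, assuming the distribution is named `metaphlan`; the Python 3.6.5 shown in the traceback predates `importlib.metadata`, so there `pkg_resources.get_distribution("metaphlan").version` is the usual alternative. Comparing tuples rather than strings matters: as strings, "3.0.10" sorts before "3.0.4".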
After updating, it finally runs without errors, although I am not sure I can interpret the results yet. Thank you so much for your patience!!!
Thank you for your explanation of StrainPhlAn. When I ran it, I ran into the same problem as with MetaPhlAn2 (installed via git clone, using the default Python 2.7.18): even after I specified the database path, it showed something like:
undefined symbol: “_Py_LegacyLocaleDetected”
Then I installed Python 3 by unzipping it, but I still encountered the same problem. Then I ran “conda activate python3”. There was no error, but in the last step, it showed this: