Error when running extract_markers.py

Hi,

I have a problem using extract_markers.py. Because the MetaPhlAn database was not installed in the default directory, I used the “-d” option to specify the path of the database.
First, I tried “extract_markers.py -d /gpfs/data/lilab/home/zhoub03/software/metaphlan3_database/mpa_v30_CHOCOPhlAn_201901/mpa_v30_CHOCOPhlAn_201901.fna -c s__Bifidobacterium_breve -o clade_markers”. The error was:

Sun Sep 6 16:01:52 2020: Start extract markers execution
Sun Sep 6 16:01:52 2020: Generating DB markers FASTA…
Sun Sep 6 16:03:46 2020: Done.
Sun Sep 6 16:03:46 2020: Loading MetaPhlan 3.0 database…
Traceback (most recent call last):
  File "/gpfs/share/apps/python/cpu/3.6.5/bin/extract_markers.py", line 8, in <module>
    sys.exit(main())
  File "/gpfs/share/apps/python/cpu/3.6.5/lib/python3.6/site-packages/metaphlan/utils/extract_markers.py", line 135, in main
    extract_markers(args.database, args.clade, args.output_dir)
  File "/gpfs/share/apps/python/cpu/3.6.5/lib/python3.6/site-packages/metaphlan/utils/extract_markers.py", line 98, in extract_markers
    db = pickle.load(bz2.BZ2File(database))
  File "/gpfs/share/apps/python/cpu/3.6.5/lib/python3.6/bz2.py", line 172, in peek
    return self._buffer.peek(n)
  File "/gpfs/share/apps/python/cpu/3.6.5/lib/python3.6/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/gpfs/share/apps/python/cpu/3.6.5/lib/python3.6/_compression.py", line 103, in read
    data = self._decompressor.decompress(rawblock, size)
OSError: Invalid data stream

This seemed to indicate that I should use the compressed file, so I tried “extract_markers.py -d /gpfs/data/lilab/home/zhoub03/software/metaphlan3_database/mpa_v30_CHOCOPhlAn_201901/mpa_v30_CHOCOPhlAn_201901.fna.bz2 -c s__Bifidobacterium_breve -o clade_markers”. It reported:

Sun Sep 6 16:04:19 2020: Start extract markers execution
Sun Sep 6 16:04:19 2020: Generating DB markers FASTA…
Could not locate a Bowtie index corresponding to basename "/gpfs/data/lilab/home/zhoub03/software/metaphlan3_database/mpa_v30_CHOCOPhlAn_201901/mpa_v30_CHOCOPhlAn_201901.fna"
Error: Encountered internal Bowtie 2 exception (#1)
Command: /gpfs/share/apps/bowtie2/2.3.5.1/bin/bowtie2-inspect-s --wrapper basic-0 /gpfs/data/lilab/home/zhoub03/software/metaphlan3_database/mpa_v30_CHOCOPhlAn_201901/mpa_v30_CHOCOPhlAn_201901.fna

[e] An error was ocurred executing a external tool, exiting…
Sun Sep 6 16:04:19 2020: Stop StrainPhlAn 3.0 execution.

However, the Bowtie 2 index has already been built in that directory, and all six index files are present. I would appreciate it if you could tell me which command I should use.
Thank you so much!!!
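The Bowtie 2 error above shows that `bowtie2-inspect` was given a basename ending in `.fna`. A standard Bowtie 2 index with basename `X` consists of six files: `X.1.bt2` to `X.4.bt2` plus `X.rev.1.bt2` and `X.rev.2.bt2` (large indexes use `.bt2l` instead). The minimal Python sketch below, built around a hypothetical helper `missing_bowtie2_files`, lists which of these files are absent for a given basename; if the index files on disk are named without the `.fna` part, this check would report all six as missing for the `.fna` basename. It only covers small (`.bt2`) indexes.

```python
import os

# Suffixes of a standard (small) Bowtie 2 index; large indexes use ".bt2l" instead.
BT2_SUFFIXES = (".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2", ".rev.1.bt2", ".rev.2.bt2")


def missing_bowtie2_files(basename):
    """Return the index files that do not exist for the given Bowtie 2 basename.

    Hypothetical helper for illustration; only small (.bt2) indexes are checked.
    """
    return [basename + suffix for suffix in BT2_SUFFIXES
            if not os.path.isfile(basename + suffix)]
```

For example, calling `missing_bowtie2_files(".../mpa_v30_CHOCOPhlAn_201901")` and `missing_bowtie2_files(".../mpa_v30_CHOCOPhlAn_201901.fna")` side by side shows which basename actually matches the files in the directory.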

Hi @Boyan
Thanks for getting in contact. The -d parameter of the StrainPhlAn scripts expects the path of the MetaPhlAn database PKL file rather than the FASTA file. In your case:
extract_markers.py -d /gpfs/data/lilab/home/zhoub03/software/metaphlan3_database/mpa_v30_CHOCOPhlAn_201901/mpa_v30_CHOCOPhlAn_201901.pkl -c s__Bifidobacterium_breve -o clade_markers

I hope this can solve your problem.
Best,
Aitor
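The first traceback ends in `pickle.load(bz2.BZ2File(database))`, which shows that `extract_markers.py` opens the `-d` path as a bz2-compressed pickle, so a plain FASTA file fails with `OSError: Invalid data stream`. Bzip2 streams begin with the magic bytes `BZh`, so a minimal sketch like the one below (the helper names are hypothetical) can check whether a candidate path is bz2-compressed before passing it to `-d`. Note that this only checks the compression header: a `.fna.bz2` file would also pass it, so it does not replace pointing `-d` at the `.pkl` file.

```python
import bz2
import pickle


def is_bz2_file(path):
    """Return True if the file starts with the bzip2 magic header b"BZh"."""
    with open(path, "rb") as handle:
        return handle.read(3) == b"BZh"


def load_metaphlan_pkl(path):
    """Load a bz2-compressed pickle, mirroring the call seen in the traceback."""
    if not is_bz2_file(path):
        raise ValueError(f"{path} is not bz2-compressed; -d expects the MetaPhlAn .pkl database")
    with bz2.BZ2File(path) as handle:
        return pickle.load(handle)
```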


Hi @aitor.blancomiguez,

Thank you so much for your prompt reply! It solves my problem.
However, I have a new question about the next step. I followed the tutorial at https://github.com/biobakery/biobakery/wiki/strainphlan3.
The command “strainphlan -s consensus_markers/*.pkl -m clade_markers/s__Eubacterium_rectale.fna -r reference_genomes/*.fna -o output -n 8 -c s__Eubacterium_rectale --phylophlan_mode fast --nproc 4” requires some reference genomes.
In my case, Bifidobacterium breve, do I need to collect reference genomes of Bifidobacterium breve myself? If so, how many should I collect?

Thank you!!!

Hi @Boyan,
The StrainPhlAn command does not require any reference genomes; you can run it using only the markers reconstructed from the metagenomic samples. Optionally, you can add some reference genomes to compare the strains reconstructed from your metagenomic samples against them.
Best,
Aitor


That is great! Thank you!

Hi @aitor.blancomiguez,

I tried “strainphlan -s consensus_markers/*.pkl -m clade_markers/s__Bifidobacterium_breve.fna -r /gpfs/data/lilab/home/zhoub03/software/my_strain2/Bifidobacterium_breve/Bifidobacterium_breve.fas -o output -n 2 -c s__Bifidobacterium_breve --phylophlan_mode fast”.
It reported:
"
[e] The database does not exist
Wed Sep 9 12:47:42 2020: Stop StrainPhlAn 3.0 execution.
"
However, I have checked that these files exist.
I also tried “strainphlan -s consensus_markers/*.pkl -m clade_markers/s__Bifidobacterium_breve.fna -o output -n 2 -c s__Bifidobacterium_breve --phylophlan_mode fast”, which does not include any reference genomes. It still reported the same error.

I would appreciate it if you could help me figure out where the problem is. Thank you!

Hi @Boyan
The problem is related to the fact that your MetaPhlAn database was not installed in the default directory. Try running strainphlan with the additional parameter -d /gpfs/data/lilab/home/zhoub03/software/metaphlan3_database/mpa_v30_CHOCOPhlAn_201901/mpa_v30_CHOCOPhlAn_201901.pkl

Best,
Aitor
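Putting the advice so far together, the Python sketch below assembles a `strainphlan` argument list: `-d` always points at the `.pkl` database (needed here because the database is not in the default location), and `-r` is added only when reference genomes are supplied, since they are optional. The function names and defaults are hypothetical; the flags are the ones used in the commands in this thread. Unlike the shell, Python does not expand `consensus_markers/*.pkl` by itself, so the sample list would need to come from something like `glob.glob("consensus_markers/*.pkl")`.

```python
import shlex


def build_strainphlan_cmd(samples, clade_markers, database, clade, output_dir,
                          references=None, mode="fast"):
    """Build a strainphlan argument list (hypothetical helper)."""
    cmd = ["strainphlan", "-s", *samples, "-m", clade_markers,
           "-d", database,  # path to the MetaPhlAn .pkl database
           "-o", output_dir, "-c", clade, "--phylophlan_mode", mode]
    if references:  # reference genomes are optional
        cmd += ["-r", *references]
    return cmd


def as_shell_string(cmd):
    """Quote each argument so the command can be pasted into a shell."""
    return " ".join(shlex.quote(arg) for arg in cmd)
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell-quoting problems with long cluster paths.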

Hi @aitor.blancomiguez,
Thank you!
This solved the previous problem, but there was a new error when I ran
“strainphlan -s consensus_markers/*.pkl -m clade_markers/s__Bifidobacterium_breve.fna -d /gpfs/data/lilab/home/zhoub03/software/metaphlan3_database/mpa_v30_CHOCOPhlAn_201901/mpa_v30_CHOCOPhlAn_201901.pkl -r /gpfs/data/lilab/home/zhoub03/software/my_strain2/Bifidobacterium_breve/Bifidobacterium_breve.fas -o output -c s__Bifidobacterium_breve --phylophlan_mode accurate”.

I have installed almost all of the required external tools. It reported:
"
Generating PhyloPhlAn 3.0 configuration file…
[e] could not find "raxml" ("None") executable in your PATH environment variable
"
However, I have checked that “raxml” is executable by running “raxmlHPC -h”. Why does it still report this error? The RAxML version is 8.2.12, and I installed the SSE3 version.
Thank you so much!!!

Hi Boyan, could you run the following command:
$ whereis raxmlHPC-PTHREADS-SSE3

Best,
Aitor
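Since the error message refers to the `PATH` environment variable, the same check can be done from Python with `shutil.which`, which resolves a command name against `PATH` the way a subprocess call would. In this minimal sketch, `raxmlHPC-PTHREADS-SSE3` is the name suggested above and `raxmlHPC`/`raxmlHPC-MPI` are the binaries reported in this thread; which name PhyloPhlAn actually requires depends on its configuration, so the candidate list is an assumption.

```python
import shutil

# Executable names mentioned in this thread; the one PhyloPhlAn actually
# requires depends on its configuration (assumption).
RAXML_CANDIDATES = ("raxmlHPC-PTHREADS-SSE3", "raxmlHPC", "raxmlHPC-MPI")


def find_executables(names, path=None):
    """Map each executable name to its resolved location on PATH, or None."""
    return {name: shutil.which(name, path=path) for name in names}


if __name__ == "__main__":
    for name, location in find_executables(RAXML_CANDIDATES).items():
        print(f"{name}: {location or 'NOT FOUND on PATH'}")
```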

Hi Aitor,

Nothing was returned when I ran this command. Does that mean I need to install another module or a different version? Could you give me a link or tell me which command I should use to install it?

I installed RAxML from https://github.com/stamatak/standard-RAxML. Currently, only “raxmlHPC” and “raxmlHPC-MPI” are executable.

Best,
Boyan

Hi @Boyan
Yes, it looks like the PTHREADS version was not properly installed. You could follow this manual for a complete installation: http://www.metagenomics.wiki/tools/phylogenetic-tree/construction/raxml/install

Best,
Aitor

Hi @aitor.blancomiguez,

Thank you! I have installed RAxML properly, but there is a new error:
"
Thu Oct 1 13:41:01 2020: Executing PhyloPhlAn 3.0…
Thu Oct 1 13:41:01 2020: Creating PhyloPhlAn 3.0 database…
Thu Oct 1 13:41:02 2020: Done.
Thu Oct 1 13:41:02 2020: Generating PhyloPhlAn 3.0 configuration file…
Thu Oct 1 13:41:02 2020: Done.
Thu Oct 1 13:41:02 2020: Processing samples…
[e] "/gpfs/share/apps/python/cpu/3.6.5/lib/python3.6/site-packages/phylophlan/phylophlan_configs/" folder does not exists
"

But I have installed PhyloPhlAn 3 and all prerequisites. The folder “/gpfs/share/apps/python/cpu/3.6.5/lib/python3.6/site-packages/phylophlan/” exists, but there is no folder named “phylophlan_configs” under it. Is this an installation problem? Or what should be placed in that folder?

Thank you!
Best,
Boyan

Hi @Boyan
Just creating the folder /gpfs/share/apps/python/cpu/3.6.5/lib/python3.6/site-packages/phylophlan/phylophlan_configs/ will solve the problem.

Best,
Aitor
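The folder can also be created from Python. The sketch below locates the installed `phylophlan` package with `importlib.util.find_spec` instead of hard-coding the site-packages path, then creates `phylophlan_configs` inside it if it is missing. Both helper names are hypothetical, and on a shared cluster installation like this one, creating the directory may require administrator permissions.

```python
import importlib.util
import os


def phylophlan_package_dir():
    """Return the directory of the installed phylophlan package, or None if not importable."""
    spec = importlib.util.find_spec("phylophlan")
    if spec is None or not spec.submodule_search_locations:
        return None
    return list(spec.submodule_search_locations)[0]


def ensure_configs_dir(package_dir):
    """Create <package_dir>/phylophlan_configs if needed and return its path."""
    configs = os.path.join(package_dir, "phylophlan_configs")
    os.makedirs(configs, exist_ok=True)  # no error if the folder already exists
    return configs
```

Check that `phylophlan_package_dir()` did not return `None` before passing its result to `ensure_configs_dir`.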

Hi @aitor.blancomiguez,

Thank you! This problem was solved, but there are still some errors. Below is the error log:
"
Mon Oct 5 13:25:32 2020: Start StrainPhlAn 3.0 execution
Mon Oct 5 13:25:32 2020: Creating temporary directory…
Mon Oct 5 13:25:32 2020: Done.
Mon Oct 5 13:25:32 2020: Getting markers from main sample files…
Mon Oct 5 13:25:32 2020: Done.
Mon Oct 5 13:25:32 2020: Getting markers from main reference files…
Mon Oct 5 13:25:33 2020: Done.
Mon Oct 5 13:25:33 2020: Removing bad markers / samples…
Mon Oct 5 13:25:33 2020: Done.
Mon Oct 5 13:25:33 2020: Writing samples as markers’ FASTA files…
Mon Oct 5 13:25:33 2020: Done.
Mon Oct 5 13:25:33 2020: Writing filtered clade markers as FASTA file…
Mon Oct 5 13:25:33 2020: Done.
Mon Oct 5 13:25:33 2020: Calculating polymorphic rates…
Mon Oct 5 13:25:33 2020: Done.
Mon Oct 5 13:25:33 2020: Executing PhyloPhlAn 3.0…
Mon Oct 5 13:25:33 2020: Creating PhyloPhlAn 3.0 database…
Mon Oct 5 13:25:34 2020: Done.
Mon Oct 5 13:25:34 2020: Generating PhyloPhlAn 3.0 configuration file…
Mon Oct 5 13:25:34 2020: Done.
Mon Oct 5 13:25:34 2020: Processing samples…
[e] No alignments found to concatenate

[e] An error was ocurred executing a external tool, exiting…
"
The following were generated under “tmp”: blastn, clean_dna, map_dna, markers, markers_dna, msas, phylophlan.cfg, s__Bifidobacterium_breve, s__Bifidobacterium_breve.StrainPhlAn3, and trim_not_variant. However, markers, msas, and trim_not_variant are empty. Do you know what the problem might be?

Thank you so much!

Best,
Boyan
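When PhyloPhlAn stops with "No alignments found to concatenate", a quick way to see which intermediate folders in the temporary directory ended up empty (as done by hand above) is a small helper like the one below. It is a hypothetical sketch that only inspects the immediate subdirectories of the given folder; files such as `phylophlan.cfg` are ignored.

```python
import os


def empty_subdirs(tmp_dir):
    """Return the sorted names of immediate subdirectories of tmp_dir that contain no entries."""
    names = []
    for entry in os.scandir(tmp_dir):
        if entry.is_dir() and not os.listdir(entry.path):
            names.append(entry.name)
    return sorted(names)
```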

Hi @Boyan, could you please send me the output of these commands:

  • metaphlan --version
  • phylophlan --version

Best,
Aitor

Hi @aitor.blancomiguez,

I am using the following versions (obtained with -v):
MetaPhlAn version 3.0 (20 Mar 2020)
PhyloPhlAn version 3.0.58 (8 September 2020)

Thank you!

Best,
Boyan

Hi @Boyan
In the last month we implemented a few changes in both tools. Your PhyloPhlAn version is up to date, but your MetaPhlAn version (which contains StrainPhlAn) looks a little old. Could you please install the latest MetaPhlAn version (3.0.4) and try again?

Best,
Aitor
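To check whether an installed tool meets a recommended minimum such as 3.0.4, the version strings reported above can be parsed and compared numerically. This sketch assumes the `<Name> version X.Y[.Z] (date)` format shown in Boyan's output; the helper names are hypothetical. Missing components are treated as 0, so "3.0" is compared as 3.0.0.

```python
import re


def parse_version(text):
    """Extract the dotted version from e.g. 'MetaPhlAn version 3.0 (20 Mar 2020)'."""
    match = re.search(r"version\s+(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def is_at_least(text, minimum):
    """True if the parsed version is >= minimum; missing components count as 0."""
    found = parse_version(text)
    width = max(len(found), len(minimum))
    found += (0,) * (width - len(found))
    minimum = tuple(minimum) + (0,) * (width - len(minimum))
    return found >= minimum
```

With the versions reported in this thread, `is_at_least("MetaPhlAn version 3.0 (20 Mar 2020)", (3, 0, 4))` is `False`, which matches the advice to update MetaPhlAn.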

Hi @aitor.blancomiguez

Thank you so much! I will ask our cluster admin to update it, and then I will try again.

Best,
Boyan

Hi @aitor.blancomiguez,

After the update, it finally runs without errors, although I am not sure how to interpret the results. Thank you so much for your patience!!!

Best,
Boyan

Hi @aitor.blancomiguez,

Thank you for your explanation of StrainPhlAn. When I run it, I encounter the same problem as in MetaPhlAn2 (installed with git clone, using the default Python 2.7.18): even after I have specified the path of the database, it shows something like:

undefined symbol: “_Py_LegacyLocaleDetected”

Then I installed Python 3 from a zip archive but still encountered the same problem. After running “conda activate python3”, there was no error, but at the last step it showed the following:

Could you tell me where the pitfall lies? Thanks a lot!