The bioBakery help forum

Error when running extract_markers.py

Hi,

I have a problem when using extract_markers.py. Because the MetaPhlAn database is not installed in the default directory, I used the “-d” option to specify the path to the database.
First, I tried “extract_markers.py -d /gpfs/data/lilab/home/zhoub03/software/metaphlan3_database/mpa_v30_CHOCOPhlAn_201901/mpa_v30_CHOCOPhlAn_201901.fna -c s__Bifidobacterium_breve -o clade_markers”. The error is

Sun Sep 6 16:01:52 2020: Start extract markers execution
Sun Sep 6 16:01:52 2020: Generating DB markers FASTA…
Sun Sep 6 16:03:46 2020: Done.
Sun Sep 6 16:03:46 2020: Loading MetaPhlan 3.0 database…Traceback (most recent call last):
File “/gpfs/share/apps/python/cpu/3.6.5/bin/extract_markers.py”, line 8, in <module>
sys.exit(main())
File “/gpfs/share/apps/python/cpu/3.6.5/lib/python3.6/site-packages/metaphlan/utils/extract_markers.py”, line 135, in main
extract_markers(args.database, args.clade, args.output_dir)
File “/gpfs/share/apps/python/cpu/3.6.5/lib/python3.6/site-packages/metaphlan/utils/extract_markers.py”, line 98, in extract_markers
db = pickle.load(bz2.BZ2File(database))
File “/gpfs/share/apps/python/cpu/3.6.5/lib/python3.6/bz2.py”, line 172, in peek
return self._buffer.peek(n)
File “/gpfs/share/apps/python/cpu/3.6.5/lib/python3.6/_compression.py”, line 68, in readinto
data = self.read(len(byte_view))
File “/gpfs/share/apps/python/cpu/3.6.5/lib/python3.6/_compression.py”, line 103, in read
data = self._decompressor.decompress(rawblock, size)
OSError: Invalid data stream

This seemed to indicate that I should use the compressed file, so I tried “extract_markers.py -d /gpfs/data/lilab/home/zhoub03/software/metaphlan3_database/mpa_v30_CHOCOPhlAn_201901/mpa_v30_CHOCOPhlAn_201901.fna.bz2 -c s__Bifidobacterium_breve -o clade_markers”. It reported:

Sun Sep 6 16:04:19 2020: Start extract markers execution
Sun Sep 6 16:04:19 2020: Generating DB markers FASTA…Could not locate a Bowtie index corresponding to basename “/gpfs/data/lilab/home/zhoub03/software/metaphlan3_database/mpa_v30_CHOCOPhlAn_201901/mpa_v30_CHOCOPhlAn_201901.fna”
Error: Encountered internal Bowtie 2 exception (#1)
Command: /gpfs/share/apps/bowtie2/2.3.5.1/bin/bowtie2-inspect-s --wrapper basic-0 /gpfs/data/lilab/home/zhoub03/software/metaphlan3_database/mpa_v30_CHOCOPhlAn_201901/mpa_v30_CHOCOPhlAn_201901.fna

[e] An error was ocurred executing a external tool, exiting…
Sun Sep 6 16:04:19 2020: Stop StrainPhlAn 3.0 execution.

However, the bowtie index has been built under that directory with all six index files. I would appreciate it if you could tell me what command I should use.
Thank you so much!!!

Hi @Boyan
Thanks for getting in contact. The -d parameter of the StrainPhlAn scripts expects the path to the MetaPhlAn database PKL file rather than the FASTA file. In your case:
extract_markers.py -d /gpfs/data/lilab/home/zhoub03/software/metaphlan3_database/mpa_v30_CHOCOPhlAn_201901/mpa_v30_CHOCOPhlAn_201901.pkl -c s__Bifidobacterium_breve -o clade_markers
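As a rough illustration of why the FASTA path fails: the traceback above shows the script calling `pickle.load(bz2.BZ2File(database))`, so the file given to -d has to be a bz2-compressed pickle. The sketch below checks this before loading. The helper names `looks_like_bz2` and `load_database_pkl` are hypothetical and are not part of MetaPhlAn or StrainPhlAn.

```python
import bz2
import pickle


def looks_like_bz2(path):
    """Return True if the file starts with the bzip2 magic bytes b"BZh"."""
    with open(path, "rb") as handle:
        return handle.read(3) == b"BZh"


def load_database_pkl(path):
    """Load a bz2-compressed pickle, mirroring the call seen in the traceback."""
    if not looks_like_bz2(path):
        # A plain-text FASTA file ends up here; passing it straight to bz2
        # is what produces "OSError: Invalid data stream".
        raise ValueError(f"{path} is not bz2-compressed; pass the .pkl file to -d")
    with bz2.BZ2File(path) as handle:
        return pickle.load(handle)
```

Calling `load_database_pkl` on the .pkl path should return the database object, while the uncompressed .fna path is rejected with a clearer message than the bz2 error.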

I hope this can solve your problem.
Best,
Aitor

Hi @aitor.blancomiguez,

Thank you so much for your prompt reply! It solves my problem.
However, I have a new question about the next step. I followed the tutorial at “https://github.com/biobakery/biobakery/wiki/strainphlan3”.
This command “strainphlan -s consensus_markers/*.pkl -m clade_markers/s__Eubacterium_rectale.fna -r reference_genomes/*.fna -o output -n 8 -c s__Eubacterium_rectale --phylophlan_mode fast --nproc 4” requires some reference genomes.
In my case, for Bifidobacterium breve, do I need to collect reference genomes myself? If so, how many sequences should I collect?

Thank you!!!

Hi @Boyan,
Running the StrainPhlAn command does not require any reference genomes; you can run it with only the markers reconstructed from the metagenomic samples. Optionally, you can add some reference genomes to compare them against the strains reconstructed from your metagenomic samples.
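
To make the optional nature of the references concrete, here is a minimal sketch that assembles the argument list and adds -r only when reference genomes are supplied. The function name `build_strainphlan_command` is hypothetical; the flags are the ones already used in the commands in this thread.

```python
def build_strainphlan_command(samples, clade_markers, clade, output_dir,
                              references=None, nproc=1):
    """Assemble a strainphlan argument list; -r is added only if references are given."""
    command = [
        "strainphlan",
        "-s", *samples,          # consensus marker files, one per sample
        "-m", clade_markers,     # FASTA written by extract_markers.py
        "-o", output_dir,
        "-n", str(nproc),
        "-c", clade,
        "--phylophlan_mode", "fast",
    ]
    if references:
        # Reference genomes are optional, so the flag is omitted otherwise.
        command += ["-r", *references]
    return command
```

The resulting list could be passed to `subprocess.run(command, check=True)`, with `samples` coming from something like `sorted(glob.glob("consensus_markers/*.pkl"))`.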
Best,
Aitor

That is great! Thank you!

Hi @aitor.blancomiguez,

I have tried “strainphlan -s consensus_markers/*.pkl -m clade_markers/s__Bifidobacterium_breve.fna -r /gpfs/data/lilab/home/zhoub03/software/my_strain2/Bifidobacterium_breve/Bifidobacterium_breve.fas -o output -n 2 -c s__Bifidobacterium_breve --phylophlan_mode fast”.
It reported:
[e] The database does not exist
Wed Sep 9 12:47:42 2020: Stop StrainPhlAn 3.0 execution.
However, I have checked that these files exist.
I also tried “strainphlan -s consensus_markers/*.pkl -m clade_markers/s__Bifidobacterium_breve.fna -o output -n 2 -c s__Bifidobacterium_breve --phylophlan_mode fast” which does not include any reference genome. It still reported the same error.

I would appreciate it if you could help me figure out where the problem is. Thank you!

Hi @Boyan
The problem is related to the fact that your MetaPhlAn database is not installed in the default directory. Try running strainphlan with the additional parameter -d /gpfs/data/lilab/home/zhoub03/software/metaphlan3_database/mpa_v30_CHOCOPhlAn_201901/mpa_v30_CHOCOPhlAn_201901.pkl
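
Since the error message does not say which file is missing, a small pre-flight check can help narrow it down. The sketch below is only an illustration: `find_missing_inputs` is a hypothetical helper, not part of StrainPhlAn, and it simply tests the paths you intend to pass to -s, -m, and -d.

```python
import glob
import os


def find_missing_inputs(sample_pattern, clade_markers, database_pkl):
    """Return a list of messages for inputs that cannot be found on disk."""
    problems = []
    if not glob.glob(sample_pattern):
        problems.append(f"no consensus marker files match {sample_pattern}")
    if not os.path.isfile(clade_markers):
        problems.append(f"clade marker file not found: {clade_markers}")
    if not os.path.isfile(database_pkl):
        problems.append(f"database PKL file not found: {database_pkl}")
    return problems
```

An empty list means all three inputs were found, so the paths themselves are not the cause of the error.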

Best,
Aitor