Problem with adding a new marker to Metaphlan DB

Hi,
I am adding a new marker to the Metaphlan DB, and I’m following the tutorial reported here:

I’ve added a sequence called ‘prova_tetB_gilliamella’ inside a fasta file called ‘prova.gene.fa’ using these commands:

bowtie2-inspect mpa_db/mpa_v30_CHOCOPhlAn_201901 > mpa_db/mpa_v30_CHOCOPhlAn_201901_markers.fasta
cat DB/prova.gene.fa >> mpa_db/mpa_v30_CHOCOPhlAn_201901_markers.fasta
bowtie2-build mpa_db/mpa_v30_CHOCOPhlAn_201901_markers.fasta mpa_db/mpa_v30_CHOCOPhlAn_NEW

Subsequently, I’ve added the sequence inside the .pkl file using these commands:

import pickle
import bz2
db = pickle.load(bz2.open('mpa_db/mpa_v30_CHOCOPhlAn_201901.pkl', 'r'))
db['markers']['prova_tetB_gilliamella'] = {'ext': [], 'score': 0.0, 'clade': 's__Gilliamella_apicola_EXP','taxon': 'k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Orbales|f__Orbaceae|g__Gilliamellas__Gilliamella_apicola_EXP','len':1206}
# save
with bz2.BZ2File('mpa_db/mpa_v30_CHOCOPhlAn_NEW.pkl', 'w') as ofile:
pickle.dump(db, ofile, pickle.HIGHEST_PROTOCOL)

Then I started my analysis:

`metaphlan $f --bowtie2db mpa_db --index mpa_v30_CHOCOPhlAn_NEW --input_type fastq -s sams/${bn}.sam.bz2 --bowtie2out bowtie2/${bn}.bowtie2.bz2 -o profiles/${bn}_profile.tsv`

However I got this error:

Running metaphlan 3.0 on small_dataset/41290_ID1277_GC_18_S18_L001_R1_001.fastq.gz
Traceback (most recent call last):
File "/home/miniconda2/envs/mpa/bin/metaphlan", line 10, in <module>
sys.exit(main())
File "/home/miniconda2/envs/mpa/lib/python3.7/site-packages/metaphlan/metaphlan.py", line 1007, in main
tree = TaxTree( mpa_pkl, ignore_markers )
File "/home/miniconda2/envs/mpa/lib/python3.7/site-packages/metaphlan/metaphlan.py", line 679, in __init__
self.add_reads(k, 0)
File "/home/miniconda2/envs/mpa/lib/python3.7/site-packages/metaphlan/metaphlan.py", line 697, in add_reads
cl = self.all_clades[clade]
KeyError: 's__Gilliamella_apicola_EXP'

Why this is happening? It’s because I’ve added the “_EXP” string?

1 Like

The procedure is correct, you only forgot to create a new entry for 'k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Orbales|f__Orbaceae|g__Gilliamellas__Gilliamella_apicola_EXP' in the 'taxonomy' section.

This is the code snippet from the wiki:
# Add the taxonomy of the new genomes db['taxonomy']['7-levels taxonomy with clade names of genome1'] = ('7-levels NCBI taxonomy id of genome1', length of genome1)

How to make a fasta(Draft genome) to marker genes?

Is it correct if I add the entire genome to the database? I don’t know how to extract maker genes from genome

No, it is not possible to use the whole genome as a marker in MetaPhlAn. The procedure for extracting markers is described here and in the MetaPhlAn 3 preprint here