Hi,
I am adding a new marker to the Metaphlan DB, and I’m following the tutorial reported here:
I’ve added a sequence called ‘prova_tetB_gilliamella’ inside a fasta file called ‘prova.gene.fa’ using these commands:
bowtie2-inspect mpa_db/mpa_v30_CHOCOPhlAn_201901 > mpa_db/mpa_v30_CHOCOPhlAn_201901_markers.fasta
cat DB/prova.gene.fa >> mpa_db/mpa_v30_CHOCOPhlAn_201901_markers.fasta
bowtie2-build mpa_db/mpa_v30_CHOCOPhlAn_201901_markers.fasta mpa_db/mpa_v30_CHOCOPhlAn_NEW
Subsequently, I’ve added the sequence inside the .pkl file using these commands:
import pickle
import bz2
db = pickle.load(bz2.open('mpa_db/mpa_v30_CHOCOPhlAn_201901.pkl', 'r'))
db['markers']['prova_tetB_gilliamella'] = {'ext': [], 'score': 0.0, 'clade': 's__Gilliamella_apicola_EXP','taxon': 'k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Orbales|f__Orbaceae|g__Gilliamellas__Gilliamella_apicola_EXP','len':1206}
# save
with bz2.BZ2File('mpa_db/mpa_v30_CHOCOPhlAn_NEW.pkl', 'w') as ofile:
pickle.dump(db, ofile, pickle.HIGHEST_PROTOCOL)
Then I started my analysis:
`metaphlan $f --bowtie2db mpa_db --index mpa_v30_CHOCOPhlAn_NEW --input_type fastq -s sams/${bn}.sam.bz2 --bowtie2out bowtie2/${bn}.bowtie2.bz2 -o profiles/${bn}_profile.tsv`
However I got this error:
Running metaphlan 3.0 on small_dataset/41290_ID1277_GC_18_S18_L001_R1_001.fastq.gz
Traceback (most recent call last):
File "/home/miniconda2/envs/mpa/bin/metaphlan", line 10, in <module>
sys.exit(main())
File "/home/miniconda2/envs/mpa/lib/python3.7/site-packages/metaphlan/metaphlan.py", line 1007, in main
tree = TaxTree( mpa_pkl, ignore_markers )
File "/home/miniconda2/envs/mpa/lib/python3.7/site-packages/metaphlan/metaphlan.py", line 679, in __init__
self.add_reads(k, 0)
File "/home/miniconda2/envs/mpa/lib/python3.7/site-packages/metaphlan/metaphlan.py", line 697, in add_reads
cl = self.all_clades[clade]
KeyError: 's__Gilliamella_apicola_EXP'
Why this is happening? It’s because I’ve added the “_EXP” string?