I’m trying to extend the current version (Jan2025) of the metaphlan so that the tool will estimate the abundances of two Bifidobacterium longum subspecies (subsp. longum and subsp. infantis). I’m following some published methodology here with code available here. I have successfully added the marker genes to the bowtie indexed database and updated the pkl file. My understanding is that the two subspecies are successfully added to the taxonomy and the marker genes are added to the markers with clades t__subsp.infantis and t__subsp.longum. Then the original B. longum clade/SGB (t__SGB17248) is then removed and the marker genes are assigned to a new clade (s__Bifidobacterium_longum).
However, when I run metaphlan4 the mapping is successful but I encounter an error with full traceback:
Traceback (most recent call last):
File "/opt/conda/bin/metaphlan", line 10, in <module>
sys.exit(main())
~~~~^^
File "/opt/conda/lib/python3.13/site-packages/metaphlan/metaphlan.py", line 2318, in main
metaphlan_runner.run_metaphlan()
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
File "/opt/conda/lib/python3.13/site-packages/metaphlan/metaphlan.py", line 1919, in run_metaphlan
self.metaphlan_analysis.report_results(self.tree, self.total_metagenome, self.avg_read_length)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.13/site-packages/metaphlan/metaphlan.py", line 1385, in report_results
self.get_mapped_fraction()
~~~~~~~~~~~~~~~~~~~~~~~~^^
File "/opt/conda/lib/python3.13/site-packages/metaphlan/metaphlan.py", line 1267, in get_mapped_fraction
self.mapped = self.tree.relative_abundances()
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
File "/opt/conda/lib/python3.13/site-packages/metaphlan/metaphlan.py", line 309, in relative_abundances
clade.compute_coverage()
~~~~~~~~~~~~~~~~~~~~~~^^
File "/opt/conda/lib/python3.13/site-packages/metaphlan/metaphlan.py", line 179, in compute_coverage
self.coverage = sum([child.compute_coverage() for child in self.children.values()])
~~~~~~~~~~~~~~~~~~~~~~^^
File "/opt/conda/lib/python3.13/site-packages/metaphlan/metaphlan.py", line 179, in compute_coverage
self.coverage = sum([child.compute_coverage() for child in self.children.values()])
~~~~~~~~~~~~~~~~~~~~~~^^
File "/opt/conda/lib/python3.13/site-packages/metaphlan/metaphlan.py", line 179, in compute_coverage
self.coverage = sum([child.compute_coverage() for child in self.children.values()])
~~~~~~~~~~~~~~~~~~~~~~^^
[Previous line repeated 4 more times]
File "/opt/conda/lib/python3.13/site-packages/metaphlan/metaphlan.py", line 184, in compute_coverage
rat_nreads = self.filter_markers_for_ext()
File "/opt/conda/lib/python3.13/site-packages/metaphlan/metaphlan.py", line 142, in filter_markers_for_ext
ext_markers2nreads = self.taxa2clades[ext].markers2nreads
~~~~~~~~~~~~~~~~^^^^^
KeyError: 'SGB17248'
Obviously, at some point the clade (SGB17248) isn’t completely removed/reassigned from the database but I am struggling to find it (I can see the SGB ID is appended as part of the marker gene IDs but I have no idea whether this is used for anything). Is there anything in the error that will point to what I have missed in re-building the database?