Finding a species from an older database in the newer databases

I have MetaPhlAn output files for many samples, computed using MetaPhlAn4 v4.0.2 with the mpa_vJan21_CHOCOPhlAnSGB_202103 database. For the same samples, I also have MetaPhlAn results computed using MetaPhlAn4 v4.1.1 with the mpa_vJun23_CHOCOPhlAnSGB_202403 database.

I would like to compute the HACK-top-17-score, which is a combination of the abundances of 17 species (see paper). However, the paper used MetaPhlAn3 with mpa_v30_CHOCOPhlAn_201901.

15 of the 17 species from the paper have exactly the same name in both newer databases. One more was renamed from s__Eubacterium_eligens to s__Lachnospira_eligens (thanks wiki).
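For reference, the name matching described above can be sketched roughly as follows. This is only a minimal sketch: it assumes tab-separated MetaPhlAn-style lines with the clade name in the first column, and the function names and the `RENAMES` table are hypothetical helpers of mine, not part of MetaPhlAn.

```python
# Hypothetical rename table: name used in the paper -> name in the newer databases.
RENAMES = {"s__Eubacterium_eligens": "s__Lachnospira_eligens"}


def species_in_lines(lines):
    """Collect every s__ label found in the clade-name column of the given lines."""
    found = set()
    for line in lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        clade = line.split("\t", 1)[0].strip()
        for level in clade.split("|"):
            if level.startswith("s__"):
                found.add(level)
    return found


def match_targets(targets, available, renames=RENAMES):
    """Split target species into those found (after renaming) and those missing."""
    matched, missing = {}, []
    for old_name in targets:
        new_name = renames.get(old_name, old_name)
        if new_name in available:
            matched[old_name] = new_name
        else:
            missing.append(old_name)
    return matched, missing
```

Since a species that is absent from one sample will not appear in that sample's profile, the `available` set should be built from a listing of the whole database (or from the union over all samples) rather than from a single profile.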

But 1 of the 17 species is missing from both newer databases: s__Oscillibacter_sp._57_20. I searched the newer database files for it, both by name and by NCBI taxid (1897011), but didn’t find anything.
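In case it helps to reproduce the search, this is roughly what I mean by looking for the species by name and by taxid. It is a sketch under assumptions: the input is tab-separated text with the clade name in the first column and a pipe-separated taxid lineage in the second, and `find_clades` is a hypothetical helper, not a MetaPhlAn function.

```python
def find_clades(lines, name_substring=None, taxid=None):
    """Return clade names matching a name substring or containing a given taxid."""
    hits = []
    for line in lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        clade = fields[0]
        # Assumed layout: second column holds taxids separated by "|".
        taxids = fields[1].split("|") if len(fields) > 1 else []
        name_hit = name_substring is not None and name_substring.lower() in clade.lower()
        taxid_hit = taxid is not None and str(taxid) in taxids
        if name_hit or taxid_hit:
            hits.append(clade)
    return hits
```

Calling `find_clades(lines, name_substring="57_20")` and `find_clades(lines, taxid=1897011)` on a listing of each newer database corresponds to the two searches that returned nothing for me.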

The most similar hits in mpa_vJan21_CHOCOPhlAnSGB_202103 are:
k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Oscillospiraceae|g__Oscillibacter|s__Oscillibacter_SGB15077
k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Oscillospiraceae|g__Oscillibacter|s__Oscillibacter_ruminantium
k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Oscillospiraceae|g__Oscillibacter|s__Oscillibacter_sp_ER4
k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Oscillospiraceae|g__Oscillibacter|s__Oscillibacter_sp_NSJ_62
k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Oscillospiraceae|g__Oscillibacter|s__Oscillibacter_sp_PC13

The most similar hits in mpa_vJun23_CHOCOPhlAnSGB_202403 are:
k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Oscillospiraceae|g__Oscillibacter|s__Oscillibacter_valericigenes
k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Oscillospiraceae|g__Oscillibacter|s__Oscillibacter_sp_MSJ_31
k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Oscillospiraceae|g__Oscillibacter|s__Oscillibacter_sp_ER4
k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Oscillospiraceae|g__Oscillibacter|s__Oscillibacter_sp_PC13
k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Oscillospiraceae|g__Oscillibacter|s__Oscillibacter_hominis
k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Oscillospiraceae|g__Oscillibacter|s__Oscillibacter_ruminantium

Which of these should I use? Could s__Oscillibacter_sp._57_20 correspond to more than one species in the newer databases? It is actually a fairly well-studied bug (google search results), so it seems odd to me that it does not exist at all in the mpa_vJan21_CHOCOPhlAnSGB_202103 and mpa_vJun23_CHOCOPhlAnSGB_202403 databases.

Thanks in advance :slight_smile: