I have MetaPhlAn output files for many samples, computed using MetaPhlAn4 v4.0.2 with the mpa_vJan21_CHOCOPhlAnSGB_202103 database. For the same samples, I also have MetaPhlAn results computed using MetaPhlAn4 v4.1.1 with mpa_vJun23_CHOCOPhlAnSGB_202403.
I would like to compute the HACK-top-17-score, which is a combination of 17 species abundances (see paper), but this paper used MetaPhlAn3 with mpa_v30_CHOCOPhlAn_201901.
15 of the 17 species from the paper had the exact same name in both newer databases. An additional one changed its name from s__Eubacterium_eligens to s__Lachnospira_eligens (thanks wiki).
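A minimal Python sketch of how this renaming could be applied before looking up abundances. The `RENAMED` dictionary and the `harmonize` helper are hypothetical names used only for illustration; the single entry is the rename described above.

```python
# Hypothetical mapping from the MetaPhlAn3 species name used in the paper
# to the name found in the newer databases.
RENAMED = {"s__Eubacterium_eligens": "s__Lachnospira_eligens"}


def harmonize(species_names, renamed=RENAMED):
    """Return the species names as they would appear in the newer databases.

    Names without an entry in `renamed` are returned unchanged.
    """
    return [renamed.get(name, name) for name in species_names]
```

Species with no known replacement, such as s__Oscillibacter_sp._57_20, pass through unchanged and still need to be resolved separately.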
But 1 of the 17 species is missing from both newer databases: s__Oscillibacter_sp._57_20. I searched the files of the newer databases for it, both by name and by NCBI taxid (1897011), but didn’t find anything.
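A search of this kind can be sketched in Python as below. This is only an illustration: it assumes the database content or a profile has been exported as tab-separated text lines, where taxonomy strings and taxid paths are pipe-delimited and header lines start with `#`. The function name `find_hits` and its parameters are hypothetical.

```python
import re


def find_hits(lines, name_pattern=None, taxid=None):
    """Return the data lines that match a name regex or contain a taxid.

    Assumption: `lines` are tab-separated text lines; lines starting
    with '#' are treated as headers and skipped.
    """
    name_re = re.compile(name_pattern, re.IGNORECASE) if name_pattern else None
    hits = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        # The name is searched in every column, so a species that is only
        # listed in an extra column (not as the main clade) is found too.
        name_match = name_re is not None and any(
            name_re.search(field) for field in line.split("\t")
        )
        # Taxids are compared as whole tokens, so 1897011 does not
        # accidentally match a longer number such as 18970110.
        tokens = set(re.split(r"[|,\t ]+", line))
        taxid_match = taxid is not None and str(taxid) in tokens
        if name_match or taxid_match:
            hits.append(line)
    return hits
```

For example, `find_hits(open(path), name_pattern=r"Oscillibacter_sp\.?_57_20", taxid=1897011)` would return every line that mentions the species by either identifier, where `path` is a placeholder for whichever file is being searched.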
The most similar hits in mpa_vJan21_CHOCOPhlAnSGB_202103 are:
k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Oscillospiraceae|g__Oscillibacter|s__Oscillibacter_SGB15077
k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Oscillospiraceae|g__Oscillibacter|s__Oscillibacter_ruminantium
k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Oscillospiraceae|g__Oscillibacter|s__Oscillibacter_sp_ER4
k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Oscillospiraceae|g__Oscillibacter|s__Oscillibacter_sp_NSJ_62
k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Oscillospiraceae|g__Oscillibacter|s__Oscillibacter_sp_PC13
The most similar hits in mpa_vJun23_CHOCOPhlAnSGB_202403 are:
k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Oscillospiraceae|g__Oscillibacter|s__Oscillibacter_valericigenes
k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Oscillospiraceae|g__Oscillibacter|s__Oscillibacter_sp_MSJ_31
k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Oscillospiraceae|g__Oscillibacter|s__Oscillibacter_sp_ER4
k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Oscillospiraceae|g__Oscillibacter|s__Oscillibacter_sp_PC13
k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Oscillospiraceae|g__Oscillibacter|s__Oscillibacter_hominis
k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Oscillospiraceae|g__Oscillibacter|s__Oscillibacter_ruminantium
Which should I use? Could s__Oscillibacter_sp._57_20 correspond to more than one species in the newer databases? It is actually a fairly well-studied bug (judging by Google search results), so it seems odd to me that it does not exist at all in the mpa_vJan21_CHOCOPhlAnSGB_202103 and mpa_vJun23_CHOCOPhlAnSGB_202403 databases.
Thanks in advance