Finding specific sequences in the database

Dear bioBakery team, thanks for all your excellent work. We are comparing several classification tools with MetaPhlAn 4 (which is really excellent). To align the results of the different tools, we need to check how specific bacterial species are reported by the other tools, and for that we need access to the reference sequences. We find most of them in the database (http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/mpa_vJan21_CHOCOPhlAnSGB_202103.tar), but there are a few that we cannot find.

For instance:
k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Lachnospiraceae|g__Lachnospiraceae_unclassified|s__Eubacterium_rectale|t__SGB4933_group

We cannot find this exact string in the database, nor SGB4933, nor s__Eubacterium_rectale. However, this string was reported by MetaPhlAn 4, so there is something we do not understand. We are afraid this is a mistake on our side, sorry about that.

Raynald

Hi @delahondes
For SGB_groups, the identifiers of the marker genes might not correspond to the SGB reported in the profiles (they can be retrieved from one of the SGBs in the group that is not the group representative). However, you can find here (http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/mpa_vJan21_CHOCOPhlAnSGB_202103_marker_info.txt.bz2) the association of each marker in the database with its corresponding SGB.
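
In case it helps, here is a minimal Python sketch for searching a locally downloaded copy of that marker info file. It makes no assumption about the layout of each line: it only decompresses the file on the fly and returns the lines that contain one of the given strings. The function name `find_marker_lines` is hypothetical and not part of MetaPhlAn.

```python
import bz2
from typing import Iterable, Iterator


def find_marker_lines(path: str, queries: Iterable[str]) -> Iterator[str]:
    """Yield each line of a bz2-compressed text file that contains any query.

    Hypothetical helper, not part of MetaPhlAn. It does plain substring
    matching and does not parse the line format.
    """
    wanted = list(queries)
    # "rt" decompresses and decodes while reading, so the whole file
    # is never held in memory at once.
    with bz2.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if any(query in line for query in wanted):
                yield line.rstrip("\n")
```

For example, `list(find_marker_lines("mpa_vJan21_CHOCOPhlAnSGB_202103_marker_info.txt.bz2", ["SGB4933"]))` would list the matching lines, assuming the file was downloaded to the working directory. Because this is substring matching, "SGB4933" would also match a longer identifier such as "SGB49330", so the hits should be checked by eye.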