I am comparing results between MetaPhlAn 4 and 3.
Prevotella species seem to have changed substantially. Is there a change log somewhere explaining these changes? For example, a particular sample had:
Prevotella oris 28.9%
Prevotella melaninogenica 12.1%
Prevotella pallens 5.0%
but now is:
Prevotella sp. 52.6%
Prevotella oris 14.1%
Prevotella pallens 3.4%
Prevotella melaninogenica 1.0%
What is the new Prevotella sp. category and why do the individual species continue to appear separately if this category contains all of them? I used
--tax_lev s in case it matters.
From MetaPhlAn 3 to MetaPhlAn 4, we made important changes to the database (https://doi.org/10.1101/2022.08.22.504593). Besides substantially increasing the number of genomes in the database (1 million as the starting catalog), we no longer rely purely on the NCBI taxonomy to define species: we have adopted the species-level genome bin (SGB) approach to classify both genomes and MAGs. Thus, in version 4, the last taxonomic level is no longer the species level (s__) but the SGB level (t__).
All these changes can fully explain the differences between the profiles from version 3 (containing only 13.5k known species) and version 4 (containing 26.9k species, 4,992 of them taxonomically unidentified at the species level).
As for Prevotella sp., it reflects the NCBI taxonomy of the reference genomes belonging to the SGB present in your sample (e.g. NCBI assembly ASM2248306v1), and thus it was propagated to the taxonomy in the MetaPhlAn 4 database.
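To see how the species (s__) and SGB (t__) levels relate in a profile, a small script can help. The following is only a minimal sketch: it assumes the usual tab-separated profile layout with the clade name in the first column and the relative abundance in the third, and the function name `clades_at_level`, the example rows, and the SGB identifiers (`SGB_A`, `SGB_B`) are made up for illustration.

```python
def clades_at_level(profile_lines, level_prefix):
    """Return {deepest_rank_label: relative_abundance} for rows at one level.

    level_prefix is a rank prefix such as "s__" (species) or "t__" (SGB).
    Assumes tab-separated rows: clade_name, taxid path, relative_abundance.
    """
    result = {}
    for line in profile_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split("\t")
        # The deepest rank is the last "|"-separated element of the clade name.
        deepest_rank = fields[0].split("|")[-1]
        if deepest_rank.startswith(level_prefix):
            result[deepest_rank] = float(fields[2])
    return result


# Illustrative rows only: abundances are borrowed from the question above,
# while the taxid column and the SGB identifiers are placeholders.
EXAMPLE_PROFILE = "\n".join([
    "#clade_name\ttaxid\trelative_abundance",
    "g__Prevotella\tNA\t66.7",
    "g__Prevotella|s__Prevotella_sp\tNA\t52.6",
    "g__Prevotella|s__Prevotella_sp|t__SGB_A\tNA\t52.6",
    "g__Prevotella|s__Prevotella_oris\tNA\t14.1",
    "g__Prevotella|s__Prevotella_oris|t__SGB_B\tNA\t14.1",
])

species = clades_at_level(EXAMPLE_PROFILE.splitlines(), "s__")
sgbs = clades_at_level(EXAMPLE_PROFILE.splitlines(), "t__")
```

Here `species` keeps only the rows ending at s__ and `sgbs` only those ending at t__, so a "Prevotella sp." row shows up as one species-level entry alongside the named species rather than as a container for them.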
Never mind. I got confused between sp. and spp. definitions and thought that the individual species should not be shown in the newer results (but actually sp. is just one species and not a large set of them). For the benefit of readers:
sp. is an abbreviation for a single species. It is used when the actual species name cannot be, need not be, or simply is not specified. The plural form of this abbreviation is spp., which indicates several species.
I was wrongly thinking about the definition of spp. when I saw sp. in my results. I am aware of the recent change to SGBs and I have some in my other results, such as GGB1022 SGB1316.