Significant difference in abundance estimates between database_202103 and 202307

axolotl233 · January 23, 2024, 9:24am

Hi community,

I ran MetaPhlAn4 (version 4.0.6, Bowtie2 version 2.5.2) with two standard database versions, CHOCOPhlAnSGB_202103 and CHOCOPhlAnSGB_202307. The input was a mock sample containing only two known bacteria, Escherichia coli and Limosilactobacillus reuteri, so that the results would be easy to compare. The relative abundance of each species differs by about 2% between the two runs. In addition, estimated_reads_mapped_to_known_clades differs substantially between the runs: 33363735 with the newest database (CHOCOPhlAnSGB_202307) versus 39835344 with CHOCOPhlAnSGB_202103. I suspect this is the main reason for the difference in the abundance estimates.

I am making this comparison because I want to estimate the E. coli abundance in my cohort. Strangely, with the latest database version no E. coli was detected in any sample, whereas with the older version low-abundance E. coli was detected in the same samples. Which result is closer to the real situation, and do you have any suggestions? Also, why is estimated_reads_mapped_to_known_clades lower with the latest version than with the old one?

Kindly refer to the attached results for a more detailed overview.
Thanks,
Nemo
HEM-25.metaphlan.202103.txt (2.3 KB)
HEM-25.metaphlan.202307.txt (2.3 KB)

axolotl233 · February 27, 2024, 7:18am

Please let me know if there has been any progress.

aitor.blancomiguez · March 5, 2024, 1:03pm

Dear @axolotl233
From version to version, the set of marker genes for each species has changed, and we expect each release to improve the accuracy and precision of the markers. That might explain the difference in the estimated number of reads mapped.
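For anyone who wants to line up two profiles from different database versions, here is a minimal Python sketch of one way to do it. It is not part of MetaPhlAn. The helper names `parse_profile` and `compare_species` are hypothetical, and the toy profiles contain made-up values rather than the numbers from the attached files. It assumes the profile has a header line containing `estimated_reads_mapped_to_known_clades` followed by the count, and a tab-separated table whose `#clade_name` header includes a `relative_abundance` column. If your files are laid out differently, the parsing will need adjusting.

```python
import re


def parse_profile(text):
    """Parse MetaPhlAn-style profile text.

    Returns (mapped_reads, {clade_name: relative_abundance}).
    mapped_reads is None if the header line is absent.
    """
    mapped_reads = None
    rel_ab_col = 2  # assumed default position of relative_abundance
    abundances = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("#"):
            if "estimated_reads_mapped_to_known_clades" in line:
                # Assumed layout: the read count is the trailing integer.
                match = re.search(r"(\d+)\s*$", line)
                if match:
                    mapped_reads = int(match.group(1))
            elif line.startswith("#clade_name"):
                columns = line.lstrip("#").split("\t")
                if "relative_abundance" in columns:
                    rel_ab_col = columns.index("relative_abundance")
            continue
        fields = line.split("\t")
        abundances[fields[0]] = float(fields[rel_ab_col])
    return mapped_reads, abundances


def compare_species(old, new):
    """Return {species: (old_ab, new_ab, new_ab - old_ab)}.

    Only clades whose last rank is a species (s__) are kept; a species
    missing from one profile is reported with abundance 0.0 there.
    """
    def species_only(abundances):
        result = {}
        for clade, value in abundances.items():
            last_rank = clade.split("|")[-1]
            if last_rank.startswith("s__"):
                result[last_rank] = value
        return result

    old_s, new_s = species_only(old), species_only(new)
    comparison = {}
    for name in sorted(set(old_s) | set(new_s)):
        a, b = old_s.get(name, 0.0), new_s.get(name, 0.0)
        comparison[name] = (a, b, b - a)
    return comparison


# Toy profiles with made-up values, only to show the expected layout.
HEADER = ("#clade_name\tclade_taxid\trelative_abundance\tcoverage"
          "\testimated_number_of_reads_from_the_clade")
TOY_OLD = "\n".join([
    "#estimated_reads_mapped_to_known_clades:1000",
    HEADER,
    "k__Bacteria\t2\t100.0\t1.0\t1000",
    "k__Bacteria|s__Species_A\t2|1\t60.0\t0.6\t600",
    "k__Bacteria|s__Species_B\t2|3\t40.0\t0.4\t400",
])
TOY_NEW = "\n".join([
    "#estimated_reads_mapped_to_known_clades:800",
    HEADER,
    "k__Bacteria\t2\t100.0\t1.0\t800",
    "k__Bacteria|s__Species_A\t2|1\t62.0\t0.6\t496",
    "k__Bacteria|s__Species_B\t2|3\t38.0\t0.4\t304",
])

old_reads, old_ab = parse_profile(TOY_OLD)
new_reads, new_ab = parse_profile(TOY_NEW)
print("mapped reads (old, new):", old_reads, new_reads)
for species, (a, b, delta) in compare_species(old_ab, new_ab).items():
    print(species, a, b, delta)
```

To use it on real output, read each file with `open(path).read()` and pass the text to `parse_profile`. A species that one database version reports and the other does not will show up with 0.0 on one side, which makes cases like the missing E. coli easy to spot across a cohort.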
