It is great to see activity on the databases for metaphlan2. I’ve been using the newer db for my projects. Thank you for the tremendous efforts!
There has, however, been a steady stream of updates between v29 and v30. It’s a challenge to manage the versions or to know whether to rapidly adopt the latest and greatest.
Could you comment on the nature of the updates, the reasons for them, and whether we should have concerns about past versions or about merging datasets that span db versions? Maybe there is documentation somewhere that I’m missing?
Yes, there are a couple of intermediate databases between v29 and v30. They are all refinements of the v29 version on the way to the final v30 database. I’d advise against using versions before v295: they are still available, but their profiling performance is poor compared to the latest version. Go directly to v30.
The new database is obtained with a new implementation of the ChocoPhlAn pipeline using UniProt as a source, with more species and an updated set of reference genomes (January 2019).
Profiles obtained with different versions of the database should not be merged together. The utility script merge_metaphlan_tables also warns you when you try to merge profiles obtained from different versions.
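If you want to check your own profiles before merging, a rough sketch in Python could look like the following. It assumes that each profile file starts with a comment line naming the database (something like "#mpa_v30_CHOCOPhlAn_201901"); that header format is an assumption here, so check one of your own files first. The function names database_tag and group_by_database are made up for this example and are not part of MetaPhlAn.

```python
import re

# Assumption: the profile header contains a comment line that names the
# database, e.g. "#mpa_v30_CHOCOPhlAn_201901". Adjust the pattern if your
# files look different.
DB_TAG = re.compile(r"^#\s*(mpa_v\d+\w*)")


def database_tag(profile_path):
    """Return the database tag found in the header comments, or None."""
    with open(profile_path) as handle:
        for line in handle:
            if not line.startswith("#"):
                break  # header comments are over, stop looking
            match = DB_TAG.match(line)
            if match:
                return match.group(1)
    return None


def group_by_database(profile_paths):
    """Map each database tag (or None) to the profiles that report it."""
    groups = {}
    for path in profile_paths:
        groups.setdefault(database_tag(path), []).append(str(path))
    return groups
```

If group_by_database returns more than one key for a set of profiles, they were produced with different databases (or the tag could not be found, which shows up under the key None), and the older ones are better re-profiled with v30 than merged as they are.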