I’ve recently come across mentions of updated HUMAnN ChocoPhlAn/UniRef databases (201901b vs 201901), but haven’t been able to find any posts describing the update, and what has now changed in each database file. Could you please provide a brief synopsis of the update, and whether or not you would recommend re-running metagenomic samples previously run using the 201901 versions of these databases?
Thanks for the reminder on this. I just posted not-so-brief release notes on HUMAnN 3.0.0 and the 201901b database update here:
Whether I would rerun a dataset depends on the cost of the compute (which ends up being subjective). Personally, if I were working with a few hundred samples I would probably justify the rerun, but not for a few thousand samples. It might also depend on whether you have a species of interest among the ~600 new pangenomes added with this update.
Some of the other changes in this update (e.g. MetaCyc 24.0, the new UniRef mappings, changes to infer_taxonomy) could all be (re)computed quickly from gene family abundance profiles generated by the previous HUMAnN 3 – whether or not you want to repeat the gene family quantification would be the tougher question.
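To illustrate why the mapping-related changes are cheap to redo, here is a minimal Python sketch of regrouping a gene family profile under an old and a new mapping. It is not HUMAnN's own code: the `regroup` function, the toy family and group names, and the `UNGROUPED` label are all hypothetical, and it assumes a family that maps to several groups contributes its full abundance to each.

```python
from collections import defaultdict


def regroup(gene_families, mapping, unmapped_label="UNGROUPED"):
    """Sum gene family abundances into groups (hypothetical helper).

    gene_families: dict of family ID -> abundance
    mapping: dict of family ID -> list of group IDs
    """
    totals = defaultdict(float)
    for family, abundance in gene_families.items():
        groups = mapping.get(family)
        if not groups:
            # Families with no group in this mapping are pooled together.
            totals[unmapped_label] += abundance
            continue
        for group in groups:
            # Assumption: full abundance is credited to every mapped group.
            totals[group] += abundance
    return dict(totals)


# Toy gene family profile; IDs and values are made up for illustration.
gene_families = {
    "UniRef90_A": 10.0,
    "UniRef90_B": 5.0,
    "UniRef90_C": 2.5,
}

# Two made-up mapping versions standing in for "old" and "updated" files.
old_mapping = {"UniRef90_A": ["EC_1"], "UniRef90_B": ["EC_1"]}
new_mapping = {
    "UniRef90_A": ["EC_1"],
    "UniRef90_B": ["EC_2"],
    "UniRef90_C": ["EC_2"],
}

old_profile = regroup(gene_families, old_mapping)
new_profile = regroup(gene_families, new_mapping)

print(old_profile)
print(new_profile)
```

The gene family abundances are the same in both calls; only the mapping changes. That is why this step can be repeated without redoing the expensive read-level quantification.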
With the database updates, is the mpa_v30_CHOCOPhlAn_201901_marker_info.txt.bz2 file still the most up-to-date marker info file, or is there a “b” counterpart to this file as well?
No changes to the MetaPhlAn markers for this release. The 600 new pangenomes in ChocoPhlAn 201901b were already quantifiable by MetaPhlAn 3’s 201901 markers but not available as pangenomes due to a synchronization issue between UniProt and GenBank when we built the original batch.