NCycDB, MCycDB, dbCAN2

Hello! I’m wondering whether it’s possible to add the NCycDB, MCycDB and dbCAN2 databases to the HUMAnN pipeline. I’m trying to get a more curated list of gene families, one more closely tied to biogeochemical pathways, so that I can determine the influence of microbiomes on ecosystem functions.

It looks like NCycDB and MCycDB provide FASTA files that could potentially be turned into a DIAMOND database with a command such as ./diamond makedb --in reference.fasta -d reference, but I’m not sure about dbCAN2. I would really appreciate it if anybody has advice on this, or experience doing it, and could point me to relevant answers.
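As a minimal sketch of the makedb step described above, assuming the `diamond` binary is on the PATH and that the downloaded file contains protein sequences (DIAMOND's makedb expects amino acid input), the command can be wrapped in Python so that duplicate sequence IDs are caught before the database is built. The helper names and the duplicate-ID check are illustrative additions, not part of DIAMOND or HUMAnN.

```python
import shutil
import subprocess
from collections import Counter
from pathlib import Path


def fasta_ids(lines):
    """Return sequence IDs (first whitespace-delimited token after '>')."""
    return [
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and line[1:].strip()
    ]


def duplicate_ids(ids):
    """Return IDs that occur more than once, sorted for stable reporting."""
    return sorted(seq_id for seq_id, n in Counter(ids).items() if n > 1)


def makedb_command(fasta, db_name):
    """Argument list equivalent to: diamond makedb --in FASTA -d DB_NAME"""
    return ["diamond", "makedb", "--in", str(fasta), "-d", str(db_name)]


def build_diamond_db(fasta, db_name):
    """Check the IDs in a protein FASTA, then run diamond makedb on it."""
    with open(fasta) as handle:
        dups = duplicate_ids(fasta_ids(handle))
    if dups:
        # Duplicate IDs would make any later ID-to-reaction mapping ambiguous.
        raise ValueError(f"duplicate sequence IDs in {fasta}: {dups[:5]}")
    if shutil.which("diamond") is None:
        raise FileNotFoundError("the diamond executable was not found on PATH")
    subprocess.run(makedb_command(fasta, db_name), check=True)
    # DIAMOND appends the .dmnd extension to the database name.
    return Path(f"{db_name}.dmnd")
```

For example, build_diamond_db("NCycDB.faa", "ncycdb") would produce ncycdb.dmnd; both file names are hypothetical placeholders for whatever the downloaded files are called.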

Best wishes,
Sarah

The HUMAnN software can work with other sequence/enzyme/pathway databases, although by default it uses the UniRef/MetaCyc resources bundled with the software. We’ve written documentation to help users run the software with other pathway systems here:

You would need to (1) format a sequence database, (2) map those sequences to enzyme/reaction IDs, and (3) organize those reaction IDs into pathways (optionally with structural logic, e.g. reaction A OR reaction B is needed to move forward in the pathway). Each of these steps is described in more detail in the manual pages linked above.
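To illustrate how those three layers fit together, here is a small sketch using a simplified denitrification pathway. Everything in it is hypothetical: the sequence IDs, the reaction labels, the tab-delimited mapping layout, and the "steps are ANDed, alternatives within a step are ORed" representation. It is not HUMAnN's file format or its pathway abundance/coverage algorithm (see the manual for those); it only shows the idea of sequence → reaction → pathway with OR logic.

```python
# Hypothetical data: sequence ID -> reaction(s) that sequence supports.
SEQ_TO_REACTIONS = {
    "seq001": {"nitrate_reduction"},     # e.g. a narG hit
    "seq002": {"nitrate_reduction"},     # e.g. a napA hit
    "seq003": {"nitrite_reduction_NO"},  # e.g. a nirK hit
    "seq004": {"nitrite_reduction_NO"},  # e.g. a nirS hit
    "seq005": {"NO_reduction"},          # e.g. a norB hit
    "seq006": {"N2O_reduction"},         # e.g. a nosZ hit
}

# Hypothetical pathway structure: every step is required (AND);
# any one reaction inside a step is enough to satisfy it (OR).
PATHWAYS = {
    "denitrification": [
        {"nitrate_reduction"},
        {"nitrite_reduction_NO"},
        {"NO_reduction"},
        {"N2O_reduction"},
    ],
}


def parse_mapping(lines):
    """Parse hypothetical lines: seq_id<TAB>reaction[<TAB>reaction...]."""
    mapping = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or fields[0].startswith("#"):
            continue  # skip blank, malformed, or comment lines
        mapping.setdefault(fields[0], set()).update(f for f in fields[1:] if f)
    return mapping


def reactions_present(hit_ids, seq_to_reactions):
    """Collect every reaction supported by at least one detected sequence."""
    present = set()
    for seq_id in hit_ids:
        present |= seq_to_reactions.get(seq_id, set())
    return present


def step_coverage(steps, present):
    """Fraction of steps with at least one alternative reaction present."""
    satisfied = sum(1 for alternatives in steps if alternatives & present)
    return satisfied / len(steps)


def pathway_report(hit_ids, seq_to_reactions, pathways):
    """Map detected sequences to reactions, then score each pathway."""
    present = reactions_present(hit_ids, seq_to_reactions)
    return {name: step_coverage(steps, present) for name, steps in pathways.items()}
```

With hits for seq002, seq003 and seq006, three of the four steps are satisfied, so pathway_report reports 0.75 for denitrification. The nirK/nirS step shows the OR logic: either gene family alone completes it.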