How to do CAZy gene profiling?

Hello, bioBakery member,

Can I get CAZy gene abundances from the genefamilies abundances generated by HUMAnN?

I have seen several high-profile articles in Nature and Cell that discuss CAZymes, but there does not seem to be a unified method for CAZy gene profiling. Some of them map the reads directly to the CAZy database. Others first compute a gene profile against a constructed gene set, and then convert it to a CAZy profile based on the CAZy annotation of each gene.

If we can directly convert from genefamilies to CAZy profile, it might be more straightforward.

If I have misunderstood anything, please correct me. Thank you.


Here’s a link to a CAZy-to-UniRef90 mapping file that is compatible with HUMAnN’s regroup_table script (pass it using the -c flag for a custom mapping). I was never able to find a good mapping from CAZy IDs to names, hence this was not bundled with the official HUMAnN installation. If you find a name mapping, please share it with us! :)
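
For anyone curious what the regrouping amounts to, here is a minimal Python sketch of the idea, not HUMAnN's actual implementation. It assumes the mapping file has one group per line, with the CAZy ID first and its UniRef90 members after it, all tab-separated. It also assumes the abundances are community totals only (no stratified `|taxon` rows). The function names and the toy IDs are made up for illustration.

```python
from collections import defaultdict


def load_groups(lines):
    """Build a feature -> set-of-groups lookup from mapping lines.

    Assumed line format (an assumption, check your file):
    GROUP_ID<TAB>member_1<TAB>member_2...
    """
    feature_to_groups = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or member-less lines
        group, members = fields[0], fields[1:]
        for member in members:
            feature_to_groups[member].add(group)
    return feature_to_groups


def regroup(abundances, feature_to_groups):
    """Sum gene-family abundances into every group they belong to.

    Features with no group are dropped in this sketch.
    """
    totals = defaultdict(float)
    for feature, value in abundances.items():
        for group in feature_to_groups.get(feature, ()):
            totals[group] += value
    return dict(totals)


# Toy data with hypothetical UniRef90 IDs.
mapping_lines = [
    "GH13\tUniRef90_A\tUniRef90_B",
    "GT2\tUniRef90_B\tUniRef90_C",
]
abundances = {
    "UniRef90_A": 10.0,
    "UniRef90_B": 5.0,
    "UniRef90_C": 2.5,
    "UniRef90_D": 1.0,  # not in any CAZy group
}

cazy = regroup(abundances, load_groups(mapping_lines))
print(cazy)
```

Note that a gene family belonging to two CAZy families contributes its full abundance to both in this sketch. For real tables the regroup_table script is the better choice.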


@franzosa Hi, many thanks for your answer.
I will try this method.
If I find a better name mapping, I will share it with you. It will be fun.

hi @alienzj,

Do you have any update on this?


Hi Franzosa! How did you find/make this? This really helps so thank you for sharing! I would like to find a similar file for NCyc and MCyc databases, do you know where these would be available?


All mapping files are derived from raw protein annotations downloaded from UniProt. See the DR (database cross-reference) lines in the raw text representation of a UniProt entry for an example:
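
To illustrate how such cross-references could be pulled out of the raw text, here is a rough Python sketch. It is not the script used to build the official mapping files. It assumes the usual flat-file layout in which `AC` lines list accessions, `DR` lines have the form `DR   DATABASE; IDENTIFIER; ...`, and `//` ends an entry. The sample entry and its accession are invented. The same approach would only work for another database if UniProt actually carries DR lines for it, which I have not checked for NCyc or MCyc.

```python
def families_by_accession(flat_text, database="CAZy"):
    """Collect DR identifiers for one database, keyed by primary accession."""
    mapping = {}
    accession = None
    for line in flat_text.splitlines():
        if line.startswith("AC ") and accession is None:
            # The first accession on the first AC line is the primary one.
            accession = line[2:].strip().split(";")[0].strip()
        elif line.startswith("DR ") and accession is not None:
            fields = [f.strip() for f in line[2:].strip().rstrip(".").split(";")]
            if len(fields) >= 2 and fields[0] == database:
                mapping.setdefault(accession, []).append(fields[1])
        elif line.startswith("//"):
            accession = None  # end of entry, reset for the next one
    return mapping


# Invented entry for illustration only; not a real UniProt record.
sample = """\
ID   EXAMPLE_TEST            Unreviewed;       100 AA.
AC   X00001; X00002;
DR   CAZy; GH13; Glycoside Hydrolase Family 13.
DR   Pfam; PF00128; Alpha-amylase; 1.
//
"""

result = families_by_accession(sample)
print(result)
```

This gives a protein-accession-to-family table. Turning it into a UniRef90-level mapping would need an extra step that links each accession to its UniRef90 cluster, which the sketch does not cover.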