How to do CAZy gene profiling?

Hello, bioBakery member,

Can I get CAZy gene abundances from the genefamilies abundances generated by HUMAnN?

I found several high-profile articles in Nature and Cell that discuss CAZymes, but there does not seem to be a unified method for CAZy gene profiling. Some studies map reads directly to the CAZy database. Others first compute a gene profile from a constructed gene set and then convert it to a CAZy gene profile based on each gene's CAZy annotation.

If we could convert directly from genefamilies to a CAZy profile, it might be more straightforward.

If I have misunderstood anything, please correct me.


Here’s a link to a CAZy-to-UniRef90 mapping file that is compatible with HUMAnN’s regroup_table script (pass it using the -c flag for a custom mapping). I was never able to find a good mapping from CAZy IDs to names, hence this was not bundled with the official HUMAnN installation. If you find a name mapping, please share it with us! 🙂
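For anyone wondering what the regrouping step does conceptually, here is a minimal Python sketch. It is not HUMAnN's implementation. It assumes the mapping file is tab-delimited with the group ID (here a CAZy family) in the first column and its UniRef90 members in the remaining columns, and it ignores stratified rows. The function names and the toy IDs are hypothetical. Features without a mapping are collected under an `UNGROUPED` label.

```python
from collections import defaultdict


def load_mapping(lines):
    """Parse lines of the assumed form: group<TAB>member1<TAB>member2...

    Returns a dict mapping each member (UniRef90 ID) to the list of
    groups (CAZy families) it belongs to.
    """
    member_to_groups = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        group, members = fields[0], fields[1:]
        for member in members:
            member_to_groups[member].append(group)
    return member_to_groups


def regroup(abundances, member_to_groups):
    """Sum feature abundances into groups.

    A feature that maps to several groups contributes its full value to
    each of them; unmapped features are summed under 'UNGROUPED'.
    """
    totals = defaultdict(float)
    for feature, value in abundances.items():
        for group in member_to_groups.get(feature, ["UNGROUPED"]):
            totals[group] += value
    return dict(totals)


# Toy data (hypothetical IDs, for illustration only)
mapping_lines = [
    "GH13\tUniRef90_A\tUniRef90_B",
    "GT2\tUniRef90_B\tUniRef90_C",
]
abundances = {"UniRef90_A": 10.0, "UniRef90_B": 5.0, "UniRef90_D": 2.0}

cazy_profile = regroup(abundances, load_mapping(mapping_lines))
print(cazy_profile)
```

In practice you would not reimplement this; you would pass your genefamilies table and the mapping file to the regroup script with the -c flag as described above.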


@franzosa Hi, many thanks for your answer.
I will try this method.
If I find a better name mapping, I will share it with you. It will be fun.

Hi @alienzj,

Do you have any update on this?

Thanks

Hi Franzosa! How did you find/make this? This really helps, so thank you for sharing! I would like to find a similar file for the NCyc and MCyc databases. Do you know where these would be available?

Cheers!

All mapping files are derived from raw protein annotations downloaded from UniProt. See the DR (database cross-reference) lines in the raw text representation of a UniProt entry for an example:

https://rest.uniprot.org/uniprotkb/Q5NMT2.txt
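As a rough illustration of how such cross-references could be pulled out of the raw text, here is a small Python sketch. It assumes the usual flat-file layout in which a cross-reference line starts with `DR`, followed by the database name and semicolon-separated fields, ending with a period. The function name and the sample entry below are made up for illustration; they are not the contents of Q5NMT2.

```python
def cazy_families(uniprot_text):
    """Return the CAZy family IDs found on DR lines of a UniProt text entry."""
    families = []
    for line in uniprot_text.splitlines():
        if not line.startswith("DR   "):
            continue  # only database cross-reference lines are of interest
        # Drop the 'DR   ' prefix and trailing period, then split the fields
        fields = [f.strip() for f in line[5:].rstrip(".").split(";")]
        if fields[0] == "CAZy" and len(fields) >= 2:
            families.append(fields[1])
    return families


# Hypothetical entry fragment (not a real UniProt record)
sample_entry = "\n".join([
    "ID   EXAMPLE_ENTRY           Unreviewed;       100 AA.",
    "AC   X00000;",
    "DR   EMBL; AE000000; AAA00000.1; -; Genomic_DNA.",
    "DR   CAZy; GH13; Glycoside Hydrolase Family 13.",
    "DR   CAZy; CBM48; Carbohydrate-Binding Module Family 48.",
    "DR   GO; GO:0003824; F:catalytic activity; IEA:InterPro.",
])

print(cazy_families(sample_entry))
```

Applying the same idea to other databases would only work if UniProt carries a DR cross-reference for them, which is something to check per database.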

Hi @franzosa! I recently downloaded the uniref90-to-cazy map you shared here, and I was wondering how you built this file. Is it part of any publication I can cite in the future? HUMAnN does not provide it as part of its utility mapping files, right?

Kind regards,
Felipe

This file was generated using the same methods / raw UniProt 2019_01 data we used to build the other mapping files for HUMAnN 3, as described in:

So you could cite that paper + UniProt itself and I think that would suffice.


Thank you very much!

Hello,

I am trying to apply the custom CAZy mapping. I have realized that humann_regroup_table can be applied only to the genefamilies.tsv output. However, I want bugs_list.tsv as my final output. I have tried using the merge_metaphlan_tables.py script, but I think it already requires “final” profiles. So, my question is: what is the exact pipeline to apply the custom CAZy mapping and obtain the merged bugs list output from multiple samples?

Thank you!