Eggnog ID in uniref_eggnog but not map_eggnog_names

Samantha · August 19, 2020, 6:55pm

Hello,

I ran HUMAnN 3 on some metatranscriptomic samples using the UniRef90 database, then ran LEfSe on the table regrouped by eggNOG IDs (among other groupings). Three of the hits have no name in the eggNOG name mapping file, and I can't find them on the eggNOG website either.

One example:
ENOG4105GMY, which can be found in the UniRef90_eggnog file:

zgrep 'ENOG4105GMY' map_eggnog_uniref90.txt.gz

This returns:
ENOG4105GMY UniRef90_A4EB58 UniRef90_A5Z326 UniRef90_A6P140 UniRef90_A7B6K2 UniRef90_A7VEM4 UniRef90_A8SIB4 UniRef90_A8SSN6 UniRef90_B0MGJ9 UniRef90_B2PU91 UniRef90_B5CUP9 UniRef90_B6G1S7 UniRef90_C0QQH9 UniRef90_C1DUU5 UniRef90_C4XFZ6 UniRef90_C9N1P8 UniRef90_E0NVY9 UniRef90_H6R3P7 UniRef90_Q2NJ05

But it is not in the name mapping file:

zgrep 'ENOG4105GMY' map_eggnog_name.txt.gz

This command returns nothing.

The renaming command returns NO_NAME for these hits.

humann_rename_table -i no_unmapped_ungrouped/eggnog_cpm_genefamilies_joined_no_extra.tsv -n eggnog -o no_unmapped_ungrouped/renamed_eggnog_cpm_genefamilies_joined_no_extra.tsv

How do I find out what the names are? If there's no way to directly find the name, is it possible to retrieve the sequence for that hit so I can BLAST it and try to determine its identity that way?

Thanks,
Samantha

franzosa · August 28, 2020, 5:59pm

We take the UniRef to eggNOG ID associations directly from UniProt, but we need to parse the eggNOG database to get the human-readable eggNOG names. It’s possible that the two are slightly out of sync, resulting in some UniProt-recognized eggNOGs that are no longer in the eggNOG database. One option would be to look at earlier versions of eggNOG to see if they are described there?
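To see how many eggNOG IDs are affected by this kind of mismatch, something like the following sketch could list every ID that appears in the UniRef mapping file but not in the name file. It assumes both files are gzipped, tab-separated text with the eggNOG ID in the first column; the function names are made up for illustration and are not part of HUMAnN.

```python
import gzip


def read_ids(path):
    """Collect the first-column IDs of a gzipped, tab-separated mapping file.

    Assumption: one record per line, with the eggNOG ID in column 1.
    """
    ids = set()
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if line:
                ids.add(line.split("\t", 1)[0])
    return ids


def unnamed_ids(uniref_map_path, name_map_path):
    """Return the sorted IDs found in the UniRef mapping but not in the name mapping."""
    return sorted(read_ids(uniref_map_path) - read_ids(name_map_path))
```

Calling unnamed_ids("map_eggnog_uniref90.txt.gz", "map_eggnog_name.txt.gz") should then return the IDs that would come back as NO_NAME, if the files are laid out as assumed.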

To get corresponding sequences, you can always take a UniRef entry like UniRef90_A4EB58 (from the mapping file) and look it up on UniProt like this:

https://www.uniprot.org/uniprot/A4EB58.fasta

to get a protein sequence. If you replace .fasta with .txt you’ll be shown the representative protein’s annotations rather than its sequence.
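If there are many UniRef entries to look up, the lookup could be scripted roughly like this. This is only a sketch: it reuses the URL pattern quoted above, assumes the ID has the form UniRef90_<accession>, and the helper names are hypothetical. Whether that URL pattern still resolves unchanged is not something the sketch checks.

```python
from urllib.request import urlopen


def uniprot_url(uniref_id, fmt="fasta"):
    """Build the UniProt URL for a UniRef ID such as 'UniRef90_A4EB58'.

    fmt is 'fasta' for the sequence or 'txt' for the annotations.
    """
    # Assumption: everything after the first underscore is the accession.
    if uniref_id.startswith("UniRef"):
        accession = uniref_id.split("_", 1)[1]
    else:
        accession = uniref_id
    return f"https://www.uniprot.org/uniprot/{accession}.{fmt}"


def fetch(uniref_id, fmt="fasta"):
    """Download the record as text (requires network access)."""
    with urlopen(uniprot_url(uniref_id, fmt)) as response:
        return response.read().decode("utf-8")
```

For example, uniprot_url("UniRef90_A4EB58") gives the FASTA link shown above, and fetch("UniRef90_A4EB58") would return the sequence text to paste into BLAST.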

nick-youngblut · February 25, 2021, 3:53pm

"UniRef to eggNOG ID associations directly from UniProt"

Where on UniProt do you get this info? I can't find it on the FTP server (Index of /pub/databases/uniprot/current_release/knowledgebase).

franzosa · February 25, 2021, 4:15pm

We parse all functional annotations from these two files (they also have XML equivalents if you prefer):

https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz
https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_trembl.dat.gz

Which are concatenations of per-protein files that look like this:

https://www.uniprot.org/uniprot/P11440.txt

And specifically the DR lines that look like:

DR eggNOG; KOG0594; Eukaryota.

Note that UniRef90/50 are subsets of the sequences detailed in the above files. So, for example, the eggNOG annotation for UniRef90_XYZ is based on the entry for XYZ itself in the above files.
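For anyone who wants to pull these associations out themselves, a minimal parsing sketch might look like the one below. It is not the HUMAnN code, just an illustration. It assumes the usual flat-file layout: a two-letter line code at the start of each line, the primary accession first on the first AC line, and // closing each entry.

```python
import gzip


def iter_eggnog(lines):
    """Yield (primary_accession, [eggNOG IDs]) for each flat-file entry."""
    accession, eggnogs = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        # The line code is everything before the first space ("AC", "DR", "//").
        code, _, rest = line.partition(" ")
        if code == "AC" and accession is None:
            # Keep only the first (primary) accession of the entry.
            accession = rest.strip().split(";")[0].strip()
        elif code == "DR":
            fields = [f.strip() for f in rest.strip().rstrip(".").split(";")]
            if fields[0] == "eggNOG" and len(fields) > 1:
                eggnogs.append(fields[1])
        elif code == "//":
            # End of entry: emit what was collected and reset.
            if accession is not None:
                yield accession, eggnogs
            accession, eggnogs = None, []


def iter_eggnog_file(path):
    """Same as iter_eggnog, reading from a gzipped .dat file."""
    with gzip.open(path, "rt") as handle:
        yield from iter_eggnog(handle)
```

Iterating over iter_eggnog_file("uniprot_sprot.dat.gz") streams the file one entry at a time, so the whole thing never has to fit in memory. Keeping only the accessions that are UniRef90 representatives would be a separate filtering step.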

nick-youngblut · February 26, 2021, 7:08am

Awesome! Thanks for the detailed info!
