Add NCBI TaxID to HUMAnN 3 output

Dear HUMAnN developers,

Thanks for the fantastic tool.

I have a question about the HUMAnN 3 output. Is it possible to add the NCBI TaxID to the genefamilies.tsv output shown below? If so, do you have an idea of the best way to do it?

Gene Family demo_Abundance-RPKs
UNMAPPED 17401.0000000000
UniRef90_G1UL42 333.3333333333
UniRef90_G1UL42|g__Bacteroides.s__Bacteroides_dorei 333.3333333333
UniRef90_I9QXW8 333.3333333333
UniRef90_I9QXW8|g__Bacteroides.s__Bacteroides_dorei 333.3333333333
UniRef90_A0A078RDY6 166.6666666667

I searched a bit but I didn’t find anything.

Thanks a lot for your help

Yao

We don’t currently have a tool that does this, but the information you need IS bundled with HUMAnN. Specifically, it is in the tol-lca file that comes with the utility mapping database (the file used by the infer_taxonomy script). This file lists three NCBI TaxIDs for each UniRef90/UniRef50 cluster:

  1. The TaxID of the organism the cluster’s representative sequence comes from
  2. The TaxID of the cluster’s true LCA (lowest common ancestor)
  3. The “HUMAnN LCA” (which builds in some tolerance to outliers)
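As a rough sketch of the join this implies, the Python script below loads the tol-lca file into a dictionary and appends one TaxID column to genefamilies.tsv. It assumes the tol-lca file is tab-separated with the UniRef ID in the first column followed by the three TaxIDs in the order listed above, and that genefamilies.tsv header lines start with "#". Please check the real file before relying on this. The script, function, and `NCBI_TaxID` column names are hypothetical.

```python
import gzip
import sys
from typing import Dict, Iterable, Iterator, Tuple

# ASSUMPTION: each data row of the tol-lca file is tab-separated as
#   <UniRef ID> <representative TaxID> <true LCA TaxID> <HUMAnN LCA TaxID>
# Inspect the actual file and adjust the column indices if it differs.
TaxIDs = Tuple[str, str, str]


def open_text(path: str):
    """Open a plain or gzip-compressed text file for reading."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def load_tol_lca(lines: Iterable[str]) -> Dict[str, TaxIDs]:
    """Build a map from UniRef ID to (representative, LCA, HUMAnN LCA) TaxIDs."""
    mapping: Dict[str, TaxIDs] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip rows that do not match the assumed layout
        mapping[fields[0]] = (fields[1], fields[2], fields[3])
    return mapping


def annotate(lines: Iterable[str], mapping: Dict[str, TaxIDs],
             which: int = 2) -> Iterator[str]:
    """Append one TaxID column to each genefamilies.tsv row.

    `which` picks the TaxID: 0 = representative, 1 = true LCA,
    2 = HUMAnN LCA. Rows with no match (e.g. UNMAPPED) get "NA".
    """
    for line in lines:
        row = line.rstrip("\n")
        if row.startswith("#"):  # ASSUMPTION: header lines start with "#"
            yield row + "\tNCBI_TaxID"
            continue
        # Stratified rows look like "UniRef90_X|g__...", so keep the part before "|".
        gene_id = row.split("\t", 1)[0].split("|", 1)[0]
        taxids = mapping.get(gene_id)
        yield row + "\t" + (taxids[which] if taxids else "NA")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python add_taxid.py <tol-lca file> <genefamilies.tsv>
    with open_text(sys.argv[1]) as fh:
        taxid_map = load_tol_lca(fh)
    with open_text(sys.argv[2]) as fh:
        for out_line in annotate(fh, taxid_map):
            print(out_line)
```

Output goes to stdout, so it can be redirected to a new file. Note that stratified rows receive the cluster-level TaxID, not the TaxID of the species named after the "|".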

Hope this helps!