Custom UniRef90 database with Humann3

dcdanko · January 28, 2021, 5:51pm

Hi,

I’m trying to use Humann3 with a custom version of uniref90 (nothing special, just a different release) but I keep getting empty results (0% unmapped and nothing else).

My workflow is a little unusual in that I am aligning reads to the database first, then handing off the m8 file to HUMAnN to quantify pathways. From digging through the docs and code, it seems I need an id-mapping table to tell HUMAnN how long the genes are, which species they belong to, etc.

I have two questions:

1. Is my understanding correct? I’ve tried this out with a small fake id-mapping table and it seems to work.
2. Does such an id-mapping table already exist for UniRef90? I’m not particularly concerned about the release version mismatch for this.

Thank you!

franzosa · January 29, 2021, 2:56pm

An id mapping file allows you to use HUMAnN with a truly custom database (with arbitrary sequence headers). Those headers would show up as targets in your m8 file, and the id mapping file would then allow you to associate the read mass they recruit with species + functions.

The alternative is to build a database whose sequence headers contain all that information, which in the case of HUMAnN’s built-in UniRef90 is just the UniRef90 ID + the DNA-equivalent sequence length (e.g. >UniRef90_ABC|300 for a protein ABC with length 100 amino acids).
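As a rough illustration of the header convention described above, here is a minimal Python sketch that rewrites protein FASTA headers as ID|length, where the length is three times the amino-acid count. The function names (`read_fasta`, `with_length_headers`) are made up for this example and are not part of HUMAnN. It assumes the UniRef90 ID is the first whitespace-delimited token of each header.

```python
def read_fasta(lines):
    """Parse FASTA lines into [id, sequence] pairs (hypothetical helper)."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the ID is the first token after ">"
            records.append([line[1:].split()[0], ""])
        elif records:
            records[-1][1] += line
    return records


def with_length_headers(lines):
    """Return FASTA lines whose headers read >ID|DNA-equivalent length."""
    out = []
    for seq_id, seq in read_fasta(lines):
        # DNA-equivalent length = 3 nucleotides per amino acid
        out.append(f">{seq_id}|{3 * len(seq)}")
        out.append(seq)
    return out
```

For a 100-residue protein with ID UniRef90_ABC, this produces the header >UniRef90_ABC|300, matching the example above.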

dcdanko · January 29, 2021, 4:09pm

Makes sense. HUMAnN does not associate taxonomic info with UniRef entries by default, right?

okeydokey · March 15, 2021, 9:16pm

“… the id mapping file would then allow you to associate the read mass they recruit with species + functions.”

To clarify: are the gene lengths used in RPK calculations pulled from the “length” attribute of the sequence header/id mapping file, or are they determined during the alignment steps, i.e., from “Column 9: Observed template length” in the Bowtie2 alignment results?

franzosa · March 15, 2021, 9:30pm

The former (from the header or id mapping file), though in the case of SAM output the two ought to agree.
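To make the role of the header length concrete, here is a simplified Python sketch that computes reads per kilobase (RPK) from m8 lines whose target column follows the ID|length convention. It is an illustration only: it counts every alignment line as one read, whereas HUMAnN's own calculation filters and weights alignments. The function name `rpk_from_m8` is hypothetical.

```python
from collections import Counter


def rpk_from_m8(m8_lines):
    """Return {gene_id: RPK}, taking gene length from the target header."""
    hits = Counter()
    for line in m8_lines:
        if not line.strip() or line.startswith("#"):
            continue
        # Column 2 of tabular (m8) output is the target/subject ID
        target = line.rstrip("\n").split("\t")[1]
        hits[target] += 1

    rpk = {}
    for target, n_reads in hits.items():
        # Assumption: targets look like "UniRef90_ABC|300"
        gene_id, length = target.rsplit("|", 1)
        rpk[gene_id] = n_reads * 1000 / int(length)
    return rpk
```

With three reads hitting UniRef90_ABC|300, the gene is 0.3 kb long, so this sketch reports an RPK of 10 for UniRef90_ABC.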
