Theoretical query regarding HUMAnN analysis

DEEPCHANDA7 · October 1, 2020, 8:55am

Hi @franzosa. I was reading the HUMAnN2 article (Franzosa et al. 2018, Nature Methods) and could not understand a particular concept. Can you please help me?
Here you have written:

“The tiered search generates mappings of meta’omic reads to gene
sequences with known or ambiguous taxonomy”

In the third tier, you do a translated search of the unmapped reads. To my knowledge, that tells you which protein a particular short read encodes. What I don't understand is how you get the gene for that protein. Also, how do you get the length of the gene from the protein?

Thanks,
dpc

franzosa · October 2, 2020, 12:07pm

In the pangenome search, HUMAnN is mapping reads to genes that have been annotated to UniRef families. In the translated (i.e. DNA vs. protein) search, HUMAnN is mapping reads directly to the UniRef representative protein sequences. So we never work directly with the protein’s gene (i.e. DNA) sequence in the translated search, but for accounting purposes we take its length to be 3x the length of the corresponding protein (three nucleotides per amino acid). Does this help to clarify things?
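
As a rough illustration of the length accounting described above, here is a minimal Python sketch. It is not HUMAnN's actual code: the function names and the per-kilobase normalization step are assumptions made for the example. The only part taken from the reply is that the nominal gene length is 3x the protein length.

```python
def nominal_gene_length_nt(protein_length_aa: int) -> int:
    """Nominal gene length in nucleotides for a protein of the given
    length in amino acids (3 nucleotides per codon, stop codon ignored)."""
    if protein_length_aa <= 0:
        raise ValueError("protein length must be positive")
    return 3 * protein_length_aa


def reads_per_kilobase(read_hits: float, protein_length_aa: int) -> float:
    """Hypothetical length normalization: hits divided by the nominal
    gene length expressed in kilobases. Illustrative only."""
    gene_length_nt = nominal_gene_length_nt(protein_length_aa)
    return read_hits * 1000 / gene_length_nt


if __name__ == "__main__":
    # A 300-residue representative protein stands in for a 900 nt gene.
    print(nominal_gene_length_nt(300))
    print(reads_per_kilobase(45, 300))
```

The point is just that the protein reference alone is enough for the bookkeeping: no DNA sequence for the gene is needed, only the protein's length.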

DEEPCHANDA7 · October 2, 2020, 12:23pm

Thanks a lot, @franzosa. Things are crystal clear to me now.

DC7
