Theoretical query regarding HUMAnN analysis

Hi @franzosa. I was reading the HUMAnN2 article (Franzosa et al. 2018, Nature Methods) and could not understand a particular concept. Can you please help me?
Here you have written:

“The tiered search generates mappings of meta’omic reads to gene sequences with known or ambiguous taxonomy”

In the third tier, you run a translated search on the unmapped reads. To my knowledge, this gives you the protein that a particular short read encodes. I do not understand how you get the gene for that translated protein. Also, how do you get the length of the gene from that protein?


In the pangenome search, HUMAnN maps reads to genes that have been annotated to UniRef families. In the translated (i.e. DNA vs. protein) search, HUMAnN maps reads directly to the UniRef representative protein sequences. So we never work directly with the protein’s gene (i.e. DNA) sequence in the translated search, but for accounting purposes we know its length is 3x the length of the corresponding protein (three nucleotides per amino acid). Does this help to clarify things?
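
To make the length accounting concrete, here is a minimal Python sketch. It is not HUMAnN's actual code: the function names `gene_length_nt` and `reads_per_kilobase` are hypothetical, and it assumes the gene length is taken as exactly three nucleotides per amino acid (ignoring any stop codon) and that this length is then used to normalize read counts per kilobase.

```python
def gene_length_nt(protein_length_aa: int) -> int:
    """Approximate gene length in nucleotides from protein length in amino acids.

    Assumption: 3 nucleotides per amino acid, stop codon not counted.
    """
    if protein_length_aa <= 0:
        raise ValueError("protein length must be positive")
    return 3 * protein_length_aa


def reads_per_kilobase(read_count: float, protein_length_aa: int) -> float:
    """Length-normalize a read count using the inferred gene length.

    Illustrative only; the normalization HUMAnN applies may differ in detail.
    """
    length_kb = gene_length_nt(protein_length_aa) / 1000.0
    return read_count / length_kb


if __name__ == "__main__":
    # A hypothetical 100-residue UniRef representative hit by 30 reads.
    print(gene_length_nt(100))
    print(reads_per_kilobase(30, 100))
```

The point is only that a protein hit carries enough information for length normalization: the DNA sequence itself is never needed, just the protein length multiplied by three.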

Thanks a lot, @franzosa. Things are crystal clear to me now.