Count of individual genes from ChocoPhlAn database rather than UniRef gene family based RPK

shubavarshini · December 15, 2020, 1:56pm

Hello,

I’m using HUMAnN2. The end result is the RPK abundance of UniRef gene families (with the taxa assigned from the nucleotide search) and pathway abundances. I’m interested in looking at individual gene abundances (not necessarily raw counts; RPK or CPM would be fine). By individual genes I mean the genes from the ChocoPhlAn pangenome database. Is there any way to get this information while running HUMAnN2?

franzosa · December 15, 2020, 5:29pm

UniRef90s within microbial species are pretty close to individual genes (in a minority of cases a UniRef90 family might exist at multiple copy number within a species, in which case you’d see it having 2x, 3x, etc. the coverage of another gene). If you want to see the raw alignment of reads to genes within pangenomes you’d need to inspect the reads-vs-pangenome SAM file under the HUMAnN temp directory.
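As a rough illustration of inspecting that SAM file, here is a minimal Python sketch that tallies mapped reads per reference sequence (SAM column 3, which for a reads-vs-pangenome alignment would be the pangenome gene ID). It is not HUMAnN's own counting code: the function name `count_reads_per_gene`, the gene IDs `geneA`/`geneB`, and the example records are all made up for illustration, and HUMAnN may apply additional alignment filtering, so these tallies are not guaranteed to match its numbers.

```python
from collections import Counter

def count_reads_per_gene(sam_lines):
    """Count primary, mapped alignments per reference name (SAM column 3).

    Assumption: one count per primary alignment record; this is a plain
    tally, not a reimplementation of HUMAnN's filtering or weighting.
    """
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and SAM header lines (they start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        flag = int(fields[1])
        rname = fields[2]
        # 0x4 = read unmapped; '*' = no reference assigned.
        if flag & 0x4 or rname == "*":
            continue
        # 0x100 = secondary alignment, 0x800 = supplementary alignment.
        if flag & 0x100 or flag & 0x800:
            continue
        counts[rname] += 1
    return counts

# Hypothetical in-memory records; real input would be the lines of the SAM file.
example_sam = [
    "@HD\tVN:1.0",
    "r1\t0\tgeneA\t1\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tgeneA\t5\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t0\tgeneB\t1\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r1\t256\tgeneB\t9\t0\t4M\t*\t0\t0\tACGT\tIIII",
]

for gene, n in sorted(count_reads_per_gene(example_sam).items()):
    print(gene, n)
```

To run it on a real file, pass an open file handle instead of the list, for example `with open(path) as fh: counts = count_reads_per_gene(fh)`, where `path` points at the SAM file in the temp directory. Dividing each count by the gene length in kilobases would give an RPK-style value, assuming the gene lengths are available to you separately.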

shubavarshini · January 8, 2021, 9:17am

Thank you for the reply. Would you recommend any particular tool for counting reads from the SAM file?
I want to use a tool that would more or less give the same count as HUMAnN after nucleotide mapping.
