The bioBakery help forum

Counts of individual genes from the ChocoPhlAn database rather than UniRef gene-family-based RPK


I’m using HUMAnN2. The end result is the RPK abundance of UniRef gene families (with taxa assigned from the nucleotide search) and pathway abundances. I’m interested in looking at individual gene abundances (not necessarily raw counts; RPK or CPM would be fine). By individual genes I mean the genes in the ChocoPhlAn pangenome database. Is there any way to get this information while running HUMAnN2?

UniRef90s within microbial species are pretty close to individual genes (in a minority of cases a UniRef90 family might exist in multiple copies within a species, in which case you’d see it having 2x, 3x, etc. the coverage of another gene). If you want to see the raw alignment of reads to genes within pangenomes, you’d need to inspect the reads-vs-pangenome SAM file under the HUMAnN temp directory.
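As a rough illustration of what inspecting that SAM file could look like, here is a minimal Python sketch that tallies primary alignments per reference sequence and divides by gene length in kilobases to get a naive RPK. The function name `count_sam_hits` and the toy records are made up for the example. HUMAnN applies its own alignment filtering and normalization, which this sketch does not attempt to reproduce, so the numbers should not be expected to match HUMAnN's output exactly.

```python
from collections import Counter

def count_sam_hits(sam_lines):
    """Return (read counts, naive RPK) per reference from SAM-format lines.

    Illustrative only: counts primary mapped alignments and normalizes by
    the reference length given in the @SQ header. This is NOT HUMAnN's
    own counting procedure.
    """
    lengths = {}
    counts = Counter()
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            # @SQ header lines carry the reference name (SN) and length (LN)
            if line.startswith("@SQ"):
                tags = dict(f.split(":", 1) for f in line.split("\t")[1:])
                lengths[tags["SN"]] = int(tags["LN"])
            continue
        fields = line.split("\t")
        flag, rname = int(fields[1]), fields[2]
        # Skip unmapped (0x4), secondary (0x100), supplementary (0x800) records
        if flag & 0x4 or flag & 0x100 or flag & 0x800 or rname == "*":
            continue
        counts[rname] += 1
    # Naive RPK: reads divided by reference length in kilobases
    rpk = {ref: n / (lengths[ref] / 1000.0)
           for ref, n in counts.items() if ref in lengths}
    return counts, rpk

# Toy SAM records (hypothetical gene names) standing in for a real file
demo = [
    "@HD\tVN:1.0\tSO:unsorted",
    "@SQ\tSN:geneA\tLN:2000",
    "@SQ\tSN:geneB\tLN:500",
    "r1\t0\tgeneA\t1\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tgeneA\t10\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t0\tgeneB\t5\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r5\t256\tgeneB\t7\t1\t4M\t*\t0\t0\tACGT\tIIII",
]
counts, rpk = count_sam_hits(demo)
for ref in sorted(counts):
    print(ref, counts[ref], rpk[ref])
```

For a real run you would pass an open file handle instead of the toy list, e.g. `with open(path_to_sam) as fh: counts, rpk = count_sam_hits(fh)`, where `path_to_sam` is whatever SAM file your own run left in the temp directory.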

Thank you for the reply. Following up on it, would you recommend any particular tool for counting reads per gene from the SAM file?
I want to use a tool that would give more or less the same counts as HUMAnN after nucleotide mapping.