Hi,
I am thinking about a passway analysis using HUMAnN3. Would it be possible to get an intermediate file that tells us what and how many gene families are involved in each path of a pathway? It will be like an abundance table of gene families for each path.
Hello, There is a tool in the HUMAnN package named “humann_unpack_pathways”. If you run this tool it will generate a file of pathways stratified by the genes that are included in each.
Thanks!
Lauren
Thank you, Laren! It helps a lot. May I ask some follow-up questions?
- I am not sure if I understand how to get the pathway abundances from the gene abundances.
Here is an example. The following table is what I got for PWY-4203 from humann_unpack_pathways with a provided demo file.
PWY-4203: volatile benzenoid biosynthesis I (ester formation) 22.7546997826
PWY-4203 unclassified 22.7546997826
PWY-4203 unclassified UniRef90_O64988 14.1999417238
PWY-4203 unclassified UniRef90_Q8GT21 21.6993097732
PWY-4203 unclassified UniRef90_Q9SPV4 14.3476630042
PWY-4203 unclassified UniRef90_Q6XMI3 19.8996848181
From the file “metacyc_pathways”, I found that the pathway is associated with RXN-6724, RXN-6723, RXN-6722, and RXN-6762.
And, another file “metacyc_reactions_level4ec_only.uniref” tells me the reaction-gene family connections:
RXN-6724 and UniRef90_Q8GT21;
RXN-6723 and UniRef90_Q6XMI3, UniRef90_Q9SPV4;
RXN-6722 and UniRef90_Q6XMI3;
RXN-6762 and UniRef90_O64988.
So, the abundances of each reaction are [21.6993097732, 19.8996848181+14.3476630042, 19.8996848181, 14.1999417238]
The number of complete copies should be 14.1999417238, which is supposed to be the abundance of PWY-4203. However, it does not match the result on the table. What am I doing wrong?
-
The tutorial says pathways with zero abundance are not included in the file.
Does that mean, if any of the RXNs in a pathway has zero abundance, the pathway won’t be included ? I guess that is why only a very small number of gene families are detected in the pathway table.
-
I believe the abundance tables are based on the comparisons between reads and representative genes of each family. Do the tables reflect any exceptional cases like comparisons with non-representative genes? I was curious because I found that A0A174PVG2 (given as UniRef90_A0A174PVG2 in the demo gene family abundance table) is not actually a gene family but an element of the gene family, UniRef90_A0A5P3AUM1, according to the UniRef server online (UniRef - Cluster: Uncharacterized protein (90%)).