Hi,
We’ve been using HUMAnN 3.7 and recently switched to HUMAnN 4.0 alpha. After running the same set of samples through both versions, I noticed a significant reduction in the number of pathways detected by HUMAnN 4 alpha. I saw that the UniRef database used in v4 alpha is humann4_protein_database_filtered_v2019_06.dmnd, and I read on the forum that this version was filtered to include only sequences with level-4 EC annotations, including new EC assignments generated as part of this work. As noted in the forum: “For HUMAnN 4.0.0.alpha.1 specifically, the provided UniRef50/UniClust50 protein database has been filtered to only those sequences with a level-4 EC annotation… We plan to release an updated comprehensive protein database (similar to those offered with previous HUMAnN versions) alongside the official v4.0 release.”
I wonder if this filtering step might have removed lower-confidence genes, leading to fewer pathways being called overall. In this case, would using reaction-level profiling provide a more complete or stable functional overview during this transitional phase?
Also, do you have an expected timeline for the official release of HUMAnN 4 with the full protein database?
Thanks so much for your work on this tool!!
Best,
Fangxi
Hmm… Nothing about filtering to EC-only annotated proteins should reduce the number of pathways as far as I can reason (indeed, we’ve actually improved the EC annotations a lot in HUMAnN 4 vs. 3, which should tend to improve pathway recovery).
As of HUMAnN 4 we’re now only reporting pathways that are composed purely of EC-based reactions. Pathways that include one or more reactions annotated only by MetaCyc (with no EC equivalent) are no longer reported. Is it possible this is what you’re seeing?
Other things to check would be how common the lost pathways were before - e.g. were they only found in 1-2 samples, or are you losing pathways that were highly prevalent before?
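A minimal sketch of that prevalence check, assuming both versions produce a tab-delimited pathway abundance table with the feature name in the first column, one column per sample, and stratified rows marked with `|`. The function names and the toy tables are made up for illustration; pathway IDs are matched on the text before the first colon, which may need adjusting if the two versions label pathways differently.

```python
import csv
import io


def pathway_prevalence(table_text):
    """Return {pathway_id: fraction of samples with abundance > 0}."""
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    header = next(reader)
    n_samples = len(header) - 1
    prevalence = {}
    for row in reader:
        if not row:
            continue
        feature = row[0]
        # Skip per-taxon (stratified) rows and the two bookkeeping rows.
        if "|" in feature or feature in ("UNMAPPED", "UNINTEGRATED"):
            continue
        pathway_id = feature.split(":")[0].strip()
        detected = sum(1 for value in row[1:] if float(value) > 0)
        prevalence[pathway_id] = detected / n_samples
    return prevalence


def lost_pathways(old_text, new_text, min_prevalence=0.0):
    """Pathways detected in the old table but absent (or all zero) in the new one."""
    old = pathway_prevalence(old_text)
    new = pathway_prevalence(new_text)
    return {
        pathway: prev
        for pathway, prev in old.items()
        if prev > 0 and prev >= min_prevalence and new.get(pathway, 0.0) == 0.0
    }


# Toy tables with invented abundances, only to show the expected layout.
V3_DEMO = (
    "# Pathway\tS1\tS2\tS3\tS4\n"
    "UNMAPPED\t10\t10\t10\t10\n"
    "SULFATE-CYS-PWY: demo name\t3\t2\t0\t1\n"
    "PWY-801: demo name\t1.5\t0\t2.0\t0\n"
    "PWY-801: demo name|unclassified\t1.5\t0\t2.0\t0\n"
)
V4_DEMO = (
    "# Pathway\tS1\tS2\tS3\tS4\n"
    "UNMAPPED\t10\t10\t10\t10\n"
    "SULFATE-CYS-PWY: demo name\t2\t2\t0\t1\n"
)

if __name__ == "__main__":
    for pathway, prev in sorted(lost_pathways(V3_DEMO, V4_DEMO).items()):
        print(f"{pathway}\tprevalence in old run: {prev:.2f}")
```

Passing `min_prevalence=0.4` would restrict the report to pathways that were seen in at least 40% of samples in the older run, which separates rare losses from prevalent ones.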
Hi @franzosa,
Thanks for your reply. I manually inspected some pathways of interest:
• Present in HUMAnN 4 alpha: SULFATE-CYS-PWY: superpathway of sulfate assimilation and cysteine biosynthesis
• Missing in HUMAnN 4 alpha but present in HUMAnN 3.7: PWY-6292: superpathway of L-cysteine biosynthesis (mammalian)
• Missing in HUMAnN 4 alpha but present in HUMAnN 3.7: PWY-801: homocysteine and cysteine interconversion
All were present in over 40% of my samples with HUMAnN 3.7.
For both PWY-6292 and PWY-801 (please see attached screenshot), I reviewed the MetaCyc website and found that all the reactions appear to be EC-annotated. Given your note that HUMAnN 4 now only reports pathways composed purely of EC-based reactions, I’m still unsure why these are excluded.
Could it be that the MetaCyc diagrams don’t show all reactions or substeps (some of which might lack EC annotations)? If so, is there a way to verify this more directly from your end?
In my HUMAnN 4 alpha run, I used:
humann_databases --download uniref uniref90_ec_filtered_diamond $INSTALL_LOCATION
…whereas in HUMAnN 3.7 I used the full UniRef90 DIAMOND database for translated search.
I understand from your comment that filtering to EC-only annotated proteins shouldn’t reduce pathway calls in theory, especially since EC annotations have improved. But I wonder: could using a more restricted protein set (even with better EC tagging) still result in some ECs not being recovered in practice, thereby causing certain pathways to fall below detection or completeness thresholds?
Thanks again!
Best regards,
Fangxi
It’s hard for me to imagine that we would be losing level-4 EC coverage going from v3 to v4 since we actually ADDED a bunch of new EC annotations for known and novel (MAG) proteins in v4. And the proteins from the full database in v3 now excluded from the v4 database would not have been EC annotated (then or now) by definition.
In your example noting EC 2.1.1.-, that could definitely be a case where it was quantified as a MetaCyc reaction before (by direct mapping) but missed now (since it isn’t specified to four EC levels). I’m more puzzled about the other example where all four reactions have level-4 EC annotation.
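To make the level-4 distinction concrete, here is a rough sketch of how one could flag partially specified EC numbers such as 2.1.1.- and test whether every reaction in a pathway has at least one fully specified EC. The rule encoded in `pathway_is_reportable` is my reading of "composed purely of EC-based reactions" and is an assumption, not HUMAnN's actual implementation; the function names and the example reaction lists are illustrative only and are not taken from MetaCyc.

```python
import re

# Four dot-separated numeric fields; the optional "n" allows preliminary
# EC numbers of the form 3.5.1.n3. A "-" in any field means "unspecified".
LEVEL4_EC = re.compile(r"^\d+\.\d+\.\d+\.n?\d+$")


def is_level4_ec(ec):
    """True if the EC number is specified at all four levels."""
    ec = ec.strip()
    if ec.upper().startswith("EC"):
        ec = ec[2:].strip()
    return bool(LEVEL4_EC.match(ec))


def pathway_is_reportable(reaction_ecs):
    """reaction_ecs: one list of EC strings per reaction in the pathway.

    Assumed rule: every reaction needs at least one level-4 EC number.
    """
    return all(
        any(is_level4_ec(ec) for ec in ecs)
        for ecs in reaction_ecs
    )


if __name__ == "__main__":
    # Illustrative inputs only.
    fully_annotated = [["4.4.1.1"], ["4.2.1.22"]]
    partly_annotated = [["4.4.1.1"], ["2.1.1.-"]]
    print(pathway_is_reportable(fully_annotated))
    print(pathway_is_reportable(partly_annotated))
```

Under this assumed rule, a single reaction with only a partial EC (or no EC at all) is enough to drop the whole pathway, which would match the 2.1.1.- case but not the pathway where all four reactions carry level-4 ECs.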
The next step I’d recommend is looking in the new reactions.tsv file that HUMAnN 4 outputs to see what it shows for stratified abundances for the reactions in these pathways. Some of them may be missing. HUMAnN 3 doesn’t directly output this data for comparison, but you can compute it manually using the regroup_table script on your genefamilies.tsv output and selecting the UniRef90 to MetaCyc-RXN map.
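A small sketch for pulling those rows out, assuming reactions.tsv follows the usual HUMAnN table layout (tab-delimited, feature ID in the first column, stratified rows written as `FEATURE|taxon`). I have not confirmed the exact ID format used in that file, so `RXN-A`, `RXN-B`, and `RXN-C` below are placeholders to be replaced with whatever identifiers appear in the first column for the reactions in the pathway. The same function should work on a HUMAnN 3 gene families table after regrouping it to reactions, as long as the IDs line up.

```python
import csv
import io


def collect_reactions(table_text, wanted):
    """Gather total and stratified abundances for the reactions of interest."""
    found = {rxn: {"total": None, "strata": {}} for rxn in wanted}
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    next(reader, None)  # skip the header row
    for row in reader:
        if not row:
            continue
        # "RXN-A|g__X.s__y" -> feature "RXN-A", stratum "g__X.s__y"
        feature, _, stratum = row[0].partition("|")
        rxn = feature.split(":")[0].strip()
        if rxn not in found:
            continue
        values = [float(v) for v in row[1:]]
        if stratum:
            found[rxn]["strata"][stratum] = values
        else:
            found[rxn]["total"] = values
    return found


def missing_reactions(found):
    """Reactions of interest that never appear in the table."""
    return sorted(
        rxn
        for rxn, entry in found.items()
        if entry["total"] is None and not entry["strata"]
    )


# Toy table with invented values and placeholder reaction IDs.
DEMO_TABLE = (
    "# Reaction\tS1\tS2\n"
    "RXN-A\t5\t0\n"
    "RXN-A|g__Demo.s__one\t3\t0\n"
    "RXN-A|unclassified\t2\t0\n"
    "RXN-B\t1\t4\n"
)

if __name__ == "__main__":
    hits = collect_reactions(DEMO_TABLE, ["RXN-A", "RXN-B", "RXN-C"])
    print("missing:", missing_reactions(hits))
    for taxon, values in hits["RXN-A"]["strata"].items():
        print("RXN-A", taxon, values)
```

With a real file, read it with `open(path).read()` and pass the text in. Any reaction reported as missing in the HUMAnN 4 table but present in the regrouped HUMAnN 3 table is a good candidate for explaining the lost pathway.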