Dear bioBakery forum, hello there!
I decided to create a new topic to share a clean, detailed report of my experience — hoping it might help other researchers navigating the same issue.
 Aim
 Aim
I was looking for a way to connect Pathways to UniRef90 Gene Families IDs, and UniRef90 Gene Families IDs to Gene Families, with the ultimate goal of linking Pathways directly to Gene Families.
 Issue
 Issue
Although HUMAnN provides mapping files to link UniRef90 gene families IDs to gene families (e.g., map_$db_uniref90.txt files found in the utility_mapping folder, downloadable via:
humann_databases --download utility_mapping full $DIR
), I noticed there was no file that directly links UniRef90 Gene Families IDs to Pathways.
I found a 2019 post suggesting the use of the humann2_unpack_pathways command. Since I am using HUMAnN v3.9, I looked for the updated tool and found humann_unpack_pathways. I ran it as follows:
humann_unpack_pathways \
  --input-genes humann_genefamilies.tsv \
  --input-pathways humann_pathabundance.tsv \
  --remove-taxonomy \
  --output humann_unpacked.tsv
 Hint
 Hint
As noted in its tool description, humann_unpack_pathways only outputs pathways for which UniRef90 Gene Families IDs are present in your humann_genefamilies.tsv. This means some pathways may be missing if no corresponding UniRef90 Gene Families IDs were detected.
 Related Work
 Related Work
A similar question was raised in this 2021 forum post, but as of today, the referenced files are no longer available and the shared links are broken.
 Workaround
 Workaround
I developed a method to link UniRef90 Gene Families IDs to pathways, and even trace back individual gene families by passing through UniRef90 Gene Families IDs. This fulfilled my aim of linking Pathways → UniRef90 Gene Families IDs → Gene Families.
 What You’ll Need
 What You’ll Need
- 
humann_unpacked.tsv(fromhumann_unpack_pathways)
- 
A list of pathways you’re interested in 
 → e.g.,humann_pathabundance_stratified.tsv, or a custom subset (I usedsignificant_pathab.rds)
- 
The following utility_mappingfiles:- map_ko_uniref90.txt
- map_go_uniref90.txt
- map_pfam_uniref90.txt
- map_eggnog_uniref90.txt
- map_level4ec_uniref90.txt
 
 Note: There’s no mapping file for MetaCyc reactions directly. However, because they share EC numbers, you can repurpose
 Note: There’s no mapping file for MetaCyc reactions directly. However, because they share EC numbers, you can repurpose map_level4ec_uniref90.txt to map MetaCyc reactions as well.
 The Function
 The Function
Here’s the R function I wrote to perform the full linkage and output CSV files connecting pathways, UniRef90 gene families IDs, and functional gene family annotations.
# Load required libraries
library(dplyr)
library(readr)
library(stringr)
library(tidyr)
link_pathways_to_genefamilies <- function(
    base_dir,
    sig_pathab_rds = "results/significant_pathab.rds",
    mapping_names = c("KO", "GO", "Pfam", "EggNOG", "Level4EC"),
    mapping_prefix = "map_",
    uniref_suffix = "_uniref90.txt"
) {
  humann_unpacked_file <- file.path(base_dir, "humann_unpacked.tsv")
  map_dir <- file.path(base_dir, "utility_mapping")
  results_folder <- file.path(base_dir, "results")
  sig_pathab_rds <- file.path(base_dir, sig_pathab_rds)
  
  if (!file.exists(sig_pathab_rds)) stop("Significant pathway RDS not found: ", sig_pathab_rds)
  load(sig_pathab_rds)  # should load object `sp_pathab`
  if (!exists("sp_pathab")) stop("Object `sp_pathab` not found.")
  
  sig_pathways <- sp_pathab %>%
    mutate(PathwayID = str_extract(GeneFamily, "^[^:]+")) %>%
    distinct(PathwayID)
  
  if (!file.exists(humann_unpacked_file)) stop("HUMAnN unpacked file not found.")
  humann_unpacked <- read_tsv(
    humann_unpacked_file,
    col_types = cols_only(`# Pathway` = col_character()),
    col_names = TRUE
  ) %>% rename(PathwayRaw = `# Pathway`)
  
  pathway_gene_pairs <- humann_unpacked %>%
    mutate(
      PathwayID = str_extract(PathwayRaw, "^[^:|]+"),
      GeneFamily = if_else(
        str_detect(PathwayRaw, "\\|"),
        str_split(PathwayRaw, "\\|", simplify = TRUE)[, 2],
        NA_character_
      )
    ) %>%
    filter(PathwayID %in% sig_pathways$PathwayID) %>%
    filter(!is.na(GeneFamily)) %>%
    select(PathwayID, GeneFamily)
  
  write_csv(pathway_gene_pairs, file.path(results_folder, "significant_pathway_gene_pairs.csv"))
  
  read_mapping <- function(path) {
    gf <- unique(pathway_gene_pairs$GeneFamily)
    read_tsv(path, col_names = FALSE, col_types = cols(.default = "c")) %>%
      pivot_longer(cols = -X1, names_to = NULL, values_to = "GeneFamily") %>%
      filter(GeneFamily %in% gf) %>%
      select(FunctionID = X1, GeneFamily) %>%
      filter(!is.na(GeneFamily))
  }
  for (name in mapping_names) {
    map_file <- file.path(map_dir, paste0(mapping_prefix, tolower(name), uniref_suffix))
    if (!file.exists(map_file)) {
      warning("Mapping file missing: ", map_file)
      next
    }
    map_long <- read_mapping(map_file)
    joined <- pathway_gene_pairs %>%
      inner_join(map_long, by = "GeneFamily")
    out_file <- file.path(results_folder, sprintf("significant_pathways_genefamily_%s_links2.csv", tolower(name)))
    write_csv(joined, out_file)
    cat(sprintf("✔ %s: %d UniRef90 gene family ID–%s links saved to %s\n",
                name, nrow(joined), name, basename(out_file)))
    rm(map_long, joined)
    gc()
  }
  cat("Summary:\n")
  cat("  • Significant pathways:", nrow(sig_pathways), "\n")
  cat("  • Pathway–UniRef90 gene family ID pairs:", nrow(pathway_gene_pairs), "\n")
}
# Example usage
link_pathways_to_genefamilies(
  base_dir = "$your_results_folder"
)
Hope this helps — and happy science to everyone!
– Chiara