HUMAnN 4 database downloads inaccessible due to Globus (data.globus.org) domain blocking

Hello, fellow bioBakers! :waving_hand:

I’ve run into an infrastructure issue: HUMAnN 4 database downloads fail from institutional networks whose security policies block Globus.

Problem Description

When attempting to download HUMAnN 4 databases using:

humann_databases --download utility_mapping full data/databases/humann/
humann_databases --download chocophlan full data/databases/humann/
humann_databases --download uniref uniref90_ec_filtered_diamond data/databases/humann/

All three downloads fail with the same error pattern:

CRITICAL ERROR: Unable to download and extract from URL: <database_url>

Root Cause

The download URL redirects to a subdomain of data.globus.org, and this domain is blocked by our institutional network security policies. Specifically, our cybersecurity team has classified Globus as a peer-to-peer application and has blocked access to globus.org domains as part of corporate network security measures established a few years ago.
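For anyone who wants to check whether they are hitting the same redirect, here is a minimal sketch that lists the hosts a download URL redirects through. It assumes curl is installed and that the server answers HEAD requests; the function names host_of_url and redirect_hosts are made up for this example and are not part of HUMAnN.

```shell
#!/bin/sh
# Extract the host part from an absolute URL (string handling only, no network).
host_of_url() {
  h=${1#*://}   # drop the scheme
  h=${h%%/*}    # drop the path
  h=${h##*@}    # drop any userinfo
  printf '%s\n' "${h%%:*}"   # drop any port
}

# Follow redirects with HEAD requests and print the host of each Location header.
# A relative Location header would print an empty line.
redirect_hosts() {
  curl -sIL --max-time 20 "$1" | tr -d '\r' |
    awk 'tolower($1) == "location:" { print $2 }' |
    while read -r loc; do host_of_url "$loc"; done
}
```

Running redirect_hosts on one of the affected URLs listed under Environment Details should print a data.globus.org subdomain if the redirect is in place, even when the follow-up connection to that host is then refused by the firewall.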

Impact

I have confirmed that all three HUMAnN 4 database downloads fail due to this issue:

  • utility_mapping database - FAILED

  • chocophlan database - FAILED

  • uniref database - FAILED

Workaround Attempted

I was able to download the database manually from an external server, but this is not a sustainable solution for:

  • Other users in our institution

  • Automated workflows

  • Future database updates

Feature Request / Question

Could the bioBakery team consider providing:

  1. Alternative download mirrors that don’t rely on Globus infrastructure (e.g., direct HTTP/HTTPS from huttenhower.sph.harvard.edu or other academic mirrors)?

  2. Documentation on how to manually download and install databases when automated downloads are blocked?

  3. Support for --database-location with local file paths in humann_databases (if not already fully supported)?

This issue affects users in institutions with strict network security policies that block peer-to-peer or file-sharing platforms. Providing alternative download methods would greatly improve accessibility for these users.

Environment Details

  • HUMAnN version: v4.0.0.alpha.1

  • Network environment: Institutional HPC cluster with corporate network security policies

  • Blocked domain: *.globus.org (specifically data.globus.org)

  • Affected download URLs (all redirect to Globus):

    • http://huttenhower.sph.harvard.edu/humann_data/full_mapping_v4_alpha.tar.gz
    • http://huttenhower.sph.harvard.edu/humann_data/chocophlan/chocophlan.v4_alpha.tar.gz
    • http://huttenhower.sph.harvard.edu/humann_data/uniprot/uniref_ec_filtered/uniref90_annotated_v4_alpha_ec_filtered.tar.gz

Any guidance or alternative solutions would be greatly appreciated!

Best regards,
Fran :man_technologist:

Thank you for the detailed report. We can look into download alternatives. We have gradually shifted our focus to Globus because it proved more efficient and stable than our previous offerings, but if it’s regularly or systematically blocked, that is something we need to consider.

I didn’t follow your point re: --database-location. If you’ve manually downloaded a database, you can point a HUMAnN installation at it by running the humann_config utility to update the relevant config setting with the correct path. For example:

humann_config --update database_folders nucleotide /path/to/chocophlan

You can also specify these paths at runtime if you can’t update the config for some reason.
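As a rough sketch of the full manual route, assuming the three archives were downloaded on a machine that can reach Globus and then copied over: unpack each one into its own folder and register the folders with humann_config. The helper names unpack_db and register_dbs and the subfolder names are placeholders for illustration. The protein and utility_mapping keys are the ones documented for HUMAnN 3, so treat them as an assumption for the v4 alpha.

```shell
#!/bin/sh
# Unpack a manually downloaded archive into its own folder.
# Usage: unpack_db ARCHIVE.tar.gz DEST_DIR
unpack_db() {
  mkdir -p "$2" && tar -xzf "$1" -C "$2"
}

# Point the HUMAnN config at the three unpacked folders under BASE_DIR.
# The subfolder names are a local choice for this example.
# Usage: register_dbs BASE_DIR
register_dbs() {
  base=$1
  humann_config --update database_folders nucleotide "$base/chocophlan" &&
  humann_config --update database_folders protein "$base/uniref" &&
  humann_config --update database_folders utility_mapping "$base/utility_mapping"
}
```

For example, unpack_db chocophlan.v4_alpha.tar.gz /path/to/databases/chocophlan, repeated for the other two archives, followed by register_dbs /path/to/databases. For the runtime option, HUMAnN 3 accepts --nucleotide-database and --protein-database on the humann command line; I would expect the v4 alpha to behave the same way, but check humann --help to confirm.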

Hi, @franzosa! :waving_hand:

Thank you for your reply :slight_smile:

I appreciate your willingness to consider download alternatives. I think it would really help us here. It is very unfortunate that our cybersecurity team blocked access to globus.org domains :frowning:

Regarding my original point about --database-location: thank you for mentioning the humann_config utility, I didn’t know about it!

Best,
Fran :man_technologist: