Question on Running HUMAnN4 with Limited Memory on HPC

Hi HUMAnN Developers,

I’m trying to run HUMAnN4 on my metagenomic reads following the tutorial provided here:

I strictly followed the instructions for downloading the database and installing HUMAnN4.

However, when I ran the commands below using 4 threads with 6 GB of memory per thread, the run failed with this error:

Loading database information...Failed attempt to allocate 308038999768bytes;
you may not have enough free memory to load this database.
If your computer has enough RAM, perhaps reducing memory usage from
other programs could help you load this database?
classify: unable to allocate hash table memory

The code I used, based on the suggestion at https://forum.biobakery.org/t/humann-4-cant-work-with-metaphlan-4/8201/2:

> conda activate biobakery4 
> metaphlan sample.fq.gz \
>     --input_type fastq \
>     -x mpa_vOct22_CHOCOPhlAnSGB_202403 \
>     -t rel_ab_w_read_stats \
>     -o sample_rel_ab_w_read_stats.tsv
> humann -i sample.fq.gz \
>     --threads 4 \
>     --taxonomic-profile sample_rel_ab_w_read_stats.tsv \
>     --metaphlan-options "--input_type fastq -x mpa_vOct22_CHOCOPhlAnSGB_202403 -t rel_ab_w_read_stats" \
>     -o sample

I’m running this in a shared HPC environment, but memory and threads are unfortunately quite limited. I would appreciate your advice on the following:

  1. Is there any way to reduce memory usage?
  2. Do you recommend any alternative approaches when working in memory-constrained environments?

Thank you very much for your help!
Ivy

That’s a tough one - even the MetaPhlAn marker database now has a pretty big memory footprint, as our understanding of the microbial universe has expanded. For what it’s worth, a recent benchmark of MetaPhlAn + HUMAnN 4 required 25 GB of RAM (MaxRSS), so it’s possible your 4 x 6 GB (24 GB total) was JUST shy of enough?
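
If it helps, here is a minimal sketch of a batch request that clears that ~25 GB mark. It assumes a SLURM scheduler; the 32G memory request, walltime, and conda setup are placeholders to adapt to your site:

> #!/bin/bash
> #SBATCH --job-name=humann4
> #SBATCH --cpus-per-task=8
> #SBATCH --mem=32G            # total memory for the job, not per CPU
> #SBATCH --time=24:00:00
>
> # Activate the environment; some clusters need a 'module load' or
> # 'source .../conda.sh' line before 'conda activate' works in a batch job.
> conda activate biobakery4
>
> humann -i sample.fq.gz \
>     --threads ${SLURM_CPUS_PER_TASK} \
>     --taxonomic-profile sample_rel_ab_w_read_stats.tsv \
>     -o sample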

Thank you for the quick suggestion — it’s now working with 8 threads and 6 GB each.

I have one further question, related to databases: I tried using the Struo2-released HUMAnN3 database built from GTDB release 207, which is around 180 GB for UniRef90. However, I also downloaded the database from the official HUMAnN3 tutorial, and that one appears to be smaller.

As someone fairly new to this, would you recommend using the official HUMAnN3 database or the GTDB r207 version from Struo2? Will the results differ significantly?

If my taxonomic profiling was done using GTDB 207, should I also use the GTDB 207 database for HUMAnN to ensure consistency? Conversely, if I’m using MetaPhlAn for taxonomic assignment, should I stick with the database recommended in the official HUMAnN tutorial?

From what I’ve seen of Struo(2) it seems very reasonable and useful, but I don’t have any hands-on experience with it from which to offer an informed comparison. That also means we’ll be very limited in the tech support we can provide here if problems arise with non-bioBakery databases.
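
That said, if you do go with the official databases, the bundled humann_databases utility is the supported route for fetching them, and you can point an individual run at a custom database location explicitly. A rough sketch only; the paths are placeholders, and the flag names below are as used in HUMAnN 3.x, so please check humann_databases --help and humann --help for your installed version:

> # Download the official pangenome (ChocoPhlAn) and UniRef90 databases
> humann_databases --download chocophlan full /path/to/humann_dbs
> humann_databases --download uniref uniref90_diamond /path/to/humann_dbs
>
> # Or point a single run at databases installed elsewhere (e.g. a Struo2 build)
> humann -i sample.fq.gz \
>     --nucleotide-database /path/to/custom/chocophlan \
>     --protein-database /path/to/custom/uniref \
>     -o sample

Whether a Struo2/GTDB build drops in cleanly via those flags is exactly the part we can’t vouch for, per the above.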