Maaslin3 Memory Consumption during Logistic/Prevalence Testing

Hello!

I really enjoy working with Maaslin3! I'm currently running it as an R package on around 1000 samples that are longitudinal in nature (i.e., they have a mixed-effects component). I am finding that during the logistic/prevalence testing phase I run out of memory, and the session never recovers from it.

I am running on an HPC cluster and assigned 100 GB of memory to my Maaslin3 job, yet it was still killed for exceeding that limit. If I use a smaller dataset that stays just under my assigned memory limit, I still cannot free the memory afterward; I have to fully restart R before I can do anything memory-intensive again.

Do you happen to know what might be causing this?

Thank you so much!

Hi!

Can you send the maaslin3 command you’re running? Have you turned on save_models, increased the max_pngs substantially, or turned on parallelization?

Will

Hi Will!

Here is my current command:

library(maaslin3)   # maaslin3()
library(phyloseq)   # otu_table(), sample_data()

fit_out <- maaslin3(input_data = data.frame(otu_table(phy)),
                    input_metadata = data.frame(sample_data(phy)),
                    formula = "~ myvar",
                    output = "maaslin3",
                    normalization = "NONE",
                    transform = "LOG",
                    standardize = FALSE,
                    max_significance = 0.05,
                    min_abundance = 100,
                    min_prevalence = 0.1,
                    max_pngs = 1,
                    warn_prevalence = FALSE,
                    cores = future::availableCores())

My input_data has about 6000 features and 900 samples; with the filters here, about 5000 features are filtered out. My input_metadata has ~1000 variables, most of which I don't need and could cut down. I definitely wanted to make sure I wasn't saving models or generating PNGs. The model should also be mixed-effects, but I thought the mixed-effects models might be causing issues (in the pre-Maaslin days, I had a lot of related trouble with glmmTMB), so I pulled that part out too…

Update: when I limit my sample metadata to only the variables I want (i.e., data.frame(sample_data(phy)[, c("myvar", "myvar2")])), total memory usage is much better. I only use about 3 GB during the logistic phase instead of blowing past my 100 GB allocation, and the code I shared above works.
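For reference, here is roughly what the subsetting looks like ("myvar" and "myvar2" are just placeholders for my real variable names):

library(phyloseq)

# phy is my phyloseq object; keep only the metadata columns the model uses
meta_small <- data.frame(sample_data(phy)[, c("myvar", "myvar2")])

# meta_small then goes into maaslin3() as input_metadata in place of
# data.frame(sample_data(phy))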

I still have the 3 GB of memory in use after Maaslin3 completes successfully, and another run of Maaslin3 uses an additional 3 GB on top of that, so I am limited in the total number of Maaslin3 runs I can do in a single R session without restarting R…
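In the meantime, a workaround I'm considering (not Maaslin3-specific, and just a sketch I haven't fully vetted) is to run each fit in a throwaway R subprocess via the callr package, so whatever memory the fit holds is returned to the OS when that process exits:

library(callr)
library(phyloseq)

# Pull plain data frames out of the phyloseq object in the parent session
abun <- data.frame(otu_table(phy))
meta <- data.frame(sample_data(phy)[, c("myvar", "myvar2")])

# Run the fit in a fresh R process; memory it allocates is freed when the
# process exits, so repeated runs shouldn't accumulate in this session
fit_out <- callr::r(function(abun, meta) {
    maaslin3::maaslin3(input_data = abun,
                       input_metadata = meta,
                       formula = "~ myvar",
                       output = "maaslin3",
                       normalization = "NONE",
                       transform = "LOG",
                       standardize = FALSE,
                       max_significance = 0.05,
                       min_abundance = 100,
                       min_prevalence = 0.1,
                       max_pngs = 1,
                       warn_prevalence = FALSE,
                       cores = future::availableCores())
}, args = list(abun = abun, meta = meta))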

If you run with 1 core rather than future::availableCores(), do you see the same memory leakage and peak-memory issues? My guess is that the pbapply package is copying your entire working memory for each parallel worker. In my own runs, I've found that using multiple cores often doesn't speed things up much and seriously inflates the memory used. I've never found parallelization in R to be great, and it's possible there's a bug in the pbapply package.
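For example, keeping all of your other arguments the same (including the reduced metadata from your update) and only changing the cores argument:

fit_out <- maaslin3(input_data = data.frame(otu_table(phy)),
                    input_metadata = data.frame(sample_data(phy)[, c("myvar", "myvar2")]),
                    formula = "~ myvar",
                    output = "maaslin3",
                    normalization = "NONE",
                    transform = "LOG",
                    standardize = FALSE,
                    max_significance = 0.05,
                    min_abundance = 100,
                    min_prevalence = 0.1,
                    max_pngs = 1,
                    warn_prevalence = FALSE,
                    cores = 1)  # single core, to test whether the parallel workers are inflating memory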

That seems to have about halved the total memory, so I'm down to ~1.5 GB per Maaslin3 run, which is a definite improvement over 100 GB!

Sounds good - I’ll consider this resolved unless you think the runtime is now going to be too long.