The bioBakery help forum

Why does a particular file abort in the middle of the HUMAnN analysis?

Hi @franzosa @lauren.j.mciver
I am running HUMAnN 3.0, but for some files it stops in the middle of the analysis. I do not understand why this happens. Can you please help? I am attaching the log file here.
Thanks.
biobakery.txt (57.0 KB)

Hi, Thank you for the detailed post and for including the log! It looks like the run was killed, most likely because of memory use during the DIAMOND alignment step (the translated search portion of the workflow). You could try running HUMAnN with the translated search step bypassed, or use one of the smaller filtered databases (if you are currently using the full database).

Thank you,
Lauren

Thanks @lauren.j.mciver … I am using the uniref90_ec_filtered_diamond database. Which one should I use?

Thanks,
DC7

Hi - That is the smaller filtered database I was thinking of (sorry for not providing the specific name in my prior post). Is it possible that something else is running on the machine at the same time? It looks like you have 32 GB of memory, which I would expect to be enough to run translated search with that database.

Thank you,
Lauren

Hi @lauren.j.mciver - I am running on Google Cloud, and the instance is used for HUMAnN only, nothing else. I have 30-40 files that are set to run one after another in a loop. I am also bypassing the MetaPhlAn step by providing the MetaPhlAn output profile file.

Thanks

Hi - Thanks for the info! Is it possible to increase the amount of memory available to your Google Cloud instance? If so, try adding just 8 GB more memory and see if that resolves the issue.

Thank you,
Lauren

Okay… I will definitely try that. But do you think just 8 GB more will solve the issue?

Hi - I am not sure exactly how much memory you will need. My guess is that you should only need a bit more. If you have a lot of flexibility with your cloud compute instance type, definitely try doubling the memory and then track the memory usage of the process for a single run. To track memory usage you can use the HUMAnN tool humann_benchmark (usage: $ humann_benchmark COMMAND). Double the memory should be more than enough. Once you know roughly how much memory you need, based on the size of your input files, the number of reads passed to translated search, and the filtered database you are using, you can reduce your instance size for the remainder of your runs.

Thanks,
Lauren
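
For readers who want to see what this kind of measurement involves, here is a minimal Python sketch that runs a command and reports the peak resident memory of its child processes. It is not humann_benchmark and does not reproduce its output. The function name `run_and_report_peak_memory` and the placeholder command are assumptions made for illustration. The sketch relies on the standard-library `resource` module, so it works on Linux and macOS only, and it reports the largest single child process rather than the sum over all processes.

```python
import resource
import subprocess
import sys


def run_and_report_peak_memory(command):
    """Run `command` and return (exit_code, peak_rss_gb).

    Hypothetical helper, not part of HUMAnN. The peak comes from
    resource.getrusage(RUSAGE_CHILDREN), which reports the largest
    resident set size among child processes that have been waited for.
    ru_maxrss is in kilobytes on Linux and in bytes on macOS.
    """
    completed = subprocess.run(command, check=False)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    bytes_per_unit = 1 if sys.platform == "darwin" else 1024
    peak_gb = peak * bytes_per_unit / 1024 ** 3
    return completed.returncode, peak_gb


if __name__ == "__main__":
    # Placeholder command that allocates about 50 MB; substitute the
    # command used for a single sample.
    placeholder = [sys.executable, "-c", "x = bytearray(50_000_000)"]
    code, peak_gb = run_and_report_peak_memory(placeholder)
    print(f"exit code {code}, peak memory {peak_gb:.2f} GB")
```

On POSIX systems a negative exit code from `subprocess.run` means the child was terminated by a signal, so a value of -9 (SIGKILL) alongside a peak close to the machine's total memory is consistent with the process being killed for running out of memory.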

Hi @lauren.j.mciver -

Here’s a screenshot from tracking memory usage with humann_benchmark. The highest usage appears to be 28.9 GB, so it seems that increasing the memory will not help, right?

EDIT 1: I have also tried an AWS instance (16 cores, 32 GB RAM). Again, the process was killed in the middle of the job. It shows a maximum of 29.6 GB of RAM usage.

Thanks
DC7

Hi DC7, Thank you for the detailed follow-up! Can you try an instance with more memory (8 or 16 GB more)? I think there is likely ~2 GB of memory used by other tasks running in the background (essential tasks like your ssh session). Combined with the ~30 GB for the HUMAnN task, that uses all of the available memory, and so the HUMAnN task is killed. If you can get a bit more memory for your instance, I think it should resolve the error you are seeing.

Thank you,
Lauren
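
The arithmetic behind this estimate can be sketched in a few lines of Python. The 29.6 GB peak is the figure reported earlier in the thread, the ~2 GB of background use is the rough estimate given above, and the function name `memory_headroom_gb` and the list of instance sizes are assumptions made for illustration.

```python
def memory_headroom_gb(total_gb, peak_job_gb, background_gb=2.0):
    """Return the memory left once the job and background tasks are counted.

    Hypothetical helper; background_gb defaults to the rough ~2 GB
    estimate from the thread, not a measured value.
    """
    return total_gb - (peak_job_gb + background_gb)


# Compare the original 32 GB instance with larger, illustrative sizes.
for total in (32, 40, 48, 64):
    headroom = memory_headroom_gb(total, peak_job_gb=29.6)
    print(f"{total} GB instance: {headroom:+.1f} GB headroom")
```

With essentially no headroom on a 32 GB machine, any small extra allocation can push the job over the limit, which is why a modest increase may be enough.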

Okay @lauren.j.mciver, thanks for your response. I am trying to launch an instance with 64 GB RAM. I will let you know how it goes after running HUMAnN there.
One additional question: should I always run files with the same sequencing depth in HUMAnN, or can I run samples (fastq files) of varying depth in a study?

DC7

Hi DC7, Sounds good! You should be okay to run samples (fastq files) with varying depth.

Thank you,
Lauren

Hi @lauren.j.mciver - Actually, my question was not about the memory usage problem. I was asking from a scientific point of view: if I run samples of varying depth, could that influence the results of the downstream statistical analyses?

Thanks
DC7

Hi DC7, The gene family abundances are computed as RPK (reads per kilobase), which normalizes for gene length. This file includes the “UNMAPPED” read counts too, so the total number of reads per sample is captured in the file. If you sum-normalize the RPK values, this will adjust for differences in sequencing depth across samples.

Thanks,
Lauren
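
As an illustration of what sum-normalization does, here is a minimal Python sketch. The feature names and abundance values are made up, and the function `sum_normalize` is a hypothetical helper written for this example, not HUMAnN's own code. HUMAnN ships a humann_renorm_table utility for renormalizing its output tables; this sketch only shows the underlying arithmetic.

```python
import math


def sum_normalize(abundance_by_feature):
    """Divide every value by the sample total so the values sum to 1."""
    total = sum(abundance_by_feature.values())
    if total == 0:
        raise ValueError("cannot normalize a sample whose total is zero")
    return {name: value / total for name, value in abundance_by_feature.items()}


# Made-up values: the same community sequenced at two different depths.
shallow = {"UNMAPPED": 500.0, "UniRef90_A": 300.0, "UniRef90_B": 200.0}
deep = {name: value * 4 for name, value in shallow.items()}

shallow_rel = sum_normalize(shallow)
deep_rel = sum_normalize(deep)

for name in shallow:
    # After normalization the depth difference no longer shows up.
    same = math.isclose(shallow_rel[name], deep_rel[name])
    print(f"{name}: {shallow_rel[name]:.2f} vs {deep_rel[name]:.2f} (equal: {same})")
```

Keeping the UNMAPPED entry in the total, as in the sketch, means the normalized values are fractions of all reads in the sample rather than of the mapped reads only.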

Hi @lauren.j.mciver, I am now running a virtual machine with 16 cores and 42 GB RAM, and every sample runs except a single one, which stopped in the middle. Here is its log: MH0006.txt (39.7 KB). Can you please find out the exact reason?

Thanks,
DC7

Hi DC7, I am glad to hear that most of the samples ran okay with more memory! Looking at the log, everything appears to be okay and then the run hits an out-of-memory error. Sorry, I can’t tell exactly why this sample runs out of memory and the others do not. Is it possible that this input file is larger than the other files? Another possible reason is that this sample has a higher alignment rate. If you could try running this input file with more memory, I think that should resolve the issue.

Thank you,
Lauren

Yes @lauren.j.mciver, it is running perfectly with more memory. Thanks for your constant support.

DC7