Humann3 hanging during run, no error messages but no changes to output, log or temp files

Hi, I am trying to run Humann3 on some metagenomic sequences. I have the software installed and running on a university cluster, with the jobs managed by a SLURM system.

The problem I am having is that the software starts running fine without any errors and creates temp, output and log files, but after a day or two the output files just stop being added to. The software has been running for 14 days now with no change, but there are no error messages and the software hasn’t obviously crashed or stopped. The script I am running is below:

#!/bin/bash
#SBATCH --ntasks=50 # threads
#SBATCH --nodes=1 # Use x nodes (physical machines)
#SBATCH --job-name=hum2 # sensible name for the job
#SBATCH --mem=120G # Default memory per CPU is 3GB.
#SBATCH --partition=hugemem # Use the verysmallmem-partition for jobs requiring < 10 GB RAM.
#SBATCH --mail-user=silas.vick@nmbu.no # Email me when job is done.
#SBATCH --mail-type=ALL

# If you would like to use more please adjust this.

# Below you can put your scripts

# If you want to load module

module load anaconda3/latest # Load Python 3 with many useful libraries.

module list # List loaded modules

module load Miniconda3
source activate biobakery3

humann --input /net/fs-1/projects01/SoilCyc/SoilCycReads/catfiles/2_combined.fq.gz --threads 50 --output /net/cn-1/mnt/SCRATCH/sivick/humann3_output/

I hope this is informative but let me know if any more details are needed.

Regards,
Silas

Hi Silas, Thank you for the detailed post. I am not sure what might be going on. What stage is the run currently in? If you check the log does it look like it is running diamond? If so, do you see the diamond process currently running?

Thank you,
Lauren
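One way to answer the "what stage is it in" question is to pull the most recent "Running" entry out of the HUMAnN log. The function below is only a sketch: the name `last_stage` is made up for illustration, and it assumes the log uses the `INFO: Running <tool>` wording seen in the log excerpts in this thread.

```shell
#!/bin/sh
# Sketch: print the most recent "INFO: Running ..." line from a HUMAnN log.
# Assumption: the log marks each stage with "INFO: Running <tool>",
# as in the excerpts posted in this thread.
last_stage() {
    log_file="$1"
    # grep keeps only the stage lines; tail keeps the latest one.
    grep 'INFO: Running' "$log_file" | tail -n 1
}
```

Calling `last_stage /path/to/sample.log` (the path is a placeholder) prints the last stage that was started, or nothing if no stage line is present.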

Hi Lauren,

I believe it is running metaphlan. Here are the last few lines of the log file:

06/02/2021 04:23:41 PM - humann.store - DEBUG: Initialize Alignments class instance to minimize memory use
06/02/2021 04:23:41 PM - humann.store - DEBUG: Initialize Reads class instance to minimize memory use
06/02/2021 04:24:02 PM - humann.humann - INFO: Load pathways database part 1: /mnt/users/sivick/.conda/envs/biobakery3/lib/python3.7/site-packages/humann/data/pathways/metacyc_reactions_level4ec_only.uniref.bz2
06/02/2021 04:24:02 PM - humann.humann - INFO: Load pathways database part 2: /mnt/users/sivick/.conda/envs/biobakery3/lib/python3.7/site-packages/humann/data/pathways/metacyc_pathways_structured_filtered
06/02/2021 04:24:02 PM - humann.search.prescreen - INFO: Running metaphlan …
06/02/2021 04:24:02 PM - humann.utilities - DEBUG: Using software: /mnt/users/sivick/.conda/envs/biobakery3/bin/metaphlan
06/02/2021 04:24:02 PM - humann.utilities - INFO: Execute command: /mnt/users/sivick/.conda/envs/biobakery3/bin/metaphlan /net/cn-1/mnt/SCRATCH/sivick/humann3_output/1_combined_humann_temp/tmp9gccad2x/tmpcv92_w_m -t rel_ab -o /net/cn-1/mnt/SCRATCH/sivick/humann3_output/1_combined_humann_temp/1_combined_metaphlan_bugs_list.tsv --input_type fastq --bowtie2out /net/cn-1/mnt/SCRATCH/sivick/humann3_output/1_combined_humann_temp/1_combined_metaphlan_bowtie2.txt --nproc 50

Hi - Thank you for checking. Would you try running just the MetaPhlAn command directly (see command below)? This might give us more information as to what might be going on.

$ /mnt/users/sivick/.conda/envs/biobakery3/bin/metaphlan /net/cn-1/mnt/SCRATCH/sivick/humann3_output/1_combined_humann_temp/tmp9gccad2x/tmpcv92_w_m -t rel_ab -o /net/cn-1/mnt/SCRATCH/sivick/humann3_output/1_combined_humann_temp/1_combined_metaphlan_bugs_list.tsv --input_type fastq --bowtie2out /net/cn-1/mnt/SCRATCH/sivick/humann3_output/1_combined_humann_temp/1_combined_metaphlan_bowtie2.txt --nproc 50

Thank you,
Lauren

Hi Lauren,

Running this script I get the following error:

BowTie2 output file detected: /net/cn-1/mnt/SCRATCH/sivick/humann3_output/1_combined_humann_temp/1_combined_metaphlan_bowtie2.txt
Please use it as input or remove it if you want to re-perform the BowTie2 run.
Exiting…

Silas

Hi Silas, Thank you for checking. Can you try removing the file and re-running to see if there are any additional error messages?

Thank you,
Lauren

Hi Lauren, I have removed the file and rerun the script. I am seeing the same thing as initially: the 1_combined_metaphlan_bowtie2.txt file is generated up to a certain point (6129 KB file size), then nothing more happens, but the SLURM job doesn’t finish or throw an error.

The final lines in the log file are:

06/02/2021 04:23:41 PM - humann.store - DEBUG: Initialize Alignments class instance to minimize memory use
06/02/2021 04:23:41 PM - humann.store - DEBUG: Initialize Reads class instance to minimize memory use
06/02/2021 04:24:02 PM - humann.humann - INFO: Load pathways database part 1: /mnt/users/sivick/.conda/envs/biobakery3/lib/python3.7/site-packages/humann/data/pathways/metacyc_reactions_level4ec_only.uniref.bz2
06/02/2021 04:24:02 PM - humann.humann - INFO: Load pathways database part 2: /mnt/users/sivick/.conda/envs/biobakery3/lib/python3.7/site-packages/humann/data/pathways/metacyc_pathways_structured_filtered
06/02/2021 04:24:02 PM - humann.search.prescreen - INFO: Running metaphlan …
06/02/2021 04:24:02 PM - humann.utilities - DEBUG: Using software: /mnt/users/sivick/.conda/envs/biobakery3/bin/metaphlan
06/02/2021 04:24:02 PM - humann.utilities - INFO: Execute command: /mnt/users/sivick/.conda/envs/biobakery3/bin/metaphlan /net/cn-1/mnt/SCRATCH/sivick/humann3_output/1_combined_humann_temp/tmp9gccad2x/tmpcv92_w_m -t rel_ab -o /net/cn-1/mnt/SCRATCH/sivick/humann3_output/1_combined_humann_temp/1_combined_metaphlan_bugs_list.tsv --input_type fastq --bowtie2out /net/cn-1/mnt/SCRATCH/sivick/humann3_output/1_combined_humann_temp/1_combined_metaphlan_bowtie2.txt --nproc 50

Regards,
Silas

Hi Silas, Thank you for the follow up. Can you try removing the file 1_combined_metaphlan_bowtie2.txt and then run the following command directly on the command line? I am hoping MetaPhlAn will print out a warning or error to help us debug what might be up.

$ /mnt/users/sivick/.conda/envs/biobakery3/bin/metaphlan /net/cn-1/mnt/SCRATCH/sivick/humann3_output/1_combined_humann_temp/tmp9gccad2x/tmpcv92_w_m -t rel_ab -o /net/cn-1/mnt/SCRATCH/sivick/humann3_output/1_combined_humann_temp/1_combined_metaphlan_bugs_list.tsv --input_type fastq --bowtie2out /net/cn-1/mnt/SCRATCH/sivick/humann3_output/1_combined_humann_temp/1_combined_metaphlan_bowtie2.txt --nproc 8

Thank you,
Lauren

Hi Lauren,

Unfortunately this is on a shared university server and we aren’t permitted to run jobs outside of the SLURM system. I tried to do this interactively using the qlogin command, but the session timed out before the process had finished. It did, however, start to produce a 1_combined_metaphlan_bugs_list.tsv file. That file looks like this:

#mpa_v30_CHOCOPhlAn_201901
#/mnt/users/sivick/.conda/envs/biobakery3/bin/metaphlan /net/cn-1/mnt/SCRATCH/sivick/humann3_output/1_combined_humann_temp/tmp9gccad2x/tmpcv92_w_m -t rel_ab -o /net/cn-1/mnt/SCRATCH/sivick/humann3_output/1_combined_humann_temp/1_combined_metaphlan_bugs_list.tsv --input_type fastq --bowtie2out /net/cn-1/mnt/SCRATCH/sivick/humann3_output/1_combined_humann_temp/1_combined_metaphlan_bowtie2.txt --nproc 8
#SampleID Metaphlan_Analysis
#clade_name NCBI_tax_id relative_abundance additional_species
k__Bacteria 2 97.5294

It continues on like that without any error messages. I am not entirely sure how to proceed from here.

Silas

Hi Silas, Thank you for the follow up. Would you try allowing it to run for at most 8 hours and then check for any stdout or stderr messages? I don’t think it should take longer than 8 hours to run unless there is some issue with resources for the compute node you are on.

Thank you,
Lauren

I think it would have run for 8 hours, actually, but I didn’t get any stdout or stderr messages. When I run it normally, I think it starts hanging after 1 or 2 days of running.

Silas

Hi Silas, Thank you for the follow up. How large (in GB) is your input file? I am not sure what might be going on so I am going to ping one of the MetaPhlAn developers to get their input.

Hi @fbeghini , Apologies for the ping. Do you know what might possibly be going on with this run hanging? Thank you in advance for your thoughts/suggestions.

Thank you,
Lauren

Hi Lauren,

They are about 11 GB gzipped and I would say maybe 60 GB unzipped.

Silas

Hi Silas, Wow! That is a lot of reads. That is really neat. I don’t know how much time or memory that might take to process through MetaPhlAn. Have you let it run for a couple of days with a bit of memory to see if it finishes?

Thanks,
Lauren

So I have let it run for 14 or more days with 120 GB of RAM allocated (maybe it needs a lot more?). I have 19 samples of roughly the same size, though, so computing time is definitely going to be an issue for me. Would it be a lot faster to just run the functional gene mapping? I’m more interested in quantifying reads to KEGG annotations than the taxonomic side of things.

Silas

Hi @SVick and @lauren.j.mciver ,
If the metaphlan_bug_list.txt file is produced, MetaPhlAn execution should have finished. You can check whether the last two lines of the 1_combined_metaphlan_bowtie2 file start with a #.
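The check on the last two lines can be sketched as a small shell function. The function name `bowtie2out_looks_complete` is made up for illustration, and the sketch assumes only what is stated above: a finished file ends with two lines starting with `#`.

```shell
#!/bin/sh
# Sketch: succeed only if the last two lines of the file start with '#'.
# Returns 0 if both do, 1 if not, 2 if the file does not exist.
bowtie2out_looks_complete() {
    bt2_file="$1"
    [ -f "$bt2_file" ] || return 2
    # Count how many of the final two lines begin with '#'.
    n_comment=$(tail -n 2 "$bt2_file" | grep -c '^#')
    [ "$n_comment" -eq 2 ]
}
```

For the run in this thread it could be called as `bowtie2out_looks_complete /net/cn-1/mnt/SCRATCH/sivick/humann3_output/1_combined_humann_temp/1_combined_metaphlan_bowtie2.txt && echo complete`.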

Hi fbeghini,

Unfortunately MetaPhlAn doesn’t complete and there is no metaphlan_bug_list.txt file generated. The 1_combined_metaphlan_bowtie2 file keeps getting larger for the first day or so and then stops growing at a particular point (it always gets to the same file size). After this nothing more happens: I don’t get any errors, and the job still appears to be running.

Silas
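To confirm a stall like this without waiting days, the file size can be compared across a fixed interval. This is a minimal sketch; the function name `file_is_growing` and the default 600-second interval are arbitrary choices for illustration, not part of HUMAnN or MetaPhlAn.

```shell
#!/bin/sh
# Sketch: succeed if the file grew during the interval (in seconds).
# Returns 0 if it grew, 1 if the size is unchanged, 2 if it cannot be read.
file_is_growing() {
    watched="$1"
    interval="${2:-600}"
    size_before=$(wc -c < "$watched") || return 2
    sleep "$interval"
    size_after=$(wc -c < "$watched") || return 2
    [ "$size_after" -gt "$size_before" ]
}
```

Running it a few times against the bowtie2 output file, for example with `file_is_growing 1_combined_metaphlan_bowtie2.txt 600 || echo "no growth in 10 minutes"`, shows when the output stopped changing even though the SLURM job is still listed as running.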

Can you try aligning your metagenome to the MetaPhlAn database with bowtie2 using the parameters --no-unal --very-sensitive -k 2? Maybe it gets stuck finding the best alignment; with -k 2 we can force the search for up to 2 alignments.
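A direct bowtie2 call with those parameters might look like the sketch below. The wrapper name `align_k2`, the `DRY_RUN` switch and all paths are placeholders for illustration; in particular, the location of the MetaPhlAn bowtie2 index depends on the installation and is not given in this thread. The flags `-x`, `-U`, `-S` and `-p` are standard bowtie2 options for the index prefix, unpaired reads, SAM output and thread count.

```shell
#!/bin/sh
# Sketch: align reads against a bowtie2 index with the suggested parameters.
# Arguments: index prefix, reads file, output SAM, threads (default 8).
# With DRY_RUN=1 the command is printed instead of executed.
align_k2() {
    index_prefix="$1"
    reads="$2"
    out_sam="$3"
    threads="${4:-8}"
    # Rebuild the positional parameters as the full bowtie2 command line.
    set -- bowtie2 --no-unal --very-sensitive -k 2 \
        -p "$threads" -x "$index_prefix" -U "$reads" -S "$out_sam"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

For example, `DRY_RUN=1 align_k2 /path/to/index_prefix reads.fq out.sam 8` prints the command so it can be pasted into a SLURM script; dropping `DRY_RUN=1` runs it, which requires bowtie2 on the PATH.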

Unfortunately I have started up on some new work and this is going to have to be put on the backburner for a while. Thanks for your help so far.

Silas

Hi, I am running into the same issue. Our files are huge, and we have 24 of them. We allocated 15 TB on a server and are running with 40 cores (--threads N). We just need an OTU table (combining the 24 samples) with relative amounts and identifications. Is it possible to create this without the “genefamilies.tsv” and all the other files (which humann3 is not getting to because it stops; I have attached a log of an example)? Can we work with any of the temporary files to create an OTU table? Thanks for your help!

B2A-PE-B10I7-B10I5-1_S10_R1_001.txt (13.4 KB)