HUMAnN run killed partway through

Previously, I had a problem with the bowtie2 database in MetaPhlAn that pointed to the index option. After rebuilding the bowtie2 index files, that problem was resolved, and I updated MetaPhlAn to 3.0.2. After running
humann -i demo.fastq -o demo_result
the final results were fine: all three .tsv files were generated as expected. However, when I ran the command on my own data sample, the terminal showed “Killed”:

Total species selected from prescreen: 101
Selected species explain 99.94% of predicted community composition
Creating custom ChocoPhlAn database …
Running bowtie2-build …
Running bowtie2 …
Killed

In the run log, I find the following records:

07/31/2020 02:44:18 PM - humann.humann - INFO: TIMESTAMP: Completed nucleotide alignment : 1879 seconds
07/31/2020 03:10:05 PM - humann.utilities - DEBUG: Total alignments where percent identity is not a number: 0
07/31/2020 03:10:05 PM - humann.utilities - DEBUG: Total alignments where alignment length is not a number: 0
07/31/2020 03:10:05 PM - humann.utilities - DEBUG: Total alignments where E-value is not a number: 0
07/31/2020 03:10:05 PM - humann.utilities - DEBUG: Total alignments not included based on large e-value: 0
07/31/2020 03:10:05 PM - humann.utilities - DEBUG: Total alignments not included based on small percent identity: 0
07/31/2020 03:10:05 PM - humann.utilities - DEBUG: Total alignments not included based on small query coverage: 0
07/31/2020 03:10:05 PM - humann.search.blastx_coverage - INFO: Total alignments without coverage information: 0
07/31/2020 03:10:05 PM - humann.search.blastx_coverage - INFO: Total proteins in blastx output: 267838
07/31/2020 03:10:05 PM - humann.search.blastx_coverage - INFO: Total proteins without lengths: 0
07/31/2020 03:10:05 PM - humann.search.blastx_coverage - INFO: Proteins with coverage greater than threshold (50.0): 162359

So I think the program finished the MetaPhlAn step to identify the microbial taxonomy but failed in the functional (protein) steps. Could you help me troubleshoot the problem?

By the way, my computer has 32 GB of RAM and 650 GB of disk space in total, so I guess it is not due to a disk space or RAM limit.
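In case it matters, one way I could confirm whether the kernel's out-of-memory (OOM) killer ended the process would be to check the kernel log right after a failed run (standard Linux commands, nothing HUMAnN-specific):

# look for OOM-killer messages with readable timestamps
dmesg -T | grep -i -E "out of memory|killed process"
# or, on systemd-based systems, search the kernel journal
journalctl -k | grep -i -E "out of memory|killed process"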

Hello, thank you for the detailed post! It sounds like you should have enough memory and disk space. Are you possibly running a few HUMAnN tasks at the same time, or is something else running on the same machine? If so, would you try running a single task and see if that solves it? Sorry in advance if you have already tried this! The fact that the run is killed suggests it is likely memory related, but at the same time it does seem like you have plenty of compute.
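If it helps, a simple way to keep an eye on memory while the single task runs (assuming a standard Linux setup) is something like:

# report memory usage in human-readable units, refreshed every 60 seconds
watch -n 60 free -h

That should make it clear whether the machine really runs out of RAM just before the process is killed.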

Thank you,
Lauren

Hi, Lauren:

Thank you for your reply. The first time, I did run two tasks under conda, and both programs used bowtie2 in the different environments I had created. After a few hours, the “Killed” message appeared. After that, I tried running a single terminal for just one sample (the original fastq file is 32 GB), but the “Killed” message appeared again. However, when I ran the demo.fastq and demo.sam files, the three expected .tsv files were generated, so I am confused about the HUMAnN program. The MetaPhlAn version I currently have installed is 3.0.2 and the HUMAnN version is 3.alpha. I will try again and update later.
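One thing I plan to try next time, if I understand the resume option correctly, is restarting from the temp files of the killed run instead of redoing the alignment from scratch (my_sample.fastq and my_output below are just placeholders for my real file and output directory):

# point HUMAnN at the same output directory and reuse existing intermediate files
humann -i my_sample.fastq -o my_output --resume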

Hi, Lauren:

I reinstalled the biobakery3 environment and updated MetaPhlAn to 3.0.2. Following your suggestion, this time I opened just one terminal and ran the HUMAnN program on one sample only. However, the same outcome appeared in the terminal and the sample log.

I have also attached the initial settings:
08/05/2020 12:55:11 PM - humann.humann - INFO: Running humann v3.0.0.alpha.3
08/05/2020 12:55:11 PM - humann.humann - INFO: Output files will be written to: /home/chaozhi/anaconda3/envs/biobakery3/working_file/BSF1_meta
08/05/2020 12:55:11 PM - humann.humann - INFO: Writing temp files to directory: /home/chaozhi/anaconda3/envs/biobakery3/working_file/BSF1_meta/BSF1_nohost_humann_temp
08/05/2020 12:55:11 PM - humann.utilities - INFO: File ( /home/chaozhi/anaconda3/envs/biobakery3/working_file/BSF1_nohost.fastq ) is of format: fastq
08/05/2020 12:55:11 PM - humann.utilities - DEBUG: Check software, metaphlan, for required version, 3.0
08/05/2020 12:55:16 PM - humann.utilities - INFO: Using metaphlan version 3.0
08/05/2020 12:55:16 PM - humann.utilities - DEBUG: Check software, bowtie2, for required version, 2.2
08/05/2020 12:55:17 PM - humann.utilities - WARNING: Can not call software version for bowtie2
08/05/2020 12:55:17 PM - humann.utilities - INFO: Using bowtie2 version UNK
08/05/2020 12:55:17 PM - humann.humann - INFO: Search mode set to uniref90 because a uniref90 translated search database is selected
08/05/2020 12:55:17 PM - humann.utilities - DEBUG: Check software, diamond, for required version, 0.9.24
08/05/2020 12:55:17 PM - humann.utilities - INFO: Using diamond version 2.0.1
08/05/2020 12:55:17 PM - humann.config - INFO:
Run config settings:

DATABASE SETTINGS
nucleotide database folder = /home/chaozhi/anaconda3/envs/biobakery3/chocophlan
protein database folder = /home/chaozhi/anaconda3/envs/biobakery3/uniref
pathways database file 1 = /home/chaozhi/anaconda3/envs/biobakery3/lib/python3.7/site-packages/humann/data/pathways/metacyc_reactions_level4ec_only.uniref.bz2
pathways database file 2 = /home/chaozhi/anaconda3/envs/biobakery3/lib/python3.7/site-packages/humann/data/pathways/metacyc_pathways_structured_filtered
utility mapping database folder = /home/chaozhi/anaconda3/envs/biobakery3/utility_mapping

RUN MODES
resume = False
verbose = False
bypass prescreen = False
bypass nucleotide index = False
bypass nucleotide search = False
bypass translated search = False
translated search = diamond
pick frames = off
threads = 8

SEARCH MODE
search mode = uniref90
nucleotide identity threshold = 0.0
translated identity threshold = 80.0

ALIGNMENT SETTINGS
bowtie2 options = --very-sensitive
diamond options = --top 1 --outfmt 6
evalue threshold = 1.0
prescreen threshold = 0.01
translated subject coverage threshold = 50.0
translated query coverage threshold = 90.0
nucleotide subject coverage threshold = 50.0
nucleotide query coverage threshold = 90.0

PATHWAYS SETTINGS
minpath = on
xipe = off
gap fill = on

INPUT AND OUTPUT FORMATS
input file format = fastq
output file format = tsv
output max decimals = 10
remove stratified output = False
remove column description output = False
log level = DEBUG

After the taxonomic part finished, from the terminal:
Total species selected from prescreen: 101
Selected species explain 99.94% of predicted community composition
Creating custom ChocoPhlAn database …
Running bowtie2-build …
Running bowtie2 …
Killed

From the sample log:
08/05/2020 02:31:45 PM - humann.humann - INFO: TIMESTAMP: Completed nucleotide alignment : 2252 seconds
08/05/2020 02:57:34 PM - humann.utilities - DEBUG: Total alignments where percent identity is not a number: 0
08/05/2020 02:57:34 PM - humann.utilities - DEBUG: Total alignments where alignment length is not a number: 0
08/05/2020 02:57:34 PM - humann.utilities - DEBUG: Total alignments where E-value is not a number: 0
08/05/2020 02:57:34 PM - humann.utilities - DEBUG: Total alignments not included based on large e-value: 0
08/05/2020 02:57:34 PM - humann.utilities - DEBUG: Total alignments not included based on small percent identity: 0
08/05/2020 02:57:34 PM - humann.utilities - DEBUG: Total alignments not included based on small query coverage: 0
08/05/2020 02:57:34 PM - humann.search.blastx_coverage - INFO: Total alignments without coverage information: 0
08/05/2020 02:57:34 PM - humann.search.blastx_coverage - INFO: Total proteins in blastx output: 275978
08/05/2020 02:57:34 PM - humann.search.blastx_coverage - INFO: Total proteins without lengths: 0
08/05/2020 02:57:34 PM - humann.search.blastx_coverage - INFO: Proteins with coverage greater than threshold (50.0): 167876

Since my Linux system runs in a virtual machine, one possible reason is high memory usage by the virtual machine itself. Beyond that, I cannot figure out a potential cause. I am looking forward to your reply.

Best Regards,

                                       Chaozhi Pan

Hi Chaozhi Pan, thank you for trying again and for the detailed follow-up message. I agree that the memory error might be because you are running in a virtual machine. We estimate approximately 24 to 32 GB of memory for a HUMAnN run, based on the size of the input file. If your input file is 32 GB, that is a fantastic number of reads! However, I would expect a file of that size to need memory at the higher end of the estimate. Would it be possible to install and run HUMAnN outside of the VM? If so, that might resolve the memory issue.
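If moving outside of the VM is not possible, there are also a couple of settings that tend to lower peak memory, at some cost in sensitivity; the values below are only examples and my_sample.fastq / my_output are placeholders:

# keep fewer low-abundance species in the custom ChocoPhlAn database (default threshold is 0.01)
humann -i my_sample.fastq -o my_output --prescreen-threshold 0.1

# or, as a diagnostic, skip the translated (DIAMOND) search entirely
humann -i my_sample.fastq -o my_output --bypass-translated-search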

Thank you,
Lauren

Hi, Lauren:

I just tried again this morning after increasing the available RAM in the virtual machine. The program was killed again partway through, but it got a bit further, to the step of running DIAMOND. So my issue is likely the limited memory. I will find a way to resolve the problem.
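One more thing I may try, assuming this HUMAnN build lets DIAMOND options be passed through on the command line (the run config in my earlier post does show a "diamond options" setting), is shrinking DIAMOND's sequence block size, which is its main memory knob (my_sample.fastq / my_output are placeholders and 1 is only an example value):

# a smaller --block-size (billions of letters per block) lowers DIAMOND's peak RAM at the cost of speed
humann -i my_sample.fastq -o my_output --diamond-options "--top 1 --outfmt 6 --block-size 1"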

Thank you for your immediate and patient replies. I really enjoy the discussions on this forum and the passion you all bring to it.

Best