The bioBakery help forum

Error message returned from diamond

Hello,

when trying to run HUMAnN3, I receive the following error message from DIAMOND:

Error message returned from diamond :
diamond v0.9.24.125 | by Benjamin Buchfink buchfink@gmail.com
Licensed under the GNU GPL https://www.gnu.org/licenses/gpl.txt
Check http://github.com/bbuchfink/diamond for updates.

#CPU threads: 4
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory: Text file busy
Error: Error calling unlink.

06/29/2020 03:37:26 PM - humann.utilities - CRITICAL: TRACEBACK:
Traceback (most recent call last):
File "/home/plicht/anaconda3/envs/metaphlan/lib/python3.7/site-packages/humann/utilities.py", line 744, in execute_command
p_out = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
File "/home/plicht/anaconda3/envs/metaphlan/lib/python3.7/subprocess.py", line 411, in check_output
**kwargs).stdout
File "/home/plicht/anaconda3/envs/metaphlan/lib/python3.7/subprocess.py", line 512, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['/home/plicht/anaconda3/envs/metaphlan/bin/diamond', 'blastx', '--query', '/media/sf_projects/microbiome/Analysis_of_microbiome/WiP/KneadData/firsttry/HUMAnN3/output/TRIAL_PL018_1_Novogenea1_1/TRIAL_PL018_1_Novogenea1_1_humann_temp/TRIAL_PL018_1_Novogenea1_1_bowtie2_unaligned.fa', '--evalue', '1.0', '--threads', '4', '--top', '1', '--outfmt', '6', '--db', '/media/sf_projects/microbiome/Analysis_of_microbiome/BioBakery-Tools/Databases/HUMAnN_db/uniref/uniref90_201901', '--out', '/media/sf_projects/microbiome/Analysis_of_microbiome/WiP/KneadData/firsttry/HUMAnN3/output/TRIAL_PL018_1_Novogenea1_1/TRIAL_PL018_1_Novogenea1_1_humann_temp/tmpln4cfsa3/diamond_m8_18tgcgi4', '--tmpdir', '/media/sf_projects/microbiome/Analysis_of_microbiome/WiP/KneadData/firsttry/HUMAnN3/output/TRIAL_PL018_1_Novogenea1_1/TRIAL_PL018_1_Novogenea1_1_humann_temp/tmpln4cfsa3']' returned non-zero exit status 1.

I use HUMAnN3 alpha3 with DIAMOND v0.9.24.125, installed via conda, and the following databases:

  • UniRef full ($ humann_databases --download uniref uniref90_diamond )
  • ChocoPhlAn full v296_201901 ($ humann_databases --download chocophlan full )
  • Utility mapping full ($ humann_databases --download utility_mapping full )

The functional tests with $ humann_test --run-functional-tests-tools --run-functional-tests-end-to-end pass, and HUMAnN3 also works properly when running demo.fastq against the demo databases. Can you help me out?

Sorry for the delay here. Everything about the installation seems fine. This might be an issue with the I/O in your computer environment (per the “file busy” error). You could try re-running, or using another location to store the outputs?
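
For example, something like the following (paths hypothetical) would keep the outputs, and with them HUMAnN's temporary files and the DIAMOND tmpdir, on a local disk rather than a shared or external one:

# hypothetical paths: HUMAnN writes its temporary files under the
# output directory, so redirecting --output also moves the DIAMOND tmpdir
$ humann --input sample.fastq --output /local/scratch/humann_out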

Hi Eric,

thanks for your suggestions. Indeed, the I/O directory could be the problem, since I am currently running a virtual Ubuntu machine that is running out of disk space. That's why I located the databases as well as the I/O files on an external HDD.
However, in the meantime I was able to get the task running after updating DIAMOND from v0.9.24.125 to v0.9.36.137. The command works well with the full ChocoPhlAn database and the demo protein database uniref90_demo_prots_v201901. However, when using the full uniref90 database, the command dies with <Signals.SIGKILL: 9>. I googled and watched my RAM, and I suspect it is caused by running out of memory. The virtual machine is allocated 8 GB of RAM. Is there a general formula for the minimum requirements of HUMAnN3?
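
For reference, I did the DIAMOND update through conda, roughly like this (a sketch, assuming the bioconda channel):

$ conda install -c bioconda diamond=0.9.36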

In our evaluations on metagenomes with 30M reads, peak RAM usage varied from 16GB to 24GB depending on the balance between nucleotide and translated search. The memory ceiling is typically smaller outside of translated search, though this can also depend on the complexity of the microbial community under study.
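
If you cannot allocate more RAM to the VM, one possible workaround is to skip translated search entirely, at the cost of losing its contribution to the profile. A minimal sketch (input name hypothetical):

# skips the memory-intensive DIAMOND stage; only nucleotide-level
# (ChocoPhlAn) hits will contribute to the output profiles
$ humann --input sample.fastq --output humann_out --bypass-translated-search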

Hi @franzosa ,

I have the same problem with DIAMOND, ending with the error "Signals.SIGKILL: 9".

But it does not look like a memory problem:

Processing query block 1, reference block 6/15, shape 2/2, index chunk 3/4.

Building reference seed array...  [8.452s]
Building query seed array...  [4.357s]
Computing hash join...  [1.761s]
Building seed filter...  [0.071s]
Searching alignments...  [5.834s]
Processing query block 1, reference block 6/15, shape 2/2, index chunk 4/4.
Building reference seed array...  [7.515s]
Building query seed array...  [3.126s]
Computing hash join...  [1.935s]
Building seed filter...  [0.072s]
Searching alignments...  [5.277s]
Deallocating buffers...  [0.018s]
Clearing query masking...  [0.888s]
Opening temporary output file...  [0.048s]
Computing alignments...
12/13/2020 12:10:58 AM - humann.utilities - CRITICAL: TRACEBACK:
Traceback (most recent call last):
File "/home/daia1/anaconda3/envs/py37/lib/python3.7/site-packages/humann/utilities.py", line 756, in execute_command
p_out = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
File "/home/daia1/anaconda3/envs/py37/lib/python3.7/subprocess.py", line 411, in check_output
**kwargs).stdout
File "/home/daia1/anaconda3/envs/py37/lib/python3.7/subprocess.py", line 512, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['/home/daia1/anaconda3/envs/py37/bin/diamond', 'blastx', '--query', '/home/daia1/my_workdir/samples/CART_2050A_humann3_humann_temp_rgp4wrgk/CART_2050A_humann3_bowtie2_unaligned.fa', '--evalue', '1.0', '--threads', '16', '--top', '1', '--outfmt', '6', '--db', '/home/daia1/my_workdir/ref_db/uniref/uniref/uniref/uniref90_201901', '--out', '/home/daia1/my_workdir/samples/CART_2050A_humann3_humann_temp_rgp4wrgk/tmpqzcl2fvw/diamond_m8_pgyuivvi', '--tmpdir', '/home/daia1/my_workdir/samples/CART_2050A_humann3_humann_temp_rgp4wrgk/tmpqzcl2fvw']' died with <Signals.SIGKILL: 9>.

12/13/2020 12:10:58 AM - humann.utilities - INFO: Total memory = 503.5974884033203 GB
12/13/2020 12:10:58 AM - humann.utilities - INFO: Available memory = 366.03315353393555 GB
12/13/2020 12:10:58 AM - humann.utilities - INFO: Free memory = 364.29954528808594 GB
12/13/2020 12:10:58 AM - humann.utilities - INFO: Percent memory used = 27.3 %
12/13/2020 12:10:58 AM - humann.utilities - INFO: CPU percent = 46.2 %
12/13/2020 12:10:58 AM - humann.utilities - INFO: Total cores count = 72
12/13/2020 12:10:58 AM - humann.utilities - INFO: Total disk = 159.56462860107422 GB
12/13/2020 12:10:58 AM - humann.utilities - INFO: Used disk = 31.365806579589844 GB
12/13/2020 12:10:58 AM - humann.utilities - INFO: Percent disk used = 19.7 %
12/13/2020 12:10:58 AM - humann.utilities - INFO: Process create time = 2020-12-12 22:16:46
12/13/2020 12:10:58 AM - humann.utilities - INFO: Process user time = 2442.87 seconds
12/13/2020 12:10:58 AM - humann.utilities - INFO: Process system time = 83.03 seconds
12/13/2020 12:10:58 AM - humann.utilities - INFO: Process CPU percent = 0.0 %
12/13/2020 12:10:58 AM - humann.utilities - INFO: Process memory RSS = 14.47745132446289 GB
12/13/2020 12:10:58 AM - humann.utilities - INFO: Process memory VMS = 14.610565185546875 GB
12/13/2020 12:10:58 AM - humann.utilities - INFO: Process memory percent = 2.8748061016674917 %

I have a total of 67 samples; 54 finished successfully with the same code, but the remaining 13 just stalled and never finished. I also have enough disk space.
Do you have any idea what could be causing the problem?

Anqi

That error suggests that the system told the process to stop. It could be something like running out of time (in a cluster environment) or the system restarting?
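
If these were Slurm jobs, the scheduler's accounting records would show whether it ended them for exceeding a time or memory limit. A sketch (job ID hypothetical):

# State/ExitCode reveal TIMEOUT or OUT_OF_MEMORY kills; MaxRSS vs ReqMem
# shows how close the job came to its memory allocation
$ sacct -j 1234567 --format=JobID,State,ExitCode,Elapsed,MaxRSS,ReqMem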

I can reproduce this error with 10% of my files.
I tried to rerun the failing ones with the same settings (in case the filesystem just had a bad day), but that did not change the outcome.
For me, the pipeline dies at the DIAMOND search step.

CRITICAL ERROR: Error executing: anaconda3/envs/biobakery3/bin/diamond blastx --query [...]

Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory: [...]/tmp5bljfj5d
Percentage range of top alignment score to report hits: 1
Opening the database...  [0.117s]
Database: [...]/biobakery3/uniref/uniref90_201901b_full.dmnd (type: Diamond database, sequences: 87296736, letters: 29247941583)
Block size = 2000000000
Opening the input file...  [0.06s]
Opening the output file...  [0s]
Loading query sequences...  [30.558s]
Masking queries...  [39.426s]
Algorithm: Double-indexed
Building query histograms...  [8.431s]
Allocating buffers...  [0s]
Loading reference sequences...  [6.507s]
Masking reference...  [31.677s]
Initializing dictionary...  [0.011s]
Initializing temporary storage...  [0.008s]
Building reference histograms...  [12.965s]
Allocating buffers...  [0s]
Processing query block 1, reference block 1/15, shape 1/2, index chunk 1/4.
Building reference seed array...  [9.151s]
Building query seed array...  [6.171s]
Computing hash join...  [5.577s]
Building seed filter...  [0.173s]
Searching alignments... 
[Dies here]

I'm running this on a cluster environment (Slurm) which, however, is neither running out of time nor reporting an out-of-memory error. It's possible that I'm missing a file-system error, but it definitely can't be due to a lack of storage space.
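
If it helps narrow things down, I could also check the compute node's kernel log for OOM-killer activity (a sketch; assumes shell access to the node):

# -T prints human-readable timestamps; matches kernel OOM-killer messages
$ dmesg -T | grep -i -E 'out of memory|killed process'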

Any new ideas on that issue?

Best,
Len

Which version of DIAMOND are you running?
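You can print it with DIAMOND's own version subcommand:

$ diamond version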