Biobakery Workflows: Errors with downloading databases, "unpickling error", "topological_sort()" error, and "Unable to find Trimmomatic" error

Hi there! I hope you are well, and hope you might be able to help me with some errors. I am trying to install and run biobakery_workflows, but there are two problems that I’m facing.

Context (How I installed biobakery_workflows):

This is how I installed the biobakery workflow:

conda create -n biobakerywf -c biobakery biobakery_workflows

Problem 1: Trouble downloading and installing databases with biobakery_workflows_databases

When I first tried to install the databases with biobakery_workflows_databases, I received a tbb error:

error while loading shared libraries: cannot open shared object file: No such file or directory
(ERR): Description of arguments failed!

This was not a big deal, because it’s a known issue with bowtie2 and newer tbb releases. I resolved it by downgrading tbb:

conda install tbb=2020.2
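To confirm that the downgrade actually fixed the loader error, one option is to ask ldd which shared libraries a Bowtie 2 binary still cannot resolve. This is a minimal sketch that assumes a glibc-based Linux system where ldd is available; missing_libs is a hypothetical helper name, not part of bioBakery or Bowtie 2.

```shell
# missing_libs: hypothetical helper (not part of bioBakery or Bowtie 2).
# Prints the name of every shared library that the dynamic loader cannot
# resolve for the given binary; ldd marks those lines with "not found".
# Empty output means all libraries resolved.
missing_libs() {
  local binary="$1"
  if [ ! -x "$binary" ]; then
    echo "not an executable: $binary" >&2
    return 2
  fi
  ldd "$binary" 2>/dev/null | awk '/not found/ { print $1 }'
}
```

Usage, checking the same binary that appears in the error log below (bowtie2-inspect-s, a compiled binary, rather than the bowtie2 wrapper script):

missing_libs "$(command -v bowtie2-inspect-s)"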

After this change, I again tried to install the databases with:

biobakery_workflows_databases --install wmgx --location /home/bsingh/bin/biobakery_databases

I received the following error:

Installing humann utility mapping database
Download URL:
Downloading file of size: 2.55 GB

2.55 GB 100.00 %  10.70 MB/sec  0 min -0 sec         
Extracting: /home/bsingh/bin/biobakery_databases/humann/full_mapping_v201901.tar.gz

Database installed: /home/bsingh/bin/biobakery_databases/humann/utility_mapping

HUMAnN configuration file updated: database_folders : utility_mapping = /home/bsingh/bin/biobakery_databases/humann/utility_mapping
Generating strainphlan fasta database
Could not locate a Bowtie index corresponding to basename "/home/bsingh/miniconda3/envs/biobakerywf/lib/python3.7/site-packages/metaphlan/metaphlan_databases/mpa_v30_CHOCOPhlAn_201901"
Error: Encountered internal Bowtie 2 exception (#1)
Command: /home/bsingh/miniconda3/envs/biobakerywf/bin/bowtie2-inspect-s --wrapper basic-0 /home/bsingh/miniconda3/envs/biobakerywf/lib/python3.7/site-packages/metaphlan/metaphlan_databases/mpa_v30_CHOCOPhlAn_201901 
Unable to install database. Error running command: bowtie2-inspect /home/bsingh/miniconda3/envs/biobakerywf/lib/python3.7/site-packages/metaphlan/metaphlan_databases/mpa_v30_CHOCOPhlAn_201901 > /home/bsingh/bin/biobakery_databases/strainphlan_db_markers/all_markers.fasta

Any help would be appreciated!

Problem 2: "Unpickling Error"

Even running biobakery_workflows wmgx --help gives the following error:

  File "/home/bsingh/miniconda3/envs/biobakerywf/bin/", line 41, in <module>
    workflow = Workflow(version="0.1", description="A workflow for whole metagenome shotgun sequences")
  File "/home/bsingh/miniconda3/envs/biobakerywf/lib/python3.7/site-packages/anadama2/", line 120, in __init__
  File "/home/bsingh/miniconda3/envs/biobakerywf/lib/python3.7/site-packages/anadama2/", line 96, in __init__
  File "/home/bsingh/miniconda3/envs/biobakerywf/lib/python3.7/site-packages/anadama2/", line 325, in get_vars
    vars = pickle.load(open(pickle_file[0],"rb"))
_pickle.UnpicklingError: pickle data was truncated

I think this error is separate from the database download errors. For this one, I’m truly lost. Once again, any help would be appreciated!

Update: I was able to solve Problem 1 and Problem 2.

Problem 1 Solution: Before using biobakery_workflows_databases to download the other databases, it’s best to download the MetaPhlAn database manually. Somewhere in the MetaPhlAn GitHub or tutorial, it says that if you installed with Conda, you should download the databases to a custom location. However, if you do this, biobakery_workflows_databases doesn’t know where to look for the MetaPhlAn databases and assumes they don’t exist, which leads to the error above (it looks for the Bowtie 2 index under the metaphlan package folder inside the Conda environment). So when you download the MetaPhlAn databases, put them in the default location inside the Conda environment.
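Before rerunning biobakery_workflows_databases, it can help to check that the MetaPhlAn Bowtie 2 index is really present at the basename the error complains about. A Bowtie 2 index is six files sharing one basename (.1, .2, .3, .4, .rev.1, .rev.2), with the extension .bt2 for small indexes or .bt2l for large ones. The sketch below checks for a complete set; check_bt2_index is a hypothetical helper name, not part of bioBakery.

```shell
# check_bt2_index: hypothetical helper (not part of bioBakery or Bowtie 2).
# A Bowtie 2 index consists of six files that share one basename:
#   BASE.1, BASE.2, BASE.3, BASE.4, BASE.rev.1, BASE.rev.2
# each ending in .bt2 (small index) or .bt2l (large index).
# Returns 0 if a complete set exists for either extension; otherwise
# prints a message to stderr and returns 1.
check_bt2_index() {
  local base="$1" ext part complete
  for ext in bt2 bt2l; do
    complete=1
    for part in 1 2 3 4 rev.1 rev.2; do
      if [ ! -f "${base}.${part}.${ext}" ]; then
        complete=0
      fi
    done
    if [ "$complete" -eq 1 ]; then
      return 0
    fi
  done
  echo "no complete Bowtie 2 index found for basename: $base" >&2
  return 1
}
```

Usage with the basename copied from the error log (the Python version and database name in the path depend on your environment and MetaPhlAn version):

check_bt2_index "$CONDA_PREFIX/lib/python3.7/site-packages/metaphlan/metaphlan_databases/mpa_v30_CHOCOPhlAn_201901"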

Problem 2 Solution: This resolved itself once all the databases were downloaded.

Problem 3 Solution: There was also a third problem: a “topological_sort()” error. Following the solution in this forum thread, I downgraded the networkx package to version 1.11.

Ultimately, this is what has worked so far to solve problems 1-3:

conda create -n biobakerywf -c biobakery biobakery_workflows
conda activate biobakerywf
conda install tbb=2020.2
conda install networkx=1.11 
metaphlan --install #do not specify download location
biobakery_workflows_databases --install wmgx #do not specify download location
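Since a later conda install can change versions that were pinned earlier, it may be worth confirming that the pins are still in place. This sketch reads `conda list` output (columns: name, version, build, channel; comment lines start with #) and compares it with NAME=VERSION arguments. check_pins is a hypothetical helper, not part of conda or bioBakery.

```shell
# check_pins: hypothetical helper (not part of conda or bioBakery).
# Reads `conda list` output on stdin and checks each NAME=VERSION argument
# against it. Mismatched or missing packages are reported on stderr and
# the function returns 1; it returns 0 when every pin matches exactly.
check_pins() {
  local listing pin name want have status=0
  listing=$(grep -v '^#')            # drop conda's comment/header lines
  for pin in "$@"; do
    name="${pin%%=*}"
    want="${pin#*=}"
    have=$(printf '%s\n' "$listing" | awk -v n="$name" '$1 == n { print $2; exit }')
    if [ -z "$have" ]; then
      echo "$name: not installed (wanted $want)" >&2
      status=1
    elif [ "$have" != "$want" ]; then
      echo "$name: found $have, wanted $want" >&2
      status=1
    fi
  done
  return "$status"
}
```

Usage:

conda list -n biobakerywf | check_pins tbb=2020.2 networkx=1.11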

Problem 4:

Unfortunately, I now have another problem: KneadData does not recognize that Trimmomatic is already installed, and I get the following error when I run the workflow:

Task 3 failed
  Name: kneaddata____HD42R4_subsample
  Original error: 
  Error executing action 0. Original Exception: 
  Traceback (most recent call last):
    File "/home/bsingh/miniconda3/envs/biobakerywf/lib/python3.7/site-packages/anadama2/", line 201, in _run_task_locally
    File "/home/bsingh/miniconda3/envs/biobakerywf/lib/python3.7/site-packages/anadama2/", line 89, in actually_sh
      ret = _sh(s, **kwargs)
    File "/home/bsingh/miniconda3/envs/biobakerywf/lib/python3.7/site-packages/anadama2/util/", line 320, in sh
      raise ShellException(proc.returncode, msg.format(cmd, ret[0], ret[1]))
  anadama2.util.ShellException: [Errno 1] Command `kneaddata --input /home/bsingh/biobakery_test_inputs/HD42R4_subsample.fastq.gz --output /home/bsingh/output_data/kneaddata/main --threads 1 --output-prefix HD42R4_subsample   --reference-db /home/bsingh/biobakery_workflows_databases/kneaddata_db_human_genome  --serial --run-trf  && mv /home/bsingh/output_data/kneaddata/main/HD42R4_subsample.repeats.removed.fastq /home/bsingh/output_data/kneaddata/main/HD42R4_subsample.fastq' failed. 
  Out: b''
  Err: b'ERROR: Unable to find trimmomatic. Please provide the full path to trimmomatic with --trimmomatic.\n

This same error was previously observed in this forum post, and also in this GitHub issue. A similar problem with trf was observed here.

Based on the links above, I figured the solution was to specify the Trimmomatic path with --trimmomatic when running KneadData. However, since I’m running KneadData through biobakery_workflows rather than on its own, I don’t think there is an explicit option to do that. Is there?

Problem 5:

I tried running KneadData by itself to make sure there weren’t any other issues, but even when I specify the Trimmomatic path, I get this error, which seems to come from here, at line 278:

Decompressing gzipped file ...
Critical Error: Unable to gunzip input file: /home/bsingh/biobakery_test_inputs/HD42R4_subsample.fastq.gz

I’m sorry for the multiple replies! As I find solutions, I thought it would be better to post them here in case anyone else finds them useful.

Problem 4 Solution: Fixed the “Unable to find trimmomatic” error. There is a gem of an option called --qc-options, where I was able to put the Trimmomatic path! Yay!

biobakery_workflows wmgx --input /home/bsingh/biobakery_test_inputs/ --output output_data --bypass-strain-profiling --qc-options "--trimmomatic /home/bsingh/bin/Trimmomatic-0.36"
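If you are not sure which folder to pass to --trimmomatic, a sketch like the one below can search for the Trimmomatic jar and print the folder that contains it. It assumes that folder is what KneadData wants (as with /home/bsingh/bin/Trimmomatic-0.36 above); the helper name find_trimmomatic_dir and the search roots in the usage line are illustrative, and a Bioconda install may put the jar somewhere under $CONDA_PREFIX/share.

```shell
# find_trimmomatic_dir: hypothetical helper (not part of KneadData).
# Searches the given directories for a file named trimmomatic*.jar
# (case-insensitive) and prints the folder containing the first match.
# Returns 1 with a message on stderr if nothing is found.
find_trimmomatic_dir() {
  local jar
  jar=$(find "$@" -type f -iname 'trimmomatic*.jar' 2>/dev/null | head -n 1)
  if [ -z "$jar" ]; then
    echo "no trimmomatic*.jar found under: $*" >&2
    return 1
  fi
  dirname "$jar"
}
```

Usage, feeding the result into the same --qc-options call:

tdir=$(find_trimmomatic_dir "$HOME/bin" "$CONDA_PREFIX/share") && biobakery_workflows wmgx --input /home/bsingh/biobakery_test_inputs/ --output output_data --bypass-strain-profiling --qc-options "--trimmomatic $tdir"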

Problem 5 Solution: File was corrupted! We’re all good! Thank you!
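To catch a corrupted input like this before starting a long run, each .gz file can be checked with gzip -t, which decompresses the data without writing it out and exits non-zero on truncated or damaged files. A minimal sketch; find_bad_gzips is a hypothetical helper name, not part of KneadData.

```shell
# find_bad_gzips: hypothetical helper (not part of KneadData).
# Runs `gzip -t` on every *.gz file in the given directory, prints the path
# of each file that fails the integrity test, and returns 1 if any failed.
find_bad_gzips() {
  local dir="$1" f status=0
  for f in "$dir"/*.gz; do
    [ -e "$f" ] || continue          # the glob matched nothing
    if ! gzip -t "$f" 2>/dev/null; then
      echo "$f"
      status=1
    fi
  done
  return "$status"
}
```

Usage:

find_bad_gzips /home/bsingh/biobakery_test_inputs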


Hi, I’m so sorry for making another post.

Problem 6:

I was able to run a wmgx workflow test with one full-sized paired-end metagenomic sample. However, when I tried wmgx_vis, I got the following error:

ImportError: cannot import name 'PwebProcessor'

I found here that other people have also had this problem and solved it by changing the Pweave version to 0.25. My version was 0.30.2, so I did the same and re-created my environment like this:

conda create -c biobakery -n biobakery python=3.6 biobakery_workflows tbb=2020.2 networkx=1.11 pweave=0.25 python-leveldb

I again tried to run wmgx_vis with the following command:

biobakery_workflows wmgx_vis --input /home/bsingh/output_data/ --project-name JSA10 --output output_vis_flow --format pdf

And I got this error in the log file:

  File "/home/bsingh/miniconda3/envs/biobakerywf/lib/python3.6/site-packages/anadama2/", line 201, in _run_task_locally
  File "/home/bsingh/miniconda3/envs/biobakerywf/lib/python3.6/site-packages/anadama2/", line 286, in create
  File "/home/bsingh/miniconda3/envs/biobakerywf/lib/python3.6/site-packages/pweave/", line 198, in weave
  File "/home/bsingh/miniconda3/envs/biobakerywf/lib/python3.6/site-packages/pweave/", line 149, in run
  File "/home/bsingh/miniconda3/envs/biobakerywf/lib/python3.6/site-packages/pweave/", line 65, in run
    self.executed = list(map(self._runcode, self.parsed))
  File "/home/bsingh/miniconda3/envs/biobakerywf/lib/python3.6/site-packages/pweave/", line 131, in _runcode
    chunk['content'] = self.loadinline(chunk['content'])
  File "/home/bsingh/miniconda3/envs/biobakerywf/lib/python3.6/site-packages/anadama2/", line 234, in loadinline
    result = self.loadstring(code_str).lstrip().replace("\n","",1)
  File "/home/bsingh/miniconda3/envs/biobakerywf/lib/python3.6/site-packages/pweave/", line 296, in loadstring
    exec(compiled, scope)
  File "chunk", line 1, in <module>
NameError: name 'caption' is not defined

The problem seems to be with Pweave again. I’m not sure which version I should be using?

Update: Fixed Problem 6! It seems to have been more dependency issues.

Problem 6 Solution:

This is the conda environment that has worked for me so far for wmgx and wmgx_vis:

conda env export --from-history > biobakerywl.yaml

Contents of biobakerywl.yaml:

channels:
  - biobakery
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - tbb=2020.2
  - pweave=0.25
  - python=3.6
  - python-leveldb
  - networkx=1.11
  - biobakery_workflows
  - jupyter_client
  - pandoc
  - hclust2
  - latexcodec
  - r
  - r-vegan