bioBakery Workflows: Errors downloading databases, "unpickling error", "topological_sort()" error, and "Unable to find Trimmomatic" error

Update: I was able to solve Problem 1 and Problem 2.

Problem 1 Solution: Before using biobakery_workflows_databases to download the other databases, it’s best to download the MetaPhlAn database manually. Somewhere in the MetaPhlAn GitHub documentation or tutorial, it says that if you install with Conda, you should download the databases to a custom location. However, if you do this, biobakery_workflows_databases doesn’t know where to look for the MetaPhlAn databases and assumes they don’t exist, which is what causes the error. So when you download the MetaPhlAn databases, keep the default location inside the Conda environment.
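To confirm the database really ended up inside the Conda environment, a quick search of the environment prefix can help. This is a minimal sketch: `find_metaphlan_db_dirs` is a hypothetical helper, and the directory name `metaphlan_databases` is an assumption about where `metaphlan --install` puts files by default, so adjust the pattern if your version uses a different layout.

```shell
# Hypothetical helper: list directories named "metaphlan_databases" under a
# Conda environment prefix (defaults to the active environment's $CONDA_PREFIX).
# ASSUMPTION: that directory name is the default install location used by
# `metaphlan --install`; adjust the -name pattern if your version differs.
find_metaphlan_db_dirs() {
    prefix="${1:-${CONDA_PREFIX:-}}"
    if [ -z "$prefix" ]; then
        echo "no prefix given and no Conda environment is active" >&2
        return 1
    fi
    dirs=$(find "$prefix" -type d -name 'metaphlan_databases' 2>/dev/null)
    if [ -z "$dirs" ]; then
        echo "no metaphlan_databases directory under $prefix" >&2
        return 1
    fi
    printf '%s\n' "$dirs"
}

# Usage (after `conda activate biobakerywf`):
#   find_metaphlan_db_dirs
```

An empty result (non-zero exit) suggests the database was installed somewhere outside the environment.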

Problem 2 Solution: This resolved itself once all the databases were downloaded.

Problem 3 Solution: There was also a third problem, a “topological_sort()” error. Following the solution in this forum thread, I downgraded the networkx package to version 1.11.

Ultimately, this is what has worked so far to solve Problems 1-3:

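conda create -n biobakerywf -c biobakery biobakery_workflows
conda activate biobakerywf  # install the pins below into this environment, not base
conda install tbb=2020.2
conda install networkx=1.11  # fixes the topological_sort() error
metaphlan --install  # do not specify a download location
biobakery_workflows_databases --install wmgx  # do not specify a download location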
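Before launching the workflow, it can help to confirm that the executables it calls are on PATH in the active environment. A sketch using the POSIX `command -v` builtin; `check_tools` is a hypothetical helper, and the tool list in the usage line is an assumption to adjust for your setup. A `trimmomatic` wrapper being found may not be enough on its own, since the error below suggests KneadData locates Trimmomatic in its own way.

```shell
# Hypothetical helper: report which of the given commands are on PATH.
# Returns 0 only if every command was found.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool -> $(command -v "$tool")"
        else
            echo "missing: $tool" >&2
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

# Usage (after `conda activate biobakerywf`); the list is an assumption:
#   check_tools biobakery_workflows kneaddata trimmomatic trf metaphlan
```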

Problem 4:

However, I now have another problem: KneadData does not recognize that Trimmomatic is already installed, and I get the following error when I run the workflow:

Task 3 failed
  Name: kneaddata____HD42R4_subsample
  Original error: 
  Error executing action 0. Original Exception: 
  Traceback (most recent call last):
    File "/home/bsingh/miniconda3/envs/biobakerywf/lib/python3.7/site-packages/anadama2/runners.py", line 201, in _run_task_locally
      action_func(task)
    File "/home/bsingh/miniconda3/envs/biobakerywf/lib/python3.7/site-packages/anadama2/helpers.py", line 89, in actually_sh
      ret = _sh(s, **kwargs)
    File "/home/bsingh/miniconda3/envs/biobakerywf/lib/python3.7/site-packages/anadama2/util/__init__.py", line 320, in sh
      raise ShellException(proc.returncode, msg.format(cmd, ret[0], ret[1]))
  anadama2.util.ShellException: [Errno 1] Command `kneaddata --input /home/bsingh/biobakery_test_inputs/HD42R4_subsample.fastq.gz --output /home/bsingh/output_data/kneaddata/main --threads 1 --output-prefix HD42R4_subsample   --reference-db /home/bsingh/biobakery_workflows_databases/kneaddata_db_human_genome  --serial --run-trf  && mv /home/bsingh/output_data/kneaddata/main/HD42R4_subsample.repeats.removed.fastq /home/bsingh/output_data/kneaddata/main/HD42R4_subsample.fastq' failed. 
  Out: b''
  Err: b'ERROR: Unable to find trimmomatic. Please provide the full path to trimmomatic with --trimmomatic.\n

The same error was reported previously in this forum post and in this GitHub issue, and a similar problem with trf was reported here.

Based on the links above, I figured the solution was simply to pass the Trimmomatic path to KneadData with --trimmomatic. However, since I’m running KneadData through biobakery_workflows rather than directly, I don’t think there is an explicit option for that?
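For running KneadData on its own with --trimmomatic, the jar first has to be located inside the Conda environment. A minimal sketch, assuming the jar's file name matches `trimmomatic*.jar` and that --trimmomatic expects the directory containing the jar (the KneadData README examples pass a folder); `find_trimmomatic_dir` is a hypothetical helper name.

```shell
# Hypothetical helper: print the directory that contains a Trimmomatic jar
# inside a Conda environment (defaults to the active $CONDA_PREFIX).
# ASSUMPTION: the jar name matches trimmomatic*.jar, and KneadData's
# --trimmomatic option wants the directory holding that jar.
find_trimmomatic_dir() {
    prefix="${1:-${CONDA_PREFIX:-}}"
    if [ -z "$prefix" ]; then
        echo "no prefix given and no Conda environment is active" >&2
        return 1
    fi
    # There may be more than one match; take the first.
    jar=$(find "$prefix" -name 'trimmomatic*.jar' 2>/dev/null | head -n 1)
    if [ -z "$jar" ]; then
        echo "no trimmomatic*.jar found under $prefix" >&2
        return 1
    fi
    dirname "$jar"
}

# Usage (standalone KneadData, paths taken from the failing command above):
#   trim_dir=$(find_trimmomatic_dir) &&
#   kneaddata --input /home/bsingh/biobakery_test_inputs/HD42R4_subsample.fastq.gz \
#       --output /home/bsingh/output_data/kneaddata/main \
#       --reference-db /home/bsingh/biobakery_workflows_databases/kneaddata_db_human_genome \
#       --trimmomatic "$trim_dir"
```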

Problem 5:

I tried running KneadData on its own to make sure there weren’t any other issues, but even when I specify the Trimmomatic path, I get the following error, which seems to be raised here, at line 278:

Decompressing gzipped file ...
Critical Error: Unable to gunzip input file: /home/bsingh/biobakery_test_inputs/HD42R4_subsample.fastq.gz
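Since KneadData reports that it cannot gunzip the input, one way to narrow this down is to test the archive outside KneadData with the standard `gzip -t` integrity check, which reads the whole file and verifies it without writing any output. This is a sketch; `check_gzip` is a hypothetical helper name. If the file passes, the cause more likely lies on the KneadData side, though this check alone does not prove that.

```shell
# Hypothetical helper: check that a .gz file is readable and a valid gzip
# stream before handing it to KneadData.
check_gzip() {
    f="$1"
    if [ ! -r "$f" ]; then
        echo "not readable: $f" >&2
        return 2
    fi
    if gzip -t "$f" 2>/dev/null; then
        echo "ok: $f"
    else
        echo "corrupt or not gzip: $f" >&2
        return 1
    fi
}

# Usage:
#   check_gzip /home/bsingh/biobakery_test_inputs/HD42R4_subsample.fastq.gz
```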