
Error installing wmgx: biobakery_workflows_databases --install wmgx

Problem 1:
Ran the commands:

conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --add channels biobakery
conda install -c biobakery biobakery_workflows
biobakery_workflows_databases --install wmgx

Got the error:

ImportError: cannot import name 'complete_to_chordal_graph' from partially initialized module 'networkx.algorithms' (most likely due to a circular import)

Tried to follow the instructions here:
https://forum.biobakery.org/t/biobakery-workflows-wmgx-typeerror/1658/5
The suggested solution was to uninstall networkx and reinstall it.
In general, uninstalling any package caused me problems, as I then needed to reinstall biobakery_workflows.
In any case, that was not the solution for me (it did not work with any version of networkx).

Solution 1:
Creating a new conda environment with Python 3.7 instead of the latest Python version, which is currently 3.9.


Problem 2:

I downloaded the wmgx demo database to try working on the given example inputs:

biobakery_workflows_databases --install wmgx_demo
biobakery_workflows wmgx --input input --output output_data

Got the error:

task_idxs = nx.algorithms.dag.topological_sort(self.dag, reverse=True)
TypeError: topological_sort() got an unexpected keyword argument 'reverse'

Following this thread's advice, I pinned networkx to a specific version.

Solution 2.a:

conda install networkx=1.11

Removing networkx and then reinstalling it like this:

conda remove -n huttenhower-py3.7 networkx
conda install networkx=1.11

forced me to reinstall biobakery_workflows again:

conda install -c biobakery biobakery_workflows


Ran the command again:

biobakery_workflows wmgx --input input --output output_data

Got the error:

Err: b'ERROR: Unable to find trimmomatic. Please provide the full path to trimmomatic with --trimmomatic.\n'

With the help of this thread, I figured out that I needed to point to the location of the trimmomatic*.jar file, after I got another error message like this:

Err: b'ERROR: The trimmomatic*.jar executable is not included in the directory: ~/miniconda3/envs/$MY_ENV/bin/trimmomatic-0.39\n'

Solution 2.b:
Found the location of trimmomatic*.jar:

find ~ -name "trimmomatic*.jar"

Then ran the command:

biobakery_workflows wmgx --input input --output output_data --qc-options="--trimmomatic ~/miniconda3/envs/$MY_ENV/share/trimmomatic-0.39-2/"


Problem 3
After fixing the problems above I ran the command:

biobakery_workflows wmgx --input input --output output_data --qc-options="--trimmomatic ./miniconda3/envs/$MY_ENV/share/trimmomatic-0.39-2/"

Which brought me to the error message:

Err: b'ERROR: Unable to find bowtie2 index files in directory: ~/biobakery_workflows_databases/kneaddata_db_human_genome\n'

Following kneaddata's tutorial, I thought of downloading the database myself.

So I tried downloading it by doing:

kneaddata_database --download human_genome bowtie2 ~/biobakery_workflows_databases/kneaddata_db_human_genome

My run failed, but I will update later on how I tried to solve the failure.

Err: b'ERROR: You are using the demo ChocoPhlAn database with a non-demo input file. If you have not already done so, please run humann_databases to download the full ChocoPhlAn database. If you have downloaded the full database, use the option --nucleotide-database to provide the location. You can also run humann_config to update the default database location. For additional information, please see the HUMAnN User Manual.\n'

in\nWARNING: Can not call software version for bowtie2\n\n'
Err: b'CRITICAL ERROR: Please update diamond from version 0.9.24 to version 0.9.36\n\n'

I followed this tutorial:

and not this:

I am not too worried yet, though, as I presume I did not provide the correct input files (I used the files from here) and still need to download more databases. My main issue is Problem 4.


Problem 4

Meanwhile, I decided to download the full wmgx database.

biobakery_workflows_databases --install wmgx

Got:

error while loading shared libraries: libtbb.so.2: cannot open shared object file: No such file or directory

Solution 4.a:

Following this thread, I did:

conda install tbb=2020.2


Ran again:

biobakery_workflows_databases --install wmgx

Got:

Could not locate a Bowtie index corresponding to basename "~/miniconda3/envs/$MY_ENV/site-packages/metaphlan/metaphlan_databases/mpa_v30_CHOCOPhlAn_201901"
Error: Encountered internal Bowtie 2 exception (#1)

Solution 4.b:
Did:

metaphlan --install --bowtie2db $DIR

Even though it was suggested that the database should not be put inside the environment, I could not get around this and had to put it at:

~/miniconda3/envs/$MY_ENV/lib/python3.7/site-packages/metaphlan/metaphlan_databases/
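
To double check that the bowtie2 index files actually ended up where the error message expects them, a quick listing like the one below can help. This is only a sketch: the path is simply where I placed the database (with MY_ENV written as a placeholder), and the index basename comes from the error above.

import glob
import os

# Sanity check: list the bowtie2 index files (*.bt2) for the
# mpa_v30_CHOCOPhlAn_201901 database. Replace MY_ENV with the name of
# your conda environment; the path is just where I ended up placing it.
db_dir = os.path.expanduser(
    "~/miniconda3/envs/MY_ENV/lib/python3.7/site-packages/"
    "metaphlan/metaphlan_databases"
)
index_files = glob.glob(os.path.join(db_dir, "mpa_v30_CHOCOPhlAn_201901*.bt2"))
print(index_files)  # an empty list means bowtie2 will not find the index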


Ran again:

biobakery_workflows_databases --install wmgx

Got the error (when files were being extracted after they were downloaded):

File "~/miniconda3/envs/$MY_ENV/lib/python3.7/http/client.py", line 292, in _read_status
raise BadStatusLine(line)
http.client.BadStatusLine: -Òì=û)¢ #ÌüE¤VdQ.Ì3Opl«§Ò+inK-n~õãÂFÂjzĤ3©õìV/¡¿I
<ÞQUº\ÙçF{ùÙÓÇ«üÆóß¼U»$ »
¥$
¥¢VVÂ}Mº
Ê®¦·¦èxêßVJC¢§L%`+ûv¹âI ÃÕ;R�«JqÄZéÖÂpå~víúPÝÕÉR+ûáñ¿??|zÂ.

Now this is something I could not overcome alone.

Continuation of Problem 3 and Problem 4:

To solve part of Problem 3, I installed the required version of diamond:

conda install diamond==0.9.36

Then both Problems 3 and 4 led to the same issue, which is apparently the installation of ChocoPhlAn:

Unable to install database. Error running command: humann_databases --download chocophlan full ~/biobakery_workflows_databases/humann
($MY_ENV) [comp_name]$ humann_databases --download chocophlan full ~/biobakery_workflows_databases/humann
Download URL: http://huttenhower.sph.harvard.edu/humann_data/chocophlan/full_chocophlan.v296_201901b.tar.gz
Traceback (most recent call last):
  File "~/miniconda3/envs/$MY_ENV/bin/humann_databases", line 10, in <module>
    sys.exit(main())
  File "~/miniconda3/envs/$MY_ENV/lib/python3.7/site-packages/humann/tools/humann_databases.py", line 173, in main
    install_location=download_database(database,build,location,args.database_location)
  File "~/miniconda3/envs/$MY_ENV/lib/python3.7/site-packages/humann/tools/humann_databases.py", line 111, in download_database
    downloaded_file, install_location)
  File "~/miniconda3/envs/$MY_ENV/lib/python3.7/site-packages/humann/utilities.py", line 520, in download_tar_and_extract_with_progress_messages
    url_handle = urlretrieve(url, filename, reporthook=ReportHook().report)
  File "~/miniconda3/envs/$MY_ENV/lib/python3.7/urllib/request.py", line 247, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "~/miniconda3/envs/$MY_ENV/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "~/miniconda3/envs/$MY_ENV/lib/python3.7/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "~/miniconda3/envs/$MY_ENV/lib/python3.7/urllib/request.py", line 543, in _open
    '_open', req)
  File "~/miniconda3/envs/$MY_ENV/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "~/miniconda3/envs/$MY_ENV/lib/python3.7/urllib/request.py", line 1378, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "~/miniconda3/envs/$MY_ENV/lib/python3.7/urllib/request.py", line 1353, in do_open
    r = h.getresponse()
  File "~/miniconda3/envs/$MY_ENV/lib/python3.7/http/client.py", line 1369, in getresponse
    response.begin()
  File "~/miniconda3/envs/$MY_ENV/lib/python3.7/http/client.py", line 310, in begin
    version, status, reason = self._read_status()
  File "~/miniconda3/envs/$MY_ENV/lib/python3.7/http/client.py", line 292, in _read_status
    raise BadStatusLine(line)
http.client.BadStatusLine: -Òì=û)¢	#ÌüE¤VdQ.Ì3Opl«§Ò+inK-n~õãÂFÂjzĤ3©õìV/¡¿I
  ><ÞQUº\ÙçF{ùÙÓÇ«üÆóß¼U»$	»	
                                        ¥$
                                          ¥¢VVÂ}Mº
                                                  Ê®¦·¦èxêßVJC¢§L%`+ûv¹âI ÃÕ;R�«JqÄZéÖÂpå~víúPÝÕÉR+ûáñ¿??|zÂ.

I have tried different methods of downloading ChocoPhlAn.

humann_databases --download chocophlan full ./biobakery_workflows_databases/humann/chocophlan

(I got the same error message as above.)
Then I tried to download the database directly using the URL:
http://huttenhower.sph.harvard.edu/humann_data/chocophlan/full_chocophlan.v296_201901b.tar.gz

It downloaded about 17.2 GB, but then the installation stopped.
I then tried to do:

wget http://huttenhower.sph.harvard.edu/humann_data/chocophlan/full_chocophlan.v296_201901b.tar.gz

The terminal showed that it had downloaded 17.2 GB, but the actual file size was 4.3 GB.
Just wanted to give an update; I will check whether the 17.2 GB file is usable (a size check is sketched below).
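
One quick way to check whether the download even completed is to compare the size the server reports with the size of the file on disk. This is only a sketch using the same download URL; it assumes the server answers a HEAD request with a Content-Length header.

import os
from urllib.request import Request, urlopen

# Compare the server-reported Content-Length with the local file size to
# see whether the download was truncated (or replaced by something else).
url = ("http://huttenhower.sph.harvard.edu/humann_data/chocophlan/"
       "full_chocophlan.v296_201901b.tar.gz")
local_file = "full_chocophlan.v296_201901b.tar.gz"

with urlopen(Request(url, method="HEAD")) as response:
    print("server reports:", response.headers.get("Content-Length"), "bytes")

print("file on disk:  ", os.path.getsize(local_file), "bytes")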

Updating again.
Downloading via wget did produce a file of 17.2 GB.
It does not matter, though, because no matter what I do, the file cannot be untarred.
For example:

tar -zxvf full_chocophlan.v296_201901b.tar.gz
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
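
Before digging into the HUMAnN code, a simple check is whether the downloaded file is even a gzip archive: gzip files always start with the two magic bytes 0x1f 0x8b. A minimal sketch:

# Gzip files begin with the magic bytes 0x1f 0x8b; if the first bytes are
# something else (for example HTML or garbage injected by a proxy), tar and
# gzip will refuse to extract the file, which matches the error above.
with open("full_chocophlan.v296_201901b.tar.gz", "rb") as handle:
    magic = handle.read(2)

print(magic, "-> gzip" if magic == b"\x1f\x8b" else "-> not a gzip file")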

When I looked at how the file is used inside the humann/utilities code, I saw that the file should indeed be extracted (the code looks for files matching the pattern '^[g__][s__]').
I used the same code as humann/utilities does, in Python:

import tarfile

filename = r'~/full_chocophlan.v296_201901b.tar.gz'

tarfile_handle = tarfile.open(filename)
# -> raise ReadError("file could not be opened successfully")
#    tarfile.ReadError: file could not be opened successfully

# tried:
tarfile_handle = tarfile.open(filename, 'r:gz')
# -> raise ReadError("not a gzip file")
#    tarfile.ReadError: not a gzip file

So I figured it might be a problem with retrieving the file itself.
I ran:

from urllib.request import urlretrieve
url = "http://huttenhower.sph.harvard.edu/humann_data/chocophlan/full_chocophlan.v296_201901b.tar.gz"
# Copied the ReportHook class as defined in the code. I don't show it here.
url_handle = urlretrieve(url, filename, reporthook=ReportHook().report)

I get the same error message as above.
The same happens if I try to do this:

from urllib.request import urlopen
f = urlopen(url)

Basically, everything leads to the same problem where I cannot retrieve ChocoPhlAn.
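
To see what actually comes back on the wire (and why http.client raises BadStatusLine), the raw status line can be read directly with a plain socket. This is only a sketch; it assumes the same download URL and plain HTTP on port 80.

import socket

# Read the raw HTTP status line for the ChocoPhlAn download URL. A normal
# reply starts with something like b"HTTP/1.1 200 OK"; anything else (for
# example bytes injected by an intercepting proxy) makes http.client raise
# BadStatusLine, which is exactly the error seen above.
host = "huttenhower.sph.harvard.edu"
path = "/humann_data/chocophlan/full_chocophlan.v296_201901b.tar.gz"

request = (
    "HEAD {} HTTP/1.1\r\n"
    "Host: {}\r\n"
    "Connection: close\r\n\r\n"
).format(path, host)

with socket.create_connection((host, 80), timeout=30) as sock:
    sock.sendall(request.encode("ascii"))
    raw = sock.recv(4096)

# repr() makes any unexpected bytes in the status line visible
print(repr(raw.split(b"\r\n", 1)[0]))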

Update: eventually, I downloaded ChocoPhlAn directly from the URL on a Windows computer and transferred the files to my designated computer.
This time the file contained the required data and I was able to extract it. I do wonder whether it is related to security settings on the network I am connected to in my lab.

Solution?
So I tried to trace back the issue.
Apparently, the problem lies in a Python file called client.py, which is part of the standard http package.
If Python was installed through conda, it can be found at:
~/miniconda3/envs/$MY_ENV/lib/python3.7/http/client.py

Change line 271:

line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")

to:

line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-2")

For some reason, the header of the HTTP response needs to be decoded using the iso-8859-2 encoding and not iso-8859-1. I have no idea why. I know this is not related to biobakery_workflows, but I still thought it would be worth documenting the workaround, as someone else might stumble upon the same problem I did.


I also had some problems with my locale, which might be related.
For documentation purposes, this is the error I got:

  Name: humann_count_alignments_species
  Original error: 
  Error executing action 0. Original Exception: 
  Traceback (most recent call last):
    File "~/miniconda3/envs/$MY_ENV/lib/python3.7/site-packages/anadama2/runners.py", line 201, in _run_task_locally
      action_func(task)
    File "~/miniconda3/envs/$MY_ENV/lib/python3.7/site-packages/anadama2/helpers.py", line 89, in actually_sh
      ret = _sh(s, **kwargs)
    File "~/miniconda3/envs/$MY_ENV/lib/python3.7/site-packages/anadama2/util/__init__.py", line 320, in sh
      raise ShellException(proc.returncode, msg.format(cmd, ret[0], ret[1]))
  anadama2.util.ShellException: [Errno 1] Command `get_counts_from_humann_logs.py --input ~/output_data/humann/main --output /pita/users/hila/output_data/humann/counts/humann_read_and_species_count_table.tsv' failed. 
  Out: b''
  Err: b'Traceback (most recent call last):\n  File "~/miniconda3/envs/$MY_ENV/bin/get_counts_from_humann_logs.py", line 79, in <module>\n    main()\n  File "~/miniconda3/envs/$MY_ENV/bin/get_counts_from_humann_logs.py", line 59, in main\n    data[1]=int(line.split()[7][2:])\nValueError: invalid literal for int() with base 10: \'perl:\'\n'

I got this because biobakery_workflows uses perl. The perl output ends up in a log file, and each line of that log is assigned to the variable line. The expression line.split()[7][2:] does not point at what you would like it to if there are problems with the locale. That is, the script tries to:

…create a table of reads from humann log files. The table will have
total reads, unaligned after nucleotide search, and unaligned after translated search.
The table will also include the total species number.

So line.split()[7][2:] tries to get the total number of reads, but if there are problems with the locale, perl may write unexpected extra characters into the log file, shifting the fields.

I eventually solved this problem by following:

It was hard to solve, though, as I do not have sudo access.

I would suggest using a regex instead in future updates.
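
For example, something along these lines. The log line and the keyword are invented for illustration (I do not have the exact log format in front of me); the idea is simply to anchor on the text around the number instead of on a fixed token position.

import re

# Hypothetical sketch: instead of relying on a fixed token position such as
# line.split()[7][2:], grab the first integer that follows the keyword of
# interest, so stray output (e.g. perl locale warnings) does not shift the
# fields around. The example line below is invented for illustration only.
line = "01/01/2021 10:00:00 - INFO: Total reads after merging results: 123456"

match = re.search(r"[Tt]otal reads[^0-9]*([0-9]+)", line)
if match:
    total_reads = int(match.group(1))
    print(total_reads)  # -> 123456
else:
    print("no read count found on this line")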