Error installing wmgx: biobakery_workflows_databases --install wmgx

Problem 1:
Ran the commands:

conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --add channels biobakery
conda install -c biobakery biobakery_workflows
biobakery_workflows_databases --install wmgx

Got the error:

ImportError: cannot import name 'complete_to_chordal_graph' from partially initialized module 'networkx.algorithms' (most likely due to a circular import)

Tried to follow the instructions here:
https://forum.biobakery.org/t/biobakery-workflows-wmgx-typeerror/1658/5
The solution suggested there was uninstalling networkx and reinstalling it.
In general, uninstalling any package caused me problems, as I then needed to reinstall biobakery_workflows.
In any case, that was not the solution for me (it didn't work with any version of networkx).

Solution 1:
Create a new conda environment with Python 3.7 rather than the latest Python version (currently 3.9).


Problem 2:

I downloaded wmgx demo database to try and work on the given example inputs:

biobakery_workflows_databases --install wmgx_demo
biobakery_workflows wmgx --input input --output output_data

Got the error:

task_idxs = nx.algorithms.dag.topological_sort(self.dag, reverse=True)
TypeError: topological_sort() got an unexpected keyword argument 'reverse'

Following that thread's advice, I pinned networkx to a specific version.

Solution 2.a:

conda install networkx=1.11

Removing networkx and then reinstalling it like this:

conda remove -n huttenhower-py3.7 networkx
conda install networkx=1.11

This forced me to reinstall biobakery_workflows again:

conda install -c biobakery biobakery_workflows


Ran the command again:

biobakery_workflows wmgx --input input --output output_data

Got the error:

Err: b'ERROR: Unable to find trimmomatic. Please provide the full path to trimmomatic with --trimmomatic.\n'

With the help of this thread, I figured out I needed to point to the location of the trimmomatic*.jar file after I got another error message like this:

Err: b'ERROR: The trimmomatic*.jar executable is not included in the directory: ~/miniconda3/envs/$MY_ENV/bin/trimmomatic-0.39\n'

Solution 2.b:
Found the location of trimmomatic*.jar:

find ~ -name "trimmomatic*.jar"

Then ran the command:

biobakery_workflows wmgx --input input --output output_data --qc-options="--trimmomatic ~/miniconda3/envs/$MY_ENV/share/trimmomatic-0.39-2/"


Problem 3
After fixing the problems above I ran the command:

biobakery_workflows wmgx --input input --output output_data --qc-options="--trimmomatic ./miniconda3/envs/$MY_ENV/share/trimmomatic-0.39-2/"

Which brought me to the error message:

Err: b'ERROR: Unable to find bowtie2 index files in directory: ~/biobakery_workflows_databases/kneaddata_db_human_genome\n'

Following kneaddata's tutorial, I tried downloading the database by doing:

kneaddata_database --download human_genome bowtie2 ~/biobakery_workflows_databases/kneaddata_db_human_genome

My run failed; I will update later on how I tried to solve the failure.

Err: b'ERROR: You are using the demo ChocoPhlAn database with a non-demo input file. If you have not already done so, please run humann_databases to download the full ChocoPhlAn database. If you have downloaded the full database, use the option --nucleotide-database to provide the location. You can also run humann_config to update the default database location. For additional information, please see the HUMAnN User Manual.\n'

WARNING: Can not call software version for bowtie2
Err: b'CRITICAL ERROR: Please update diamond from version 0.9.24 to version 0.9.36\n\n'

I followed this tutorial:

and not this:

I am not too worried yet, as I presume I did not provide the correct input files (I used the files from here) and I still need to download more databases. My main issue is Problem 4.


Problem 4

Meanwhile, I decided to download the full wmgx database.

biobakery_workflows_databases --install wmgx

Got:

error while loading shared libraries: libtbb.so.2: cannot open shared object file: No such file or directory

Solution 4.a:

Following this thread, I did:

conda install tbb=2020.2


Ran again:

biobakery_workflows_databases --install wmgx

Got:

Could not locate a Bowtie index corresponding to basename "~/miniconda3/envs/$MY_ENV/site-packages/metaphlan/metaphlan_databases/mpa_v30_CHOCOPhlAn_201901"
Error: Encountered internal Bowtie 2 exception (#1)

Solution 4.b
Did:

metaphlan --install --bowtie2db $DIR

Even though it was suggested that the database should not be put inside the environment, I could not get around this and had to put it at:

~/miniconda3/envs/$MY_ENV/lib/python3.7/site-packages/metaphlan/metaphlan_databases/


Ran again:

biobakery_workflows_databases --install wmgx

Got the error (when files were being extracted after they were downloaded):

File "~/miniconda3/envs/$MY_ENV/lib/python3.7/http/client.py", line 292, in _read_status
raise BadStatusLine(line)
http.client.BadStatusLine: -Òì=û)¢ #ÌüE¤VdQ.Ì3Opl«§Ò+inK-n~õãÂFÂjzĤ3©õìV/¡¿I
<ÞQUº\ÙçF{ùÙÓÇ«üÆóß¼U»$ »
¥$
¥¢VVÂ}Mº
Ê®¦·¦èxêßVJC¢§L%`+ûv¹âI ÃÕ;R�«JqÄZéÖÂpå~víúPÝÕÉR+ûáñ¿??|zÂ.

Now this is something I could not overcome alone.

Continuation of Problem 3 and Problem 4:

To solve part of problem 3 I installed the required version of diamond:

conda install diamond==0.9.36

Then both problems 3 and 4 led to the same issue, which is apparently the installation of ChocoPhlAn:

Unable to install database. Error running command: humann_databases --download chocophlan full ~/biobakery_workflows_databases/humann
($MY_ENV) [comp_name]$ humann_databases --download chocophlan full ~/biobakery_workflows_databases/humann
Download URL: http://huttenhower.sph.harvard.edu/humann_data/chocophlan/full_chocophlan.v296_201901b.tar.gz
Traceback (most recent call last):
  File "~/miniconda3/envs/$MY_ENV/bin/humann_databases", line 10, in <module>
    sys.exit(main())
  File "~/miniconda3/envs/$MY_ENV/lib/python3.7/site-packages/humann/tools/humann_databases.py", line 173, in main
    install_location=download_database(database,build,location,args.database_location)
  File "~/miniconda3/envs/$MY_ENV/lib/python3.7/site-packages/humann/tools/humann_databases.py", line 111, in download_database
    downloaded_file, install_location)
  File "~/miniconda3/envs/$MY_ENV/lib/python3.7/site-packages/humann/utilities.py", line 520, in download_tar_and_extract_with_progress_messages
    url_handle = urlretrieve(url, filename, reporthook=ReportHook().report)
  File "~/miniconda3/envs/$MY_ENV/lib/python3.7/urllib/request.py", line 247, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "~/miniconda3/envs/$MY_ENV/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "~/miniconda3/envs/$MY_ENV/lib/python3.7/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "~/miniconda3/envs/$MY_ENV/lib/python3.7/urllib/request.py", line 543, in _open
    '_open', req)
  File "~/miniconda3/envs/$MY_ENV/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "~/miniconda3/envs/$MY_ENV/lib/python3.7/urllib/request.py", line 1378, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "~/miniconda3/envs/$MY_ENV/lib/python3.7/urllib/request.py", line 1353, in do_open
    r = h.getresponse()
  File "~/miniconda3/envs/$MY_ENV/lib/python3.7/http/client.py", line 1369, in getresponse
    response.begin()
  File "~/miniconda3/envs/$MY_ENV/lib/python3.7/http/client.py", line 310, in begin
    version, status, reason = self._read_status()
  File "~/miniconda3/envs/$MY_ENV/lib/python3.7/http/client.py", line 292, in _read_status
    raise BadStatusLine(line)
http.client.BadStatusLine: -Òì=û)¢	#ÌüE¤VdQ.Ì3Opl«§Ò+inK-n~õãÂFÂjzĤ3©õìV/¡¿I
  ><ÞQUº\ÙçF{ùÙÓÇ«üÆóß¼U»$	»	
                                        ¥$
                                          ¥¢VVÂ}Mº
                                                  Ê®¦·¦èxêßVJC¢§L%`+ûv¹âI ÃÕ;R�«JqÄZéÖÂpå~víúPÝÕÉR+ûáñ¿??|zÂ.

I have tried different methods of downloading ChocoPhlAn.

humann_databases --download chocophlan full ./biobakery_workflows_databases/humann/chocophlan

(Got the same error message above).
Then I tried to download the database directly using the URL:
http://huttenhower.sph.harvard.edu/humann_data/chocophlan/full_chocophlan.v296_201901b.tar.gz

It downloaded about 17.2 GB, but then the installation stopped.
I then tried to do:

wget http://huttenhower.sph.harvard.edu/humann_data/chocophlan/full_chocophlan.v296_201901b.tar.gz

The terminal showed it downloaded 17.2 GB, but the actual file size was 4.3 GB.
Just a quick update; I will check whether the 17.2 GB file is usable.
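When a download's reported size and the on-disk size disagree, hashing the file and comparing against a published checksum (if the mirror provides one) settles whether the archive arrived intact. A minimal sketch in Python; the helper name is mine, not part of humann:

```python
import hashlib

def file_md5(path, chunk_size=1 << 20):
    """Compute an MD5 hex digest, streaming in 1 MiB chunks so a
    multi-gigabyte archive never has to fit in memory."""
    digest = hashlib.md5()
    with open(path, 'rb') as handle:
        for block in iter(lambda: handle.read(chunk_size), b''):
            digest.update(block)
    return digest.hexdigest()
```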

Updating again.
Downloading via wget did produce a file of size 17.2 GB.
It doesn't matter, though, because no matter what I do, the file cannot be untarred.
For example:

tar -zxvf full_chocophlan.v296_201901b.tar.gz
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now

When I looked at how the file is used inside humann/utilities, I saw that the file should indeed be extractable (the code looks for files following the 'g__'/'s__' naming pattern).
I ran the same code as humann/utilities does, in Python:

import tarfile

filename = r'~/full_chocophlan.v296_201901b.tar.gz'

tarfile_handle = tarfile.open(filename)
# raises:
# tarfile.ReadError: file could not be opened successfully

# tried:
tarfile_handle = tarfile.open(filename, 'r:gz')
# raises:
# tarfile.ReadError: not a gzip file
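Before retrying the extraction, it is cheap to check whether the download is a gzip stream at all: every gzip file starts with the magic bytes 0x1f 0x8b, while an HTML error page or a mangled response will not. A small sketch (the function name is mine):

```python
def looks_like_gzip(path):
    """Return True if the file starts with the gzip magic bytes
    0x1f 0x8b; a non-gzip or corrupted download will not."""
    with open(path, 'rb') as handle:
        return handle.read(2) == b'\x1f\x8b'
```

In my case this would have flagged the bad download immediately, without waiting for tar to fail.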

So I figured it might be a problem with retrieving the file itself.
I ran:

from urllib.request import urlretrieve
url = "http://huttenhower.sph.harvard.edu/humann_data/chocophlan/full_chocophlan.v296_201901b.tar.gz"
# Copied the ReportHook class as defined in the code. I don't show it here.
url_handle = urlretrieve(url, filename, reporthook=ReportHook().report)

I get the same error message as above.
The same happens if I try to do this:

from urllib.request import urlopen
f = urlopen(url)

Basically, everything leads to the same problem where I cannot retrieve ChocoPhlAn.
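If the failure were intermittent, one cheap workaround would be wrapping the retrieval in a retry loop. This is only a sketch of that idea, not what humann itself does, and retrieve_with_retries is a name I made up:

```python
import time
from urllib.request import urlretrieve

def retrieve_with_retries(url, filename, attempts=3, delay=5):
    """Call urlretrieve up to `attempts` times, sleeping `delay`
    seconds between failures, and re-raise the last error."""
    last_error = None
    for _ in range(attempts):
        try:
            return urlretrieve(url, filename)
        except Exception as error:  # BadStatusLine, timeouts, etc.
            last_error = error
            time.sleep(delay)
    raise last_error
```

(In my case this would not have helped, since every attempt failed the same way.)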



Update: Eventually, I downloaded ChocoPhlAn directly from the URL on a Windows computer and transferred the files to my designated computer.
This time, the file contained the required data and I was able to extract it. I do wonder whether it's related to security restrictions on the network I am connected to in my lab.

Solution?
So I tried to trace back the issue.
Apparently, the problem lies within a Python file called client.py, which is part of the standard-library http package.
If Python was installed through conda, it can be found at:
~/miniconda3/envs/$MY_ENV/lib/python3.7/http/client.py

Change line 271:

line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")

to:

line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-2")

For some reason, on my machine the header of the HTTP response needed to be decoded using the iso-8859-2 encoding rather than iso-8859-1. I have no idea why. I know it's not related to biobakery_workflows, but I still thought it would be important to document the solution, as someone else might stumble upon the same problem I did.
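For what it's worth, iso-8859-1 assigns a character to every byte value, so decoding the status line never actually raises under either encoding; what changes is which characters the raw header bytes turn into. A quick demonstration:

```python
# The same bytes decode to different characters under the two encodings.
header_bytes = b'\xa1\xb1\xc0'
print(header_bytes.decode('iso-8859-1'))  # ¡±À
print(header_bytes.decode('iso-8859-2'))  # ĄąŔ
```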


I also had some problems with my locale which might be related.
For documentation purposes, I got the error:

  Name: humann_count_alignments_species
  Original error: 
  Error executing action 0. Original Exception: 
  Traceback (most recent call last):
    File "~/miniconda3/envs/$MY_ENV/lib/python3.7/site-packages/anadama2/runners.py", line 201, in _run_task_locally
      action_func(task)
    File "~/miniconda3/envs/$MY_ENV/lib/python3.7/site-packages/anadama2/helpers.py", line 89, in actually_sh
      ret = _sh(s, **kwargs)
    File "~/miniconda3/envs/$MY_ENV/lib/python3.7/site-packages/anadama2/util/__init__.py", line 320, in sh
      raise ShellException(proc.returncode, msg.format(cmd, ret[0], ret[1]))
  anadama2.util.ShellException: [Errno 1] Command `get_counts_from_humann_logs.py --input ~/output_data/humann/main --output /pita/users/hila/output_data/humann/counts/humann_read_and_species_count_table.tsv' failed. 
  Out: b''
  Err: b'Traceback (most recent call last):\n  File "~/miniconda3/envs/$MY_ENV/bin/get_counts_from_humann_logs.py", line 79, in <module>\n    main()\n  File "~/miniconda3/envs/$MY_ENV/bin/get_counts_from_humann_logs.py", line 59, in main\n    data[1]=int(line.split()[7][2:])\nValueError: invalid literal for int() with base 10: \'perl:\'\n'

I got this because biobakery_workflows uses perl. The output from perl, a log file, is assigned to the variable line. The expression line.split()[7][2:] doesn't point at what you would expect it to if there are problems with the locale. That is, the script tries to:

…create a table of reads from humann log files. The table will have
total reads, unaligned after nucleotide search, and unaligned after translated search.
The table will also include the total species number.

So line.split()[7][2:] tries to extract the total number of reads, but if there are locale problems, perl may write unexpected extra characters into the log file.

I eventually solved this problem, following:

Solving that was hard, though, as I do not have sudo access.

I would suggest using a regex instead in future updates.
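For example, pulling the last integer off the line rather than relying on a fixed token position would survive locale warnings being mixed into the log. A sketch; the function name and the sample log-line formats are my assumptions, not taken from the actual humann logs:

```python
import re

def total_reads(line):
    """Extract the trailing integer from a log line, or None if the
    line (e.g. a stray perl locale warning) carries no count."""
    match = re.search(r'(\d+)\s*$', line)
    return int(match.group(1)) if match else None

# total_reads('Total number of reads: 12345')            -> 12345
# total_reads('perl: warning: Setting locale failed')    -> None
```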

My whole workflow to successfully install the wmgx workflow:

conda create --name workflow_env python=3.7

conda install tbb=2020.2

conda install -c bioconda metaphlan

metaphlan --install --nproc 8

biobakery_workflows_databases --install wmgx

This addresses the Bowtie index basename error message.


Thanks, this post and this workflow have been very helpful.

But I’m running into one additional problem.

After running this workflow, this is the error message I encountered:

Could not locate a Bowtie index corresponding to basename "/home/hjy/miniconda3/envs/biobakery/lib/python3.7/site-packages/metaphlan/metaphlan_databases/mpa_v30_CHOCOPhlAn_201901"
Error: Encountered internal Bowtie 2 exception (#1)
Command: /home/hjy/miniconda3/envs/biobakery/bin/bowtie2-inspect-s --wrapper basic-0 /home/hjy/miniconda3/envs/biobakery/lib/python3.7/site-packages/metaphlan/metaphlan_databases/mpa_v30_CHOCOPhlAn_201901
Unable to install database. Error running command: bowtie2-inspect /home/hjy/miniconda3/envs/biobakery/lib/python3.7/site-packages/metaphlan/metaphlan_databases/mpa_v30_CHOCOPhlAn_201901 > /home/hjy/biobakery_workflows_databases/strainphlan_db_markers/all_markers.fasta

and my metaphlan_database directory :

~/miniconda3/envs/biobakery/lib/python3.7/site-packages/metaphlan/metaphlan_databases$ ll
total 3366472
drwxr-xr-x 2 hjy hjy      4096  8월 16 22:29 ./
drwxr-xr-x 5 hjy hjy      4096  8월 16 21:52 ../
-rw-r--r-- 1 hjy hjy        26  8월 16 21:56 mpa_latest
-rw-rw-r-- 1 hjy hjy 761608923  8월 16 22:08 mpa_v31_CHOCOPhlAn_201901.1.bt2
-rw-rw-r-- 1 hjy hjy 352153900  8월 16 22:08 mpa_v31_CHOCOPhlAn_201901.2.bt2
-rw-rw-r-- 1 hjy hjy  12201056  8월 16 21:59 mpa_v31_CHOCOPhlAn_201901.3.bt2
-rw-rw-r-- 1 hjy hjy 352153894  8월 16 21:59 mpa_v31_CHOCOPhlAn_201901.4.bt2
-rw-rw-r-- 1 hjy hjy 397302846  2월 25 02:42 mpa_v31_CHOCOPhlAn_201901.fna.bz2
-rw-r--r-- 1 hjy hjy        64  8월 16 21:58 mpa_v31_CHOCOPhlAn_201901.md5
-rw-rw-r-- 1 hjy hjy  30354333  2월 25 02:33 mpa_v31_CHOCOPhlAn_201901.pkl
-rw-rw-r-- 1 hjy hjy 761608923  8월 16 22:17 mpa_v31_CHOCOPhlAn_201901.rev.1.bt2
-rw-rw-r-- 1 hjy hjy 352153900  8월 16 22:17 mpa_v31_CHOCOPhlAn_201901.rev.2.bt2
-rw-r--r-- 1 hjy hjy 427663360  8월 16 21:58 mpa_v31_CHOCOPhlAn_201901.tar
-rw-rw-r-- 2 hjy hjy        50  7월 26 16:35 README.txt

It seems that the version of the installed database (v31, per mpa_latest) and the version required by bowtie2 (v30) do not match.

I found the "mpa_v30_CHOCOPhlAn_201901.tar" file at http://cmprod1.cibio.unitn.it/databases/MetaPhlAn/metaphlan_databases/ and tried to install it,
but I'm not sure whether the database can be rebuilt without issues just by unzipping the file.

Can you please tell me how I can successfully rebuild the MetaPhlAn database?

I have stumbled upon a similar error recently. You don’t need to unzip the file, just leave it as it is.
Your folder is similar to mine, just leave everything in one folder.

biobakery_workflows checks for the latest MetaPhlAn database automatically and downloads it if there is an update (this can actually create problems).
You can specify which version to use and where MetaPhlAn’s markers’ DB folder is located using the arguments: --index and --bowtie2db of the --taxonomic-profiling-options in the biobakery_workflows wmgx command.
For example:

biobakery_workflows wmgx --input /path/to/input/dir --output /path/to/output/dir --qc-options="--trimmomatic ~/miniconda3/envs/[CONDA ENV]/share/trimmomatic-0.39-2/ --reference-db /path/to/kneaddata_db/human_genome_bowtie2 --remove-intermediate-output" --local-jobs 3 --threads 10 --taxonomic-profiling-options="--bowtie2db=/path/to/metaphlan_db/v30 --index=mpa_v30_CHOCOPhlAn_201901" --bypass-strain-profiling --pair-identifier _R1 --remove-intermediate-output


Your advice has solved most of my issues.
Really appreciate it. :laughing:

Hi @adler-sudo,
I tried to follow this and install the biobakery workflow, but got an error:
conda create --name workflow env python=3.7
Collecting package metadata (current_repodata.json): done
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed

PackagesNotFoundError: The following packages are not available from current channels:

  • env

Current channels:

To search for alternate channels that may provide the conda package you’re
looking for, navigate to

https://anaconda.org

and use the search bar at the top of the page.

Kindly suggest.