My goal is to download the biobakery workflows demo reference databases used in the tutorial.
When I first ran the command:
biobakery_workflows_databases --install wmgx_demo
I received the error: “Encountered internal Bowtie 2 exception (#1)”. I followed the steps suggested in the last reply at the bottom of this post, namely:
- creating a new conda environment with python 3.7
- installing tbb version 2020.2, bioconda, and metaphlan in that environment
- When I then re-ran the original command to install the biobakery workflows reference databases, I encountered a new error: ModuleNotFoundError: No module named ‘leveldb’
- To solve that, I installed python-leveldb into my environment as well
- In the end, I still get the same Bowtie 2 exception as before.
Looking in the directory where the program reports it is searching for the database, I found that my files start with “mpa_v31_CHOCOPhlAn_201901”, whereas the program is looking for files starting with “mpa_v30_CHOCOPhlAn_201901”.
If that is indeed the case and biobakery workflows expects an older version of ChocoPhlAn, where can I obtain those files?
Thank you for your help!
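A quick way to confirm which index versions are actually present in the database directory is to list the distinct file prefixes. The following is only a minimal sketch: the function name list_mpa_indexes is made up for illustration, and it assumes the index files are named like mpa_vNN_CHOCOPhlAn_YYYYMM followed by an extension, as described above.

```shell
# Hypothetical helper (not part of MetaPhlAn or biobakery workflows):
# print the distinct index prefixes found in a MetaPhlAn database directory.
list_mpa_indexes() {
  db_dir="$1"
  for f in "$db_dir"/mpa_v*; do
    # Skip the unexpanded pattern when nothing matches.
    [ -e "$f" ] || continue
    name="${f##*/}"       # strip the directory part
    echo "${name%%.*}"    # strip everything from the first dot onward
  done | sort -u
}

# Example call (the path is a placeholder, not a known default):
# list_mpa_indexes /path/to/metaphlan_db
```

If the only prefix printed is the v31 one, the v30 index the workflow asks for is not installed in that directory.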
I was able to solve this issue by running the following:
metaphlan --install --index mpa_v30_CHOCOPhlAn_201901 --bowtie2db <default path>
But now I receive this error:
FileNotFoundError: [Errno 2] No such file or directory: '/Users/../../biobakery_workflows/data/../../tutorial/kneaddata_demo_db/Homo_sapiens_demo'
I don’t know whether it would solve it or not, but perhaps running biobakery_workflows wmgx with the database paths given explicitly would help.
For example, this is my command:
biobakery_workflows wmgx --input /path/to/input/dir --output /path/to/output/dir --qc-options="--trimmomatic ~/miniconda3/envs/[CONDA ENV]/share/trimmomatic-0.39-2/ --reference-db /path/to/kneaddata_db/human_genome_bowtie2 --remove-intermediate-output" --local-jobs 3 --threads 10 --taxonomic-profiling-options="--bowtie2db=/path/to/metaphlan_db/v30" --bypass-strain-profiling --pair-identifier _R1 --remove-intermediate-output
Thanks for your suggestion. Unfortunately, it does not work, as I am running the command (biobakery_workflows_databases) to install the biobakery workflows databases, whereas the command you’re referring to is to run biobakery workflows. My understanding is that the command I am running will install the reference databases, which your command points to (i.e., I need to obtain a path to those by installing them first).
Ah, then perhaps the solution is easier: these are the data locations, and you can download them manually:
Download everything related to a given version and put it in one folder.
Point to that folder with the --bowtie2db argument in metaphlan’s options when running metaphlan or a biobakery workflow.
You can also specify which version of the database to use when running a workflow or metaphlan with the --index argument (again, in metaphlan’s options), for example --index=mpa_v30_CHOCOPhlAn_201901, so that a newer version won’t be downloaded automatically.
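To make the two options above concrete, here is a small sketch that checks the folder really contains files for the pinned index before building the option string. The function name taxonomic_profiling_options is hypothetical, and the check only assumes that the index files share the index name as a prefix followed by a dot.

```shell
# Hypothetical helper: emit metaphlan options that pin a database folder
# and index version, after confirming matching files exist in the folder.
taxonomic_profiling_options() {
  db_dir="$1"
  index="$2"
  # Expect files such as <index>.pkl or <index>.1.bt2 in the folder.
  if ! ls "$db_dir/$index".* >/dev/null 2>&1; then
    echo "no files for index $index in $db_dir" >&2
    return 1
  fi
  printf '%s\n' "--bowtie2db=$db_dir --index=$index"
}

# Possible use with the workflow command shown earlier in the thread
# (paths are placeholders):
# biobakery_workflows wmgx --input /path/to/input/dir --output /path/to/output/dir \
#   --taxonomic-profiling-options="$(taxonomic_profiling_options /path/to/metaphlan_db/v30 mpa_v30_CHOCOPhlAn_201901)"
```

Failing early when the files are missing avoids discovering the mismatch only when Bowtie 2 raises its exception.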
You can also try and install the markers’ DB using the command:
metaphlan --install --bowtie2db [LOCATION YOU WANT TO DOWNLOAD TO]
For HUMAnN, download the nucleotide DB (ChocoPhlAn), the protein DB (UniRef), and the utility mapping files that match the version of the MetaPhlAn DB you have downloaded.
You can also use the command:
humann_databases --download [Additional arguments]
Update the location of the DBs you have downloaded using the humann_config --update command.
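As a rough sketch of how these two commands might be combined, the snippet below prints (by default) or runs the download and config steps. The run and setup_humann_dbs helpers are made up for illustration. The database names and builds (chocophlan full, uniref uniref90_diamond, utility_mapping full) and the subfolder names are assumptions based on the usual HUMAnN setup; check humann_databases --available for the choices that match your installed version.

```shell
# Dry run by default: set DRY_RUN=0 to actually execute the commands.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: download the three HUMAnN databases into one folder
# and point the HUMAnN config at them. Database builds are assumptions.
setup_humann_dbs() {
  dir="$1"
  run humann_databases --download chocophlan full "$dir"
  run humann_databases --download uniref uniref90_diamond "$dir"
  run humann_databases --download utility_mapping full "$dir"
  run humann_config --update database_folders nucleotide "$dir/chocophlan"
  run humann_config --update database_folders protein "$dir/uniref"
  run humann_config --update database_folders utility_mapping "$dir/utility_mapping"
}

# Example call (placeholder path):
# setup_humann_dbs /path/to/humann_dbs
```

Reviewing the printed commands first makes it easier to confirm the database builds before starting the large downloads.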