Alternate kneaddata genomes in biobakery workflows

okeydokey · October 19, 2020, 9:01pm

It appears that the kneaddata step of the wmgx or wmgx_wmtx workflows can’t be run against non-human databases. Is that correct? If I use --qc-options="--reference-db /location/" to specify the mouse database instead of the human database (as I would when running kneaddata directly), I get an error that says:

"Unable to find database KNEADDATA_DB_HUMAN_GENOME. This is the KneadData bowtie2 database of the human genome. This database can be downloaded with KneadData. Unable to find in default install folders or with environment variables."

I also get an error if I use a workflows environment variable to specify the mouse genome instead of using the kneaddata variable.

Is there any way to run these pipelines using the mouse genome in the KneadData step?

lauren.j.mciver · October 19, 2020, 9:16pm

Hello - You should be able to use non-human reference databases with the workflow. We have used this option for our own runs when needed. Sorry to hear you are running into errors. If you add the option --contaminate-databases <folder> when running the workflows, you can specify the mouse genome location. Alternatively, you should be able to change the environment variable too. Please try adding the database option when running, and follow up if you continue to have issues.

Thank you,
Lauren

okeydokey · October 20, 2020, 7:14pm

Hi and thanks for the quick reply. I tried that with the following command:

biobakery_workflows wmgx_wmtx --dry-run --contaminate-databases /kneaddata/databases/mouse/ --threads 48 --input-metagenome /metagenomics/ --input-metatranscriptome /metatranscriptomics/ --output /workflows_output/

Here was my error:

wmgx_wmtx.py: error: unrecognized arguments: --contaminate-databases /kneaddata/databases/mouse/

lauren.j.mciver · October 20, 2020, 10:51pm

Hi - Thanks for the follow up and sorry for any confusion. The wmgx workflow is the only one with the --contaminate-databases option. The wmgx_wmtx workflow does not have the database option in part because we set different databases for each input type (wmgx and wmtx). We could add a set of custom database options to the workflow and will look at adding this in a future release. For now you should be able to set the following environment variables to provide custom databases:

$KNEADDATA_DB_HUMAN_GENOME
$KNEADDATA_DB_HUMAN_TRANSCRIPTOME
$KNEADDATA_DB_RIBOSOMAL_RNA

The workflow will run the first database on all DNA samples and all databases on all RNA samples.
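As a minimal bash sketch of the above, assuming the goal is to substitute mouse databases: the mouse genome path is the one from the earlier command in this thread, while the transcriptome and rRNA paths are hypothetical placeholders to be replaced with your own locations. The run is guarded so that nothing happens if the workflows are not installed in the current shell.

```shell
# Point the "human genome" variable at the mouse database (path from this thread).
export KNEADDATA_DB_HUMAN_GENOME=/kneaddata/databases/mouse/
# Hypothetical locations; replace with wherever your databases actually live.
export KNEADDATA_DB_HUMAN_TRANSCRIPTOME=/kneaddata/databases/mouse_transcriptome/
export KNEADDATA_DB_RIBOSOMAL_RNA=/kneaddata/databases/ribosomal_rna/

# Only attempt the dry run if biobakery_workflows is available on PATH.
if command -v biobakery_workflows >/dev/null 2>&1; then
    biobakery_workflows wmgx_wmtx --dry-run --threads 48 \
        --input-metagenome /metagenomics/ \
        --input-metatranscriptome /metatranscriptomics/ \
        --output /workflows_output/
fi
```

The variables must be exported in the same shell session (or job script) that launches the workflow, since they are read from the environment at run time.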

Thank you,
Lauren

wanghaihua-hub · October 22, 2020, 4:11pm

Hi, I used the following command:

biobakery_workflows wmgx --contaminate-databases /apps/users/user01/wanghhh/metagenomic/databases/kneaddata_database --input rawdata --output workflow_output

but I still get this error:

"Unable to find database KNEADDATA_DB_HUMAN_GENOME. This is the KneadData bowtie2 database of the human genome. This database can be downloaded with KneadData. Unable to find in default install folders or with environment variables."

How can I fix it? Thank you.

lauren.j.mciver · October 22, 2020, 6:33pm

Hello - Thank you for the detailed post. If you set the environment variable $KNEADDATA_DB_HUMAN_GENOME, it will resolve the error you are seeing. You can set it to any database; it just needs to be set as a part of the initial workflow installation.
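One way this might look in bash, as a sketch only: it reuses the database folder and command from the post above, and CUSTOM_DB is a hypothetical helper variable introduced here for illustration. Whether the variable and the option should point at the same folder is an assumption.

```shell
# Database folder taken from the command earlier in this thread; substitute your own.
CUSTOM_DB=/apps/users/user01/wanghhh/metagenomic/databases/kneaddata_database

# Setting this variable is what the reply above says resolves the error.
export KNEADDATA_DB_HUMAN_GENOME="$CUSTOM_DB"

# Run the original command only if biobakery_workflows is available on PATH.
if command -v biobakery_workflows >/dev/null 2>&1; then
    biobakery_workflows wmgx --contaminate-databases "$CUSTOM_DB" \
        --input rawdata --output workflow_output
fi
```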

Thank you,
Lauren

lzh1982 · November 19, 2020, 11:54am

Hi, would you tell me how to set the environment variables? For example, I have downloaded the corresponding databases to /lizhihua/biobakery_workflows/kneaddata_db_human_genome. Would I then set $KNEADDATA_DB_HUMAN_GENOME=/lizhihua/biobakery_workflows/kneaddata_db_human_genome? Thank you very much!

lauren.j.mciver · November 24, 2020, 9:15pm

Hello - It would depend on your default shell. If you are running in bash you would run:
$ export KNEADDATA_DB_HUMAN_GENOME=/lizhihua/biobakery_workflows/kneaddata_db_human_genome
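
A slightly fuller bash sketch, assuming the same path as in the question: it sets the variable, confirms that child processes (such as the workflow) can see it, and warns if the folder does not exist. The commented-out last line is one common way to make the setting persist across sessions and is only a suggestion.

```shell
# Set the variable for the current shell and everything launched from it.
export KNEADDATA_DB_HUMAN_GENOME=/lizhihua/biobakery_workflows/kneaddata_db_human_genome

# printenv only shows exported variables, so this confirms the export worked.
printenv KNEADDATA_DB_HUMAN_GENOME

# Warn (without stopping) if the folder is missing on this machine.
if [ ! -d "$KNEADDATA_DB_HUMAN_GENOME" ]; then
    echo "Warning: $KNEADDATA_DB_HUMAN_GENOME is not a directory" >&2
fi

# Optional: uncomment to keep the setting in future bash sessions.
# echo 'export KNEADDATA_DB_HUMAN_GENOME=/lizhihua/biobakery_workflows/kneaddata_db_human_genome' >> ~/.bashrc
```

Note that in bash the assignment is written without a leading $ on the variable name; the $ is only used when reading the value.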

Thank you,
Lauren
