Custom database with multiple species

wking · February 17, 2022, 6:31pm

Hello. I have some metagenomics data generated from the roots of different plant species. I was planning to decontaminate my data against the human database, but I would also like to decontaminate against some plant genomes. Is it possible to have a custom database which has the human genome (i.e. the data from this command: kneaddata_database --download human_genome bowtie2 Documents/Kneaddata_Human) and a couple of plant species?

If I were to run the command “bowtie2-build Homo_sapiens.fasta -o Homo_sapiens_db”, would I just have additional fasta files for each species? For example: bowtie2-build Homo_sapiens.fasta plantspecies1.fasta plantspecies2.fasta plantspecies3.fasta -o Homo_sapiens_AndPlants_db

Or, would I need to concatenate everything together? For example: bowtie2-build Concatenated_Humans_Plants.fasta -o Homo_sapiens_AndPlants_db

Or, do I have to decontaminate sequentially? For example: Human database > Plant database > Cleaned data

Thanks for your help!

Regards,
William

wking · March 2, 2022, 10:09pm

Hello.
I just wanted to provide a quick update for others wishing to make a custom database. It turns out that I am dumb and the information was in the bowtie2 documentation. You can make a bowtie2 database with multiple species if you separate the FASTA files with commas. For example: Species1.fa,Species2.fa.
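For anyone following along, here is a minimal sketch of the comma-separated form. The file names and the index basename Custom_db are placeholders, not real files. It prints the command as a dry run unless bowtie2-build is installed and every input file exists.

```shell
# Placeholder FASTA names (no spaces in names); substitute your own files.
refs="Species1.fa Species2.fa Species3.fa"
# Placeholder basename for the index files bowtie2-build will write.
index_base="Custom_db"

# bowtie2-build takes its reference FASTA files as one comma-separated list.
ref_list=$(printf '%s' "$refs" | tr ' ' ',')

# Check that every input file is present before attempting a real build.
all_present=yes
for f in $refs; do
    [ -f "$f" ] || all_present=no
done

if [ "$all_present" = yes ] && command -v bowtie2-build >/dev/null 2>&1; then
    # Usage: bowtie2-build <reference_in> <bt2_base>
    bowtie2-build "$ref_list" "$index_base"
else
    echo "Dry run: bowtie2-build $ref_list $index_base"
fi
```

Note that the index basename is given as the second positional argument, not with a flag.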

However, I did have a question about including the human database that kneaddata provides. I downloaded the “Homo_sapiens_hg37_and_human_contamination_Bowtie2.v0.1.tar.gz”. I tried to use this .gz file with bowtie2-build and it didn’t like it. I opened up the .gz file and it has bt2 files (I assume these are bowtie2 files). Can I just copy these bt2 files into the folder where I have made my custom Tree genome database so when I use kneaddata it will remove human + Tree genome data?

Or, is there a fasta file of the aforementioned human file that I can use?

Thanks!

jorondo1 · March 31, 2022, 2:44pm

Hi @wking,
you can decontaminate against more than one genome by providing the path to each directory containing an index. Make sure you have a separate directory for each index (i.e. one directory for the six .bt2 files produced by running bowtie2-build on a single genome), then add the -db option once per reference genome, e.g.:

kneaddata --input sample_R1.fastq.gz --input sample_R2.fastq.gz -db $DB1 -db $DB2

That worked for me using a human and a plant genome.
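A rough sketch of the same idea with more than two indexes, assuming one directory per index as described above. The directory names and the output folder kneaddata_out are made up for illustration, and the repeated --input form follows the command above (newer KneadData releases may expect --input1/--input2 for paired reads instead). The script only assembles and prints the command so it can be checked before running.

```shell
# Hypothetical layout: each directory holds the .bt2 files of exactly one index.
db_dirs="db/human db/plant1 db/plant2"

# Emit one -db option per reference directory.
db_args=""
for d in $db_dirs; do
    db_args="$db_args -db $d"
done
db_args=${db_args# }   # drop the leading space

# kneaddata_out is a placeholder output directory.
cmd="kneaddata --input sample_R1.fastq.gz --input sample_R2.fastq.gz $db_args --output kneaddata_out"

# Print the assembled command for review; paste it into the shell to run it.
echo "$cmd"
```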

shriram_patel · November 4, 2022, 11:06am

Hi,

In case you still want to prepare a custom combined database from multiple reference genomes, you can use bowtie2-inspect to convert the bt2 index back into FASTA sequences and then use that FASTA file, together with your other genomes, to build a combined index.
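Roughly, that could look like the sketch below. The index basename human_db/human_index is a placeholder (use the path to your extracted .bt2 files without the .1.bt2 suffix), and the plant FASTA names are taken from the question. It falls back to a dry run when bowtie2 or the index is not found.

```shell
# Placeholder basename of the existing human index (path without ".1.bt2").
human_base="human_db/human_index"
# FASTA file that bowtie2-inspect will write the recovered sequences to.
human_fasta="human_from_index.fasta"
# Plant genomes to add (names from the question; no spaces in names).
plant_fastas="plantspecies1.fasta plantspecies2.fasta"
# Basename for the new combined index.
combined_base="Homo_sapiens_AndPlants_db"

# Comma-separated reference list for bowtie2-build.
ref_list="$human_fasta,$(printf '%s' "$plant_fastas" | tr ' ' ',')"

if command -v bowtie2-inspect >/dev/null 2>&1 && [ -f "${human_base}.1.bt2" ]; then
    # With no options, bowtie2-inspect prints the indexed sequences as FASTA.
    bowtie2-inspect "$human_base" > "$human_fasta"
    bowtie2-build "$ref_list" "$combined_base"
else
    echo "Dry run: bowtie2-inspect $human_base > $human_fasta"
    echo "Dry run: bowtie2-build $ref_list $combined_base"
fi
```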

Hope this helps,
Shriram
