Custom database with multiple species

Hello. I have some metagenomics data generated from the roots of different plant species. I was planning to decontaminate my data against the human database, but I would also like to decontaminate against some plant genomes. Is it possible to have a custom database that contains both the human genome (i.e. the data from this command: kneaddata_database --download human_genome bowtie2 Documents/Kneaddata_Human) and a couple of plant species?

If I were to run the command “bowtie2-build Homo_sapiens.fasta -o Homo_sapiens_db”, would I just add a FASTA file for each additional species? For example: bowtie2-build Homo_sapiens.fasta plantspecies1.fasta plantspecies2.fasta plantspecies3.fasta -o Homo_sapiens_AndPlants_db

Or would I need to concatenate everything into a single FASTA file first? For example: bowtie2-build Concatenated_Humans_Plants.fasta -o Homo_sapiens_AndPlants_db

Or do I have to decontaminate sequentially? For example: human database > plant database > cleaned data

Thanks for your help!


I just wanted to provide a quick update for others wishing to make a custom database. It turns out I had missed it: the information was in the bowtie2 documentation. You can build a single bowtie2 index from several species by passing their FASTA files to bowtie2-build as a comma-separated list with no spaces, for example: Species1.fa,Species2.fa.
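For reference, here is a minimal bash sketch of that command, wrapped in a hypothetical helper (build_plant_index); the FASTA names are placeholders taken from the question. bowtie2-build's usage is bowtie2-build <reference_in> <bt2_base>: the comma-separated FASTA list comes first and the index basename is a second positional argument, not an -o option (in bowtie2-build, -o sets --offrate).

```shell
#!/usr/bin/env bash
# Hypothetical helpers for building one bowtie2 index from several FASTA files.

# join_fastas FILE...: print the paths as one comma-separated list,
# the form bowtie2-build expects for multiple reference files.
join_fastas() {
  local IFS=,          # "$*" joins arguments with the first character of IFS
  printf '%s\n' "$*"
}

# build_plant_index OUT_DIR FILE...: build a single index inside OUT_DIR,
# named after the directory, so OUT_DIR holds exactly one set of index files.
build_plant_index() {
  local out_dir=$1
  shift
  mkdir -p "$out_dir"
  # bowtie2-build <reference_in> <bt2_base>: the basename is positional.
  bowtie2-build "$(join_fastas "$@")" "$out_dir/$(basename "$out_dir")"
}
```

For example, build_plant_index plant_db plantspecies1.fasta plantspecies2.fasta plantspecies3.fasta would write plant_db/plant_db.1.bt2 through plant_db/plant_db.rev.2.bt2 (bowtie2-build switches to .bt2l files on its own for very large references).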

However, I still have a question about including the human database that kneaddata provides. I downloaded “Homo_sapiens_hg37_and_human_contamination_Bowtie2.v0.1.tar.gz” and tried to use this .gz file with bowtie2-build, but it failed. When I opened the .gz file, it contained .bt2 files (I assume these are bowtie2 index files). Can I just copy these .bt2 files into the folder where I built my custom tree genome database, so that kneaddata removes both human and tree genome reads?

Or is there a FASTA version of this human database that I can use instead?
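The .tar.gz is an archive of an index that has already been built (the .bt2 files), not a FASTA file, which is why bowtie2-build rejects it; it only needs unpacking. kneaddata_database normally does this step for you, but for a manually downloaded archive, a minimal bash sketch using a hypothetical helper (unpack_index) and a placeholder directory name could look like this. It extracts into a directory of its own rather than into the plant index folder. If the archive turns out to contain a subfolder, use whichever folder directly holds the .bt2 files.

```shell
#!/usr/bin/env bash
# unpack_index ARCHIVE DEST_DIR (hypothetical helper): extract a prebuilt
# bowtie2 index archive (*.tar.gz) into its own directory, creating it if needed.
unpack_index() {
  local archive=$1 dest=$2
  mkdir -p "$dest"
  tar -xzf "$archive" -C "$dest"   # -C: extract relative to DEST_DIR
}
```

Usage: unpack_index Homo_sapiens_hg37_and_human_contamination_Bowtie2.v0.1.tar.gz human_db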


Hi @wking,
You can decontaminate against more than one genome by providing the path to each directory that contains an index. Make sure you have a separate directory for each index (i.e. one directory for the six .bt2 files produced by one bowtie2-build run, e.g. on a single genome), then add the -db option (short for --reference-db) once per reference database, for example:

kneaddata --input sample_R1.fastq.gz --input sample_R2.fastq.gz -db $DB1 -db $DB2 --output $OUTPUT_DIR

That worked for me using a human and a plant genome.
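Here is a minimal bash sketch of the same approach with a sanity check added. It assumes, as described above, that each directory should hold exactly one complete index. The helpers check_index_dir and run_kneaddata and the output-directory argument are hypothetical. The paired-input flags follow the command above; newer kneaddata releases take paired files as --input1/--input2 instead, so check kneaddata --help for your version.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: run kneaddata against several bowtie2 index
# directories, one -db (--reference-db) flag per directory.

# check_index_dir DIR: succeed only if DIR holds exactly one complete
# bowtie2 index (small .bt2 or large .bt2l), i.e. all six of its files.
check_index_dir() {
  local dir=$1 f ext base s
  local firsts=()
  # Collect forward-index "*.1.bt2[l]" files; reverse "*.rev.1.*" files
  # also match the glob, so skip them.
  for f in "$dir"/*.1.bt2 "$dir"/*.1.bt2l; do
    [ -e "$f" ] || continue  # an unmatched glob stays literal; ignore it
    case $f in
      *.rev.1.bt2 | *.rev.1.bt2l) continue ;;
    esac
    firsts+=("$f")
  done
  [ "${#firsts[@]}" -eq 1 ] || return 1  # no index, or several mixed together
  ext=${firsts[0]##*.}                    # "bt2" or "bt2l"
  base=${firsts[0]%.1."$ext"}
  for s in 1 2 3 4 rev.1 rev.2; do
    [ -e "$base.$s.$ext" ] || return 1    # index is incomplete
  done
  return 0
}

# run_kneaddata R1 R2 OUT_DIR DB_DIR...: validate each index directory,
# then call kneaddata with one -db per directory.
run_kneaddata() {
  local r1=$1 r2=$2 out=$3 d
  shift 3
  local db_args=()
  for d in "$@"; do
    if ! check_index_dir "$d"; then
      echo "error: $d must contain exactly one complete bowtie2 index" >&2
      return 1
    fi
    db_args+=(-db "$d")
  done
  # Paired input as two --input flags, as in the command above; newer
  # kneaddata releases use --input1/--input2 instead.
  kneaddata --input "$r1" --input "$r2" "${db_args[@]}" --output "$out"
}
```

Usage: run_kneaddata sample_R1.fastq.gz sample_R2.fastq.gz kneaddata_output human_db plant_db. Requiring exactly one *.1.bt2 (or *.1.bt2l) per directory catches the setup from the question, where the human .bt2 files are copied into the plant index folder, before kneaddata is run.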