I ran HUMAnN 3 with the ‘--search-mode uniref50’ option, but after searching UniRef50 it also searched the UniRef90 database. Does this mean I have to remove the UniRef90 database from my protein database path?
I recommend placing each of those databases in its own separate folder.
During the translated search phase, HUMAnN will by default map reads against any valid database it finds in the translated search database folder (and then integrate the results downstream). Historically, some applications of HUMAnN required splitting up a large protein database into multiple pieces to be searched sequentially, and we have continued to support that functionality in the current version.
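As a rough illustration of the separate-folder layout, here is a minimal Python sketch. It assumes the DIAMOND database files sit directly in one folder and are named with a prefix such as uniref50_ or uniref90_ before the first underscore. The helper names separate_databases and humann_command are hypothetical, and the command is only assembled, not run.

```python
import shutil
from pathlib import Path


def separate_databases(db_root):
    """Move each *.dmnd file directly under db_root into its own subfolder.

    The subfolder is named after the part of the file name before the first
    underscore (an assumed naming scheme, e.g. uniref50_... -> uniref50/).
    Returns a dict mapping subfolder name -> subfolder Path.
    """
    db_root = Path(db_root)
    folders = {}
    for dmnd in sorted(db_root.glob("*.dmnd")):
        name = dmnd.stem.split("_", 1)[0]
        target = db_root / name
        target.mkdir(exist_ok=True)
        shutil.move(str(dmnd), str(target / dmnd.name))
        folders[name] = target
    return folders


def humann_command(input_file, output_dir, protein_db_dir, search_mode):
    """Assemble a humann call that points at a single database folder."""
    return [
        "humann",
        "--input", str(input_file),
        "--output", str(output_dir),
        "--protein-database", str(protein_db_dir),
        "--search-mode", search_mode,
    ]
```

With the databases separated this way, passing only the uniref50 folder to --protein-database should keep the translated search from also picking up the UniRef90 database.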
Thank you for your reply.
So based on this design, the ‘--search-mode’ option is not a ‘strong’ option for setting or selecting the protein database?
Which UniRef50 and UniRef90 sequence files does HUMAnN use to build the DIAMOND UniRef databases, version ‘201901b’? And if I want to build a split protein database, can I just split the UniRef sequence file into several pieces and use the DIAMOND ‘makedb’ function to build my own databases to help accelerate the HUMAnN analysis? Also, which DIAMOND ‘makedb’ options were used when building the ‘201901b’ database?
Thanks again for your help.
--search-mode is a convenience flag for tuning a set of more detailed flags (e.g. percent identity) for UniRef90- vs UniRef50-like mapping. It does not select the database itself, so you can map in UniRef50 mode against a non-UniRef50 database.
The 201901b protein databases have the same sequence inputs as 201901 (i.e. UniProt/UniRef release 201901), but they’ve been re-indexed for use with the last DIAMOND in the pre-2.0 lineage.
Correct, you could split a set of database sequences into N separate files, index each of them separately, and then provide them as a folder of databases for serial mapping. That said, this is mostly a legacy option to make pre-DIAMOND blastx alternatives work in lower-memory environments; it will not increase speed. Because DIAMOND allows you to tune your memory use on a constant database, you should not need to use this option.
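For anyone who does need the legacy split layout, the following is a minimal Python sketch of the idea, not the procedure used for the official databases. It distributes FASTA records round-robin into N files and assembles one diamond makedb call per piece, using default options. The function names and the part_000-style file names are hypothetical, and HUMAnN may expect its own naming for database files, so treat the output names as placeholders.

```python
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, seq = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(seq)
                header, seq = line, []
            elif line:
                seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def split_fasta(fasta_path, out_dir, n_parts):
    """Write records round-robin into n_parts FASTA files; return their paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = [out_dir / f"part_{i:03d}.fasta" for i in range(n_parts)]
    handles = [open(p, "w") for p in paths]
    try:
        for i, (header, seq) in enumerate(read_fasta(fasta_path)):
            handles[i % n_parts].write(f"{header}\n{seq}\n")
    finally:
        for handle in handles:
            handle.close()
    return paths


def makedb_commands(fasta_parts):
    """Assemble one 'diamond makedb' call per FASTA piece (default options)."""
    return [
        ["diamond", "makedb", "--in", str(p), "--db", str(p.with_suffix(""))]
        for p in fasta_parts
    ]
```

Each command could then be run with subprocess.run(cmd, check=True); DIAMOND appends the .dmnd extension to the --db name. As noted above, this trades memory for serial passes over the reads and is not expected to make the search faster.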
We used the default options for
Thanks a lot!
BTW, is it better to split the query sequences into several parts and run the DIAMOND search on different cluster nodes if I want to accelerate the DIAMOND search?