Upgrading to more recent Uniref databases for Humann3

Hi,
Are there any guidelines or things to consider if we want to update the uniref90 database to a more recent version when using humann3 in translated mode? Or should it run smoothly?

I would refer to the custom databases text if you want to build an updated UniRef database for use in translated search:

Off the top of my head, the main thing we do with the default UniRef FASTA input is reduce each sequence header to just the UniRef ID followed by “|” and the sequence’s nucleotide-equivalent length (for RPK normalization). Note that, because UniRef cluster representatives shift over time, the UniRef accessory files (e.g. mappings to other functional annotations) bundled with HUMAnN would likely lose coverage of the updated database.
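As a rough illustration of the header reformatting described above, here is a minimal shell sketch. It is not the HUMAnN developers' actual script. It assumes the UniRef ID is the first whitespace-delimited token of each FASTA header and that "nucleotide-equivalent length" means three times the amino-acid length; `format_uniref_fasta` is a hypothetical helper name.

```shell
# Hypothetical helper: rewrite UniRef FASTA headers as ">ID|length".
# Assumption: length = 3 x number of amino acids (nucleotide-equivalent).
# $1 = path to an uncompressed UniRef protein FASTA file.
format_uniref_fasta() {
  awk '
    # Print the record collected so far, if any.
    function emit() {
      if (id != "") {
        printf(">%s|%d\n%s\n", id, 3 * length(seq), seq)
      }
    }
    /^>/ {
      emit()
      id = substr($1, 2)   # first token of the header, minus ">"
      seq = ""
      next
    }
    { seq = seq $0 }       # join wrapped sequence lines
    END { emit() }
  ' "$1"
}
```

It could be used as `format_uniref_fasta uniref90.fasta > uniref90_formatted.fasta` before building the DIAMOND database from the formatted file. Each sequence is written on a single line, which DIAMOND accepts.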


@franzosa , Hi, is it possible to get the uniref90 2019 sequence fasta file that is default with Humann3? Thanks!

I think you can export the sequences from the HUMAnN database using the diamond getseq command. We do not host the original file.
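A minimal sketch of that export, assuming a DIAMOND build that includes the `getseq` command and that it writes FASTA records to standard output (worth confirming with `diamond help` for your version). The function name and file paths are hypothetical.

```shell
# Hypothetical helper: dump all sequences from a DIAMOND database to FASTA.
# Assumption: "diamond getseq" prints the sequences to standard output.
# $1 = path to the .dmnd file, $2 = output FASTA path.
export_dmnd_to_fasta() {
  diamond getseq --db "$1" > "$2"
}
```

For example, `export_dmnd_to_fasta path/to/database.dmnd exported.fasta`. The exported headers would be the ones stored in the database, so they should already be in the reduced HUMAnN format rather than the original UniRef format.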


Why is it necessary to have the name of Uniref 2019 when updating the Uniref database in HUMAnN3?

Hello,

I have a question about updating the UniRef database in HUMAnN3. I noticed that even after updating the UniRef database, it still has to carry the “Uniref 2019” name. This requirement is a bit discouraging when trying to update the database, even though the documentation describes an option for using a modified database (“GitHub - biobakery/humann: HUMAnN 3.0 is the next generation of HUMAnN 1.0 (HMP Unified Metabolic Analysis Network).”).

I would like to know why this is necessary and if there are any alternative options. I would appreciate any insights on this matter.

Thank you for your time and attention.

Best regards,
Jérémy Tournayre

I’m afraid I don’t follow your question. There shouldn’t be any naming constraints on a custom protein database, though the database has to follow the formatting described at the link you posted in order to work with HUMAnN.

Are you getting an error message when trying to work with a custom database? If you can describe a bit more about what you’re trying to do we might be able to find a solution.

Hello,

I have downloaded the latest version of Uniref:
https://ftp.uniprot.org/pub/databases/uniprot/uniref/uniref50/

Then, I’ve used the following command to create a Diamond database from the FASTA file:
diamond makedb --in uniref50.fasta -d uniref50.dmnd

When I ran Humann3 with this database using the command below:

humann \
  --input test/humann-3.6.1/examples/demo.fastq.gz \
  --output test/uniref_2023_03_01 \
  --taxonomic-profile "test/metaphlan_Jan21/profiled_metagenome.txt" \
  --metaphlan-options="--index mpa_vJan21_CHOCOPhlAnSGB_202103 --bowtie2db mpa_vJan21_CHOCOPhlAnSGB_202103" \
  --nucleotide-database db/chocophlan \
  --protein-database uniref_2023_03_01/diamond

I found that the “uniref50.dmnd” file in the “uniref_2023_03_01/diamond” directory was causing an error message:

CRITICAL ERROR: The directory provided for the translated database contains files ( uniref50.dmnd ) that are not of the expected version. Please install the latest version of the database: 201901b

However, renaming “uniref50.dmnd” to “uniref50_201901b_full.dmnd” allowed Humann3 to work.

I’m asking whether I can do that because the error message discourages it. Also, you said this here: Thoughts on custom humann3 reference databases - #5 by franzosa

So, can I use Humann3 with the latest version of the UniRef database as I have done, or can this cause issues?

Best regards,
Jérémy Tournayre

There are two things going on here:

  1. The error is because HUMAnN doesn’t like having multiple (unrelated) database files mixed in the same folder. This is a holdover from trying to support translated search tools that split up a database into multiple files for serial search. (Note: DIAMOND does not require this.) If you make a new folder and put the new database in it you should be OK.

  2. The raw download of the latest UniRef50 FASTA file won’t be formatted for HUMAnN use. Please see the section about using custom protein databases in our manual. Even after formatting, note that the various annotation files HUMAnN uses (e.g. maps from UniRef50s to enzyme IDs) won’t cover UniRef50s that were added after we built those mapping files.
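Point 1 above can be sketched as follows: move the custom database into a folder of its own and pass that folder to `--protein-database`. The helper name and paths are hypothetical, and this only addresses the mixed-folder check, not the header formatting from point 2.

```shell
# Hypothetical helper: place a custom DIAMOND database in its own folder.
# $1 = path to the custom .dmnd file, $2 = new, dedicated folder.
isolate_protein_db() {
  mkdir -p "$2" || return 1
  # Refuse to reuse a folder that already holds other files.
  if [ -n "$(ls -A "$2")" ]; then
    echo "error: $2 is not empty" >&2
    return 1
  fi
  mv "$1" "$2"/
}
```

For example, `isolate_protein_db uniref50.dmnd custom_uniref50_db`, followed by running HUMAnN with `--protein-database custom_uniref50_db`.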

I’d suggest using the UniRef50 files we provide even though they’re a bit out of date at this point - they won’t require extra formatting and will “play nicely” with all of the other files in the HUMAnN installation.