Custom databases usage


I’ve created a custom, non-UniRef-based protein database with arbitrary sequence headers. How can I link it to a custom pathway database?

For example:

Protein db:


Pathway db, file 1:
Rxn-1 ref_prot1
Rxn-2 ref_prot2

Pathway db, file 2:
PWS-TEST Rxn-1 Rxn-2
(formatted in accordance with the HUMAnN 3.0 documentation in the biobakery/humann GitHub repository)
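As a rough illustration of how the two example pathway files chain together (pathway → reactions → proteins), here is a minimal Python sketch. It assumes whitespace-separated columns exactly as in the example; the function names are hypothetical, and this is not HUMAnN's own parser. It also assumes, without confirmation from the HUMAnN documentation, that the protein IDs in file 1 have to match the identifiers of the protein db sequences exactly, since that ID is the only key connecting the pathway files to the alignments.

```python
def parse_mapping(lines):
    """Map the first field of each line to the remaining fields.

    Works for both example files: file 1 maps a reaction to its proteins,
    file 2 maps a pathway to its reactions.
    """
    mapping = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:  # skip blank or incomplete lines
            mapping[fields[0]] = fields[1:]
    return mapping


def proteins_per_pathway(pathways, reactions):
    """Resolve pathway -> reactions -> proteins, keeping first-seen order.

    Reactions missing from file 1 contribute no proteins.
    """
    result = {}
    for pathway, rxns in pathways.items():
        proteins = []
        for rxn in rxns:
            for prot in reactions.get(rxn, []):
                if prot not in proteins:
                    proteins.append(prot)
        result[pathway] = proteins
    return result


reactions = parse_mapping(["Rxn-1 ref_prot1", "Rxn-2 ref_prot2"])  # file 1
pathways = parse_mapping(["PWS-TEST Rxn-1 Rxn-2"])                  # file 2
print(proteins_per_pathway(pathways, reactions))
# {'PWS-TEST': ['ref_prot1', 'ref_prot2']}
```

If the subject IDs reported for hits against the protein db are not literally ref_prot1, ref_prot2, and so on, nothing in this chain would connect, so comparing the subject ID column of the DIAMOND output against file 1 is a cheap first check.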

DIAMOND generates an output TSV file with alignments against the custom db, but HUMAnN reports that the total gene families count is zero. What am I doing wrong?

Are you testing on a very small metagenome? It’s possible that none of your custom proteins are covered at 50% of their sites, in which case HUMAnN will conservatively not report them. You can lower this threshold on the command line (--translated-subject-coverage-threshold) so that any valid read hit will add its protein to the output file.
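To make the 50% rule concrete, here is a minimal Python sketch of one way to compute per-protein subject coverage from alignment coordinates and then filter on a threshold. It illustrates the idea only and is not HUMAnN's implementation; the function names, the inclusive >= comparison, and the assumption that protein lengths are supplied separately are all assumptions of this sketch.

```python
def subject_coverage(alignments, subject_lengths):
    """Percent of each protein's residues covered by at least one alignment.

    alignments: iterable of (subject_id, sstart, send), 1-based and inclusive,
        in protein coordinates (columns 2, 9 and 10 of BLAST-style tabular output).
    subject_lengths: dict mapping subject_id -> protein length in residues.
    """
    covered = {sid: set() for sid in subject_lengths}
    for sid, start, end in alignments:
        if sid not in covered:
            continue  # no known length for this protein; ignore the hit
        lo, hi = min(start, end), max(start, end)
        covered[sid].update(range(lo, hi + 1))
    return {sid: 100.0 * len(covered[sid]) / length
            for sid, length in subject_lengths.items()}


def passing_subjects(alignments, subject_lengths, threshold=50.0):
    """Proteins whose coverage reaches the threshold (inclusive, by assumption)."""
    coverage = subject_coverage(alignments, subject_lengths)
    return sorted(sid for sid, pct in coverage.items() if pct >= threshold)


lengths = {"prot_a": 100, "prot_b": 200}
hits = [("prot_a", 1, 30), ("prot_a", 21, 60), ("prot_b", 1, 80)]
print(subject_coverage(hits, lengths))        # {'prot_a': 60.0, 'prot_b': 40.0}
print(passing_subjects(hits, lengths))        # ['prot_a']
print(passing_subjects(hits, lengths, 30.0))  # ['prot_a', 'prot_b']
```

The two overlapping hits on prot_a count their shared residues once, so 60 of 100 positions pass at 50%, while prot_b (80 of 200) only passes once the threshold is lowered.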

I created a nucleotide dataset of 11,321 sequences, each corresponding to a protein. I then translated them into amino acid sequences and built a DIAMOND database from the resulting file.
Next, I ran HUMAnN with the original nucleotide FASTA file as input.
The file diamond_aligned.tsv, which stores the alignments against the DIAMOND database, shows that each nucleotide sequence aligned against its own translated protein (just as expected). So each protein in the database is fully covered by one read.
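To check the "fully covered by one read" observation directly, a small Python sketch can tally hits per protein from diamond_aligned.tsv. It assumes the file uses the default 12-column BLAST tabular layout (query ID in column 1, subject ID in column 2), which is an assumption not confirmed here for HUMAnN's intermediate file, and that each source sequence and its protein were given the same ID. The function name is hypothetical.

```python
from collections import Counter


def summarize_alignments(tsv_lines):
    """Count hits per subject protein and list queries whose subject ID differs.

    Assumes BLAST-style tabular lines: column 1 = query ID, column 2 = subject ID.
    The self-hit check is only meaningful if reads and proteins share IDs.
    """
    hits = Counter()
    mismatched = []
    for line in tsv_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank or comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        hits[subject] += 1
        if query != subject:
            mismatched.append((query, subject))
    return hits, mismatched


sample = [
    "gene1\tgene1\t100.0\t300\t0\t0\t1\t900\t1\t300\t1e-150\t600\n",
    "gene2\tgene2\t100.0\t150\t0\t0\t1\t450\t1\t150\t1e-80\t300\n",
]
hits, mismatched = summarize_alignments(sample)
print(dict(hits))  # {'gene1': 1, 'gene2': 1}
print(mismatched)  # []
```

On the real file, an open file handle can be passed straight in (for example, summarize_alignments(open("diamond_aligned.tsv"))), since iterating a file yields its lines.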