Problem creating a custom DB with KEGG

beeswaxag · May 3, 2020, 1:43pm

Hi,

I’ve recently subscribed to the KEGG database.
However, I am having trouble creating a humann2 custom database from it.

Specifically, I am stuck at the following step:

$ humann2_build_custom_database --input genes.pep --output custom_database --id-mapping legacy_kegg_idmapping.tsv --format diamond --taxonomic-profile max_taxonomic_profile.tsv

I am wondering what format the ‘genes.pep’ input file should have. Does it need identifiers, gene sequences, gene lengths, or anything else?
I would appreciate it if anyone could answer this question.
Thanks.

franzosa · May 8, 2020, 2:12pm

genes.pep should be a FASTA file whose sequence headers (i.e. the strings that appear after the > that begin sequence entries) appear in your legacy_kegg_idmapping.tsv file.
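
A minimal sketch of how that requirement could be checked before building the database. It assumes the sequence identifier is the first whitespace-delimited token after the > and that the id-mapping file is tab-delimited with the identifier in its first column; neither detail is confirmed in this thread, and the function names and toy data are hypothetical.

```python
import io


def fasta_ids(handle):
    """Yield the first whitespace-delimited token of each FASTA header line."""
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                yield fields[0]


def mapping_ids(handle):
    """Collect the first-column values of a tab-delimited file (assumed layout)."""
    ids = set()
    for line in handle:
        line = line.rstrip("\n")
        if line and not line.startswith("#"):
            ids.add(line.split("\t")[0])
    return ids


def unmapped_headers(fasta_handle, mapping_handle):
    """Return FASTA identifiers that are missing from the id-mapping file."""
    known = mapping_ids(mapping_handle)
    return [seq_id for seq_id in fasta_ids(fasta_handle) if seq_id not in known]


# Toy in-memory data standing in for genes.pep and legacy_kegg_idmapping.tsv.
example_fasta = io.StringIO(">seq1 some description\nMKV\n>seq2\nMLA\n")
example_mapping = io.StringIO("seq1\tfamilyA\n")
print(unmapped_headers(example_fasta, example_mapping))  # ['seq2']
```

With real files, the two StringIO objects would be replaced by open("genes.pep") and open("legacy_kegg_idmapping.tsv"); an empty result means every header has a mapping entry.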

beeswaxag · May 11, 2020, 7:42am

Thank you very much, Eric.

Kim

fconstancias · June 9, 2020, 9:30am

Hi @beeswaxag,

Could you give some details regarding how you generated the genes.pep as well as id_mapping from an updated KEGG database?

I also have access to an updated version and would like to use it.

Thanks.

Florentin

ZTNolan · July 30, 2020, 12:55pm

I am having the same issue. Could you please expand on your response? Do I need to generate the FASTA file myself? If so, I am still not clear on what the content of the genes.pep file should be.

AyaB · July 30, 2020, 5:09pm

Hi everyone,

I have access to a new version of KEGG and can’t seem to locate the input files for creating the id_mapping file in the command:
$ humann2_humann1_kegg --ikoc humann1/data/koc --igenels humann1/data/genels --o legacy_kegg_idmapping.tsv
Does anybody know which KEGG database files correspond to the ‘humann1/data/koc’ and ‘humann1/data/genels’ used here?

Thanks

franzosa · August 3, 2020, 9:30pm

Hi All - It’s unfortunately hard for us to answer these questions as we don’t have access to a current KEGG license, and the official KEGG installation has likely evolved since the time of HUMAnN 1.0 (which is when these files were first built).

If you’re doing custom alignment in HUMAnN (to KEGG or otherwise) the goal is to have 1) a file with all your sequences whose headers appear in 2) the id-mapping file (along with columns for functional category and taxonomy). Historically “1” was a file called genes.pep in the KEGG installation, and the id-mapping could be built from other KEGG-supplied files.
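
To make the relationship between "1" and "2" concrete, here is a rough sketch that loads a tab-delimited id-mapping file into a lookup keyed by sequence header. The column positions are placeholders, not the confirmed HUMAnN layout (a real id-mapping file may need further columns or a different order), so they are exposed as parameters; the function name and toy rows are hypothetical.

```python
import csv
import io


def load_id_mapping(handle, id_col=0, function_col=1, taxonomy_col=2):
    """Return {sequence header: (functional category, taxonomy)}.

    The default column positions are assumptions; pass the indices
    that match your own file.
    """
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        mapping[row[id_col]] = (row[function_col], row[taxonomy_col])
    return mapping


# Toy rows: header, functional category, taxonomy (all made up).
toy_mapping = io.StringIO("seqA\tK00001\ttaxon_x\nseqB\tK00002\ttaxon_y\n")
id_map = load_id_mapping(toy_mapping)
print(id_map["seqB"])  # ('K00002', 'taxon_y')
```

Every header in the sequence file should then be a key in this lookup.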

AyaB · August 3, 2020, 9:52pm

Thanks!
I fully understand what you wrote but then what is ‘humann1/data/genels’ used for? Can you specify the headers for the two columns that appear in the file?

Thanks again,

Aya

franzosa · August 3, 2020, 10:10pm

That file appears to be a mapping from legacy KEGG sequence headers (as would appear in the genes.pep file) to their sequence length. You can inspect those files in the legacy HUMAnN repository (i.e. v1.0).
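
Independently of the legacy repository files, a header-to-length table of this kind could be derived directly from a FASTA file. The sketch below is only an illustration: it assumes the identifier is the first whitespace-delimited token of each header and counts residues as they appear in the file; the exact layout and units of the original genels file are not confirmed here, and the function name and toy data are hypothetical.

```python
import io


def sequence_lengths(handle):
    """Return {sequence identifier: total sequence length} for a FASTA file."""
    lengths = {}
    current = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            current = fields[0] if fields else None
            if current is not None:
                lengths[current] = 0
        elif current is not None:
            # Sequences may span several lines, so accumulate.
            lengths[current] += len(line)
    return lengths


# Toy in-memory FASTA standing in for genes.pep.
toy_fasta = io.StringIO(">seqA some description\nMKVL\nAA\n>seqB\nMST\n")
for header, length in sequence_lengths(toy_fasta).items():
    print(f"{header}\t{length}")
```

For the toy input this prints two tab-separated lines, one per sequence, with lengths 6 and 3.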
