I am wondering what format the ‘genes.pep’ input file should have. Does it need identifiers, gene sequences, gene lengths, or anything else?
I would appreciate it if anyone could answer this question.
I have access to a new version of KEGG but can’t seem to locate the input files for creating the id_mapping file with the command: $ humann2_humann1_kegg --ikoc humann1/data/koc --igenels humann1/data/genels --o legacy_kegg_idmapping.tsv
Does anybody know which KEGG database files correspond to the ‘humann1/data/koc’ and ‘humann1/data/genels’ used here?
Hi All - It’s unfortunately hard for us to answer these questions as we don’t have access to a current KEGG license, and the official KEGG installation has likely evolved since the time of HUMAnN 1.0 (which is when these files were first built).
If you’re doing custom alignment in HUMAnN (to KEGG or otherwise), the goal is to have 1) a file with all your sequences whose headers appear in 2) the id-mapping file (along with columns for functional category and taxonomy). Historically, “1” was a file called genes.pep in the KEGG installation, and the id-mapping could be built from other KEGG-supplied files.
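As a rough sanity check on that pairing, something like the sketch below could list sequence headers that have no entry in the id-mapping file. It is not part of HUMAnN. It assumes the id-mapping file is tab-separated with the sequence ID in the first column, and that a FASTA ID is the first whitespace-delimited token after ">". The function names and the toy data are made up for illustration.

```python
import csv
import io


def fasta_ids(handle):
    """Yield the first whitespace-delimited token of each FASTA header line."""
    for line in handle:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                yield tokens[0]


def mapping_ids(handle):
    """Return the set of IDs found in the first column of a tab-separated file.

    Assumption: the sequence ID is in column 1 of the id-mapping file.
    """
    return {row[0] for row in csv.reader(handle, delimiter="\t") if row}


def missing_from_mapping(fasta_handle, mapping_handle):
    """List FASTA IDs (in file order) that do not appear in the id-mapping."""
    known = mapping_ids(mapping_handle)
    return [seq_id for seq_id in fasta_ids(fasta_handle) if seq_id not in known]


# Toy in-memory data standing in for genes.pep and the id-mapping file.
fasta = io.StringIO(">geneA some description\nMKV\n>geneB\nMLL\n")
mapping = io.StringIO("geneA\tK00001\t3\torganism1\n")

missing = missing_from_mapping(fasta, mapping)
print(missing)
```

For real files, replace the `io.StringIO` objects with `open("genes.pep")` and `open("legacy_kegg_idmapping.tsv")`. An empty list means every sequence header has a mapping entry.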
That file appears to be a mapping from legacy KEGG sequence headers (as would appear in the genes.pep file) to their sequence length. You can inspect those files in the legacy HUMAnN repository (i.e. v1.0) here: