I am unable to find an example file showing what a gene table should look like as input for HUMAnN.
Is there an example file somewhere?
Can I use the output of mapping against GMGC?
Thanks!
Harithaa
Hi,
The command is:
humann -i genetable.tsv --input-format genetable -o functions --diamond ../../
A gene table looks like this:
# Gene Family E100002696_L01_1_kneaddata_Abundance-RPKs
UNMAPPED 4887640.0000000000
UniRef50_A0A125UBK5 2135.6903965600
UniRef50_A0A125UBK5|g__Escherichia.s__Escherichia_coli 1652.1739130435
UniRef50_A0A125UBK5|g__Enterococcus.s__Enterococcus_faecalis 483.5164835165
UniRef50_A0A1D7PV13 2135.6122163071
UniRef50_A0A1D7PV13|g__Escherichia.s__Escherichia_coli 2135.6122163071
UniRef50_F4NR79 1762.3762376238
UniRef50_F4NR79|g__Escherichia.s__Escherichia_coli 1762.3762376238
UniRef50_Q54HB2 1585.8948562197
UniRef50_Q54HB2|g__Neisseria.s__Neisseria_sp_oral_taxon_014 464.2864737279
UniRef50_Q54HB2|g__Aggregatibacter.s__Aggregatibacter_segnis 436.5079365079
UniRef50_Q54HB2|g__Neisseria.s__Neisseria_macacae 354.8387096774
UniRef50_Q54HB2|g__Neisseria.s__Neisseria_subflava 124.4813278008
UniRef50_Q54HB2|g__Escherichia.s__Escherichia_coli 105.9552692206
UniRef50_Q54HB2|g__Neisseria.s__Neisseria_canis 66.6666666667
UniRef50_Q54HB2|g__Eikenella.s__Eikenella_corrodens 23.5191637631
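To spell out the layout above: each non-comment line is a feature ID followed by an abundance. A row without `|` is the total for a gene family, and rows of the form `FAMILY|g__Genus.s__Species` split that total by taxon (for example, the two UniRef50_A0A125UBK5 taxon rows, 1652.1739130435 + 483.5164835165, add up to its total of 2135.6903965600). The sketch below parses this layout. `read_gene_table` and `strata_exceeding_total` are hypothetical helpers of my own, not part of HUMAnN, and the sketch assumes whitespace-separated columns with a single sample column.

```python
from collections import defaultdict


def read_gene_table(lines):
    """Split a HUMAnN-style gene table into per-family totals and per-taxon rows.

    Assumes whitespace-separated columns (feature ID, then abundance) and uses
    only the first abundance column; lines starting with '#' are skipped.
    """
    totals = {}
    stratified = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        feature, value = line.split()[:2]
        abundance = float(value)
        if "|" in feature:
            # e.g. "UniRef50_X|g__Genus.s__Species" -> ("UniRef50_X", "g__Genus.s__Species")
            family, taxon = feature.split("|", 1)
            stratified[family][taxon] = abundance
        else:
            totals[feature] = abundance
    return totals, dict(stratified)


def strata_exceeding_total(totals, stratified, tol=1e-6):
    """Sanity check: list families whose per-taxon values sum above the family total."""
    return [family for family, taxa in stratified.items()
            if sum(taxa.values()) > totals.get(family, 0.0) + tol]


SAMPLE = """\
# Gene Family\tE100002696_L01_1_kneaddata_Abundance-RPKs
UNMAPPED\t4887640.0000000000
UniRef50_A0A125UBK5\t2135.6903965600
UniRef50_A0A125UBK5|g__Escherichia.s__Escherichia_coli\t1652.1739130435
UniRef50_A0A125UBK5|g__Enterococcus.s__Enterococcus_faecalis\t483.5164835165
UniRef50_F4NR79\t1762.3762376238
UniRef50_F4NR79|g__Escherichia.s__Escherichia_coli\t1762.3762376238
"""

totals, stratified = read_gene_table(SAMPLE.splitlines())
print(sorted(totals))
print(strata_exceeding_total(totals, stratified))
```

In the full snippet the Q54HB2 taxon rows add up to less than its total, which is why the check only flags sums that exceed the total rather than demanding equality.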
Hi! I am also trying to run HUMAnN on a gene table and need some directions. Unfortunately, the exact format of the required TSV file is not well documented in the repository.
I tried using a snippet of the file you provided above, @w_ceasea, but no results were matched:
# Pathway gene_table_Abundance
UNMAPPED 0.0000000000
UNINTEGRATED 0.0000000000
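Two things may be worth ruling out here. These are hypotheses, not a confirmed diagnosis. First, a handful of gene families may simply be too few to complete any pathway. Second, the identifiers have to use the same UniRef level as the database and mapping files HUMAnN is configured with: UniRef50 IDs checked against UniRef90-based resources would not be expected to match. The sketch below is a quick check of which level a table uses. `uniref_levels` is a hypothetical helper, not a HUMAnN function.

```python
import re
from collections import Counter

# Matches the UniRef level prefix, e.g. "UniRef50_" or "UniRef90_".
UNIREF_PREFIX = re.compile(r"^(UniRef(?:50|90|100))_")


def uniref_levels(lines):
    """Count the UniRef level used by each unstratified gene-family row."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        feature = line.split()[0]
        if "|" in feature:
            continue  # taxon-stratified rows repeat a family already counted
        match = UNIREF_PREFIX.match(feature)
        counts[match.group(1) if match else "other"] += 1
    return counts


example = [
    "# Gene Family\tsample",
    "UNMAPPED\t4887640.0",
    "UniRef50_A0A125UBK5\t2135.69",
    "UniRef50_A0A125UBK5|g__Escherichia.s__Escherichia_coli\t1652.17",
    "UniRef50_F4NR79\t1762.38",
]
counts = uniref_levels(example)
print(counts)
```

Special rows such as UNMAPPED land in the "other" bucket. A table made up entirely of UniRef50 rows, run against a UniRef90 setup, would be a strong hint of a level mismatch.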
Is this just because that small subset of proteins happens not to be annotated in the database, or does HUMAnN actually require a different kind of input table? It is quite confusing to start with, as the description says "gene table", yet the example above uses UniRef50 identifiers, which refer to protein clusters.
If I have a collection of UniProtKB identifiers, what would be the best way to translate them to UniRef IDs? Can UniRef90 IDs also be used?
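One possible route assumes you can obtain a UniProtKB-to-UniRef mapping file. UniProt distributes ID-mapping tables (e.g. `idmapping_selected.tab`) that include UniRef100, UniRef90 and UniRef50 columns, and its website also offers an interactive ID mapping tool. The sketch below loads such a table and sums per-accession abundances into UniRef clusters, reporting accessions with no mapping. The helper names are hypothetical, and the column indices are parameters you should check against the README of the file you actually download. Whether to target UniRef90 or UniRef50 should follow whichever database your HUMAnN installation uses.

```python
from collections import defaultdict


def load_uniref_map(lines, acc_col, uniref_col):
    """Build {UniProtKB accession: UniRef cluster ID} from tab-separated lines.

    acc_col and uniref_col are 0-based column indices; they are assumptions
    to be checked against the README of the mapping file actually used.
    """
    mapping = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= max(acc_col, uniref_col):
            continue  # skip short or malformed lines
        accession, cluster = fields[acc_col].strip(), fields[uniref_col].strip()
        if accession and cluster:
            mapping[accession] = cluster
    return mapping


def to_uniref_abundances(protein_abundances, mapping):
    """Sum per-accession abundances per UniRef cluster; list unmapped accessions."""
    clusters = defaultdict(float)
    unmapped = []
    for accession, value in protein_abundances.items():
        cluster = mapping.get(accession)
        if cluster is None:
            unmapped.append(accession)
        else:
            clusters[cluster] += value
    return dict(clusters), unmapped


# Toy placeholder IDs, not real accessions or clusters.
toy_map = load_uniref_map(
    ["ACC1\tUniRef90_A", "ACC2\tUniRef90_A", "ACC3\tUniRef90_B"],
    acc_col=0, uniref_col=1,
)
clusters, unmapped = to_uniref_abundances(
    {"ACC1": 2.0, "ACC2": 3.0, "ACC3": 1.0, "ACC4": 5.0}, toy_map
)
print(clusters, unmapped)
```

Summing is used because several UniProtKB accessions can fall into the same UniRef cluster, and a gene table has one row per cluster.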
Any help would be greatly appreciated, as we do not have sequencing reads at hand and can only use CDS/protein information.
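With only CDS/protein information, the last step would be writing per-cluster values in the same layout as the example gene table earlier in the thread. This minimal sketch rests on three assumptions. First, unstratified rows alone are acceptable input. Second, the header line can mirror the `# Gene Family<TAB>sample` form shown there. Third, the abundance values are whatever per-protein measure you have (the example table is in RPKs, so downstream units would follow your choice). `format_gene_table` and `write_gene_table` are hypothetical helpers.

```python
def format_gene_table(abundances, sample_name):
    """Render {gene family ID: abundance} as a single-sample, unstratified table."""
    rows = [f"# Gene Family\t{sample_name}"]
    # Largest abundance first; ties broken alphabetically by ID.
    for family, value in sorted(abundances.items(), key=lambda kv: (-kv[1], kv[0])):
        rows.append(f"{family}\t{value:.10f}")
    return "\n".join(rows) + "\n"


def write_gene_table(abundances, sample_name, path):
    """Write the rendered table to `path` (e.g. the file later passed to -i)."""
    with open(path, "w") as handle:
        handle.write(format_gene_table(abundances, sample_name))


table = format_gene_table({"UniRef90_B": 1.0, "UniRef90_A": 5.0}, "my_sample")
print(table, end="")
```

The resulting file would then be the input to the `humann -i ... --input-format genetable` command quoted earlier in the thread.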