Hello,
I am having trouble understanding what the command-line arguments of the utility script humann_infer_taxonomy do.
I have read through the code of infer_taxonomy.py and the humann_infer_taxonomy documentation, but I do not understand what --mode {totals,unclassified,stratified} and --lca-choice {source_tax,uniref_lca,humann_lca} control. What do the different options do?
I also wonder whether there is a way to preserve the existing species-level taxonomic information from the pangenome search for a given gene family in both cases: when unclassified is replaced by some level (e.g. family), and when unclassified cannot be replaced.
In other words, is it possible to leave features of known genus/species as they are, rather than modifying them to match the target level, and only re-assign taxonomically unclassified gene families based on the results of the translated search, if that makes sense?
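To make the behaviour I am after concrete, here is a minimal Python sketch. It is not how humann_infer_taxonomy works internally, just an illustration of the desired outcome. The function name reassign_unclassified, the lca_lookup dictionary, and the IDs UniRef90_A and UniRef90_B are made up for this example; the only assumption about the real data is the usual genefamilies convention of GENE|TAXON feature names with GENE|unclassified rows.

```python
def reassign_unclassified(rows, lca_lookup):
    """Relabel only the '|unclassified' strata; leave everything else untouched.

    rows       -- list of (feature_name, abundance) tuples
    lca_lookup -- hypothetical dict: gene family ID -> taxon label at the target level
    """
    result = []
    for feature, abundance in rows:
        gene, sep, taxon = feature.partition("|")
        # Only touch stratified rows whose taxon is 'unclassified'
        # and for which an inferred taxon is available.
        if sep and taxon == "unclassified" and gene in lca_lookup:
            feature = gene + "|" + lca_lookup[gene]
        result.append((feature, abundance))
    return result


rows = [
    ("UniRef90_A", 15.0),                                          # community total
    ("UniRef90_A|g__Bacteroides.s__Bacteroides_vulgatus", 12.0),   # known species: keep
    ("UniRef90_A|unclassified", 3.0),                              # should be re-assigned
    ("UniRef90_B|unclassified", 2.0),                              # no inference: keep
]
lca_lookup = {"UniRef90_A": "f__Bacteroidaceae"}  # made-up inference result

for feature, abundance in reassign_unclassified(rows, lca_lookup):
    print(feature, abundance, sep="\t")
```

So the species-level row stays exactly as it was, UniRef90_A|unclassified becomes UniRef90_A|f__Bacteroidaceae, and UniRef90_B|unclassified stays unclassified because nothing could be inferred for it.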
Here are the available command line options:
(humann3.6_metaphlan4_py3.9) bernhard@macbook ~ % humann_infer_taxonomy -h
usage: humann_infer_taxonomy [-h] -i INPUT [-o OUTPUT] [-l {Kingdom,Phylum,Class,Order,Family,Genus}] [-d {uniref50-tol-lca,uniref90-tol-lca}] [-m {totals,unclassified,stratified}]
                             [-c {source_tax,uniref_lca,humann_lca}] [-t THRESHOLD] [--devdb DEVDB]

HUMAnN utility for inferring "unclassified" taxonomy
=====================================================
Based on the lowest common ancestor (LCA) annotation
of each UniRef50/90 cluster, infer approximate taxonomy
for unclassified features at a target level of resolution.
Will modify features of known genus/species to match
target level.

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        HUMAnN genefamilies table
  -o OUTPUT, --output OUTPUT
                        Destination for modified table; default=STDOUT
  -l {Kingdom,Phylum,Class,Order,Family,Genus}, --level {Kingdom,Phylum,Class,Order,Family,Genus}
                        Desired level for taxonomic estimation/summation; default=Family
  -d {uniref50-tol-lca,uniref90-tol-lca}, --database {uniref50-tol-lca,uniref90-tol-lca}
                        UniRef-specific taxonomy database
  -m {totals,unclassified,stratified}, --mode {totals,unclassified,stratified}
                        Which rows to include in the estimation/summation; default=totals
  -c {source_tax,uniref_lca,humann_lca}, --lca-choice {source_tax,uniref_lca,humann_lca}
                        Which per-gene taxonomic annotation to consider; default=humann_lca
  -t THRESHOLD, --threshold THRESHOLD
                        Minimum frequency for a new taxon to be included; default=1e-3
  --devdb DEVDB         Manually specify a development database

I am using humann v3.6 right now.
Best regards,
Bernhard