I am trying to build a custom HUMAnN 3 nucleotide database and am at the step of generating a Bowtie2 index from a FASTA file. Will having the full taxonomy in the FASTA headers (kingdom through species and everything in between) cause any issues? I have only seen examples showing just |genus|species.
HUMAnN looks for the taxonomy in one of the pipe-delimited fields of the header (which you can specify). The main software doesn’t assume that the taxonomy is only genus + species (G+S). That is not true of some of the utility scripts, however, which assume that your stratified output has exactly G+S taxonomy (or none, in the case of unclassified). The plotting utility and the infer_taxonomy utility come to mind.
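If you want to use those utilities, one option is to trim the lineage down to G+S yourself. Here is a minimal Python sketch of that idea. It is not part of HUMAnN: the function names `genus_species` and `shorten_header` are made up for illustration, and it assumes the taxonomy sits in the last pipe-delimited field as rank-prefixed levels joined by `.` (for example `k__...p__...g__...s__...`). Adjust the field index and separator to match your own headers.

```python
def genus_species(taxonomy, sep="."):
    """Keep only the g__ and s__ levels of a full lineage string.

    Assumes each level carries a rank prefix (k__, p__, ..., g__, s__)
    and that level names do not themselves contain the separator.
    """
    levels = taxonomy.split(sep)
    genus = next((lv for lv in levels if lv.startswith("g__")), None)
    species = next((lv for lv in levels if lv.startswith("s__")), None)
    if genus is None or species is None:
        # No usable genus/species pair: fall back to "unclassified".
        return "unclassified"
    return sep.join([genus, species])


def shorten_header(header, tax_field=-1):
    """Replace the taxonomy field of a pipe-delimited header with G+S only."""
    fields = header.split("|")
    fields[tax_field] = genus_species(fields[tax_field])
    return "|".join(fields)


if __name__ == "__main__":
    # Hypothetical header layout: id | length | full lineage
    example = "gene1|1200|k__Bacteria.p__Bacteroidetes.g__Bacteroides.s__Bacteroides_dorei"
    print(shorten_header(example))
```

Note that a species name containing a `.` (such as an "sp." label) would be split incorrectly by this sketch, so check your names first.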