Hi there,
Very excited about HUMAnN3!
I hope you can help me understand one of the updated features: “Pangenome sequences must be covered at >50% of sites to be reported (tunable)”. Is it referring to [--translated-identity-threshold <Automatically: 50.0 or 80.0, Custom: 0.0-100.0>]? Does it have any relation to the [--identity-threshold <50.0>] option in HUMAnN2? I guess what I am really asking is: if I run both with default parameters, is one going to be more stringent than the other?
Thank you in advance for your help!
Best,
Huan
We split out the identity and coverage-filter params in v3.0 to make it clearer which phase of the search they apply to (and to make them separately tunable). Translated search in v3.0 against UniRef90 is slightly more permissive (less conservative), with the identity threshold lowered from 90% to 80%.
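To illustrate why the two params are independent, here is a toy sketch of a two-phase filter: first drop alignments below an identity threshold, then report a protein only if the surviving alignments cover more than a given percentage of its sites. This is not HUMAnN's code. The `Alignment` record, the function names, the use of `>=` for identity versus `>` for coverage, and the example data are all hypothetical; only the 80% identity and >50% coverage defaults come from this thread.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical minimal record of one read-to-protein hit."""
    protein: str
    identity: float  # percent identity of this alignment
    start: int       # 0-based start on the protein (inclusive)
    end: int         # 0-based end on the protein (exclusive)


def covered_fraction(alns, protein_length):
    """Fraction of protein sites hit by at least one alignment."""
    covered = [False] * protein_length
    for a in alns:
        for i in range(a.start, a.end):
            covered[i] = True
    return sum(covered) / protein_length


def reported_proteins(alignments, lengths,
                      identity_threshold=80.0, coverage_threshold=50.0):
    # Phase 1 (assumed): per-alignment identity filter.
    kept = [a for a in alignments if a.identity >= identity_threshold]
    # Group surviving alignments by protein.
    by_protein = {}
    for a in kept:
        by_protein.setdefault(a.protein, []).append(a)
    # Phase 2 (assumed): per-protein coverage filter on the survivors only.
    return sorted(p for p, alns in by_protein.items()
                  if 100.0 * covered_fraction(alns, lengths[p]) > coverage_threshold)


# Hypothetical data: three 100-residue proteins.
lengths = {"A": 100, "B": 100, "C": 100}
alignments = [
    Alignment("A", 95.0, 0, 40), Alignment("A", 85.0, 40, 70),  # 70% covered
    Alignment("B", 92.0, 0, 30), Alignment("B", 75.0, 30, 80),  # 30% or 80%
    Alignment("C", 99.0, 0, 20),                                # 20% covered
]
print(reported_proteins(alignments, lengths))
print(reported_proteins(alignments, lengths, identity_threshold=70.0))
```

In this toy data, protein B only passes the coverage filter when the identity threshold is relaxed, because its second alignment (75%) is dropped at 80%. Protein C is never reported: its single hit has very high identity but covers only 20% of sites.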
This also tends to be more biologically realistic, as proteins in the same UniRef90 family won’t be 90% identical in all of their read-length windows; rather, they are 90% identical on average. Hence we expect some reads to align at <90% identity.
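A quick way to see the "90% on average" point is to slide a read-length window along a toy alignment that is exactly 90% identical overall but has its mismatches clustered in one region. The 300-residue length, the mismatch layout, and the 33-residue window (roughly a 100-nt read translated to amino acids) are illustrative assumptions, not HUMAnN internals.

```python
def window_identities(matches, window):
    """Identity (0-1) of every contiguous window of `window` residues."""
    return [sum(matches[i:i + window]) / window
            for i in range(len(matches) - window + 1)]


# Hypothetical 300-residue alignment with 30 mismatches (90% identity overall):
# 20 mismatches packed into the first 60 residues (e.g. a variable region)
# and 10 spread evenly across the remaining 240.
LENGTH = 300
mismatch_positions = set(range(0, 60, 3)) | set(range(60, LENGTH, 24))
matches = [0 if i in mismatch_positions else 1 for i in range(LENGTH)]

WINDOW = 33  # ~100-nt read in amino-acid space (assumption)
ids = window_identities(matches, WINDOW)

print(f"overall identity: {sum(matches) / LENGTH:.2f}")
print(f"min / max window identity: {min(ids):.3f} / {max(ids):.3f}")
print(f"windows below 90%: {sum(x < 0.9 for x in ids)} of {len(ids)}")
```

Even though the full-length identity is exactly 90%, reads drawn from the clustered region align at well under 90% identity (as low as 22/33, about 67%). A per-read threshold of 90% would discard them, while 80% keeps windows with up to 6 mismatches out of 33.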