Hi, I'm trying to understand what happens when a read gets mapped to more than one database entry in the translated search. In the HUMAnN 2 paper I found this: “If a read has two or more high-quality alignments to distinct database sequences, the read’s single count is divided across the corresponding sequences in proportion to squared alignment identity”.
Can you please clarify what this means?
Also, as far as I can tell, this was not changed in HUMAnN 3. Could you please verify this?
Thanks in advance,
For example, if a read has exactly two equally good hits, then we assign 0.5 reads to the first hit and 0.5 reads to the second hit. If the hits are not equally good, then we use squared alignment identity to weight how the read is divided up. This logic has not changed in HUMAnN 3.
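To make the weighting concrete, here is a minimal Python sketch of dividing one read's count in proportion to squared alignment identity. It is an illustration of the rule as described, not HUMAnN's actual code; the function name `split_read_count` and the identity values are hypothetical.

```python
def split_read_count(identities):
    """Divide a single read's count across its hits.

    identities: percent identity of each high-quality alignment.
    Returns one weight per hit; the weights sum to 1 (one read).
    Hypothetical helper for illustration, not part of HUMAnN.
    """
    squared = [pid ** 2 for pid in identities]
    total = sum(squared)
    return [s / total for s in squared]


# Two equally good hits: the read is split evenly.
equal = split_read_count([95.0, 95.0])

# Unequal hits: the better hit receives the larger share.
unequal = split_read_count([100.0, 90.0])

print(equal)
print(unequal)
```

With identities of 100% and 90%, the squared values are 10000 and 8100, so the first hit receives 10000/18100 (about 0.55) of the read and the second receives the rest.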
What we did change in HUMAnN 3 was the number of “good hits” considered by default for each read. In HUMAnN 2 we considered all of the default hits returned by DIAMOND, which I believe maxed out at the top 20. In HUMAnN 3 we only ask DIAMOND to return hits whose score is within a certain percentage of the best hit's score. This improved performance without adversely impacting accuracy, and in practice it makes the weighting of hits more similar to my first example (i.e. the read is divided up over a set of approximately equally good targets).
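As a rough sketch of what "within a certain percentage of the best hit's score" means, the Python snippet below keeps only hits close to the top score. In HUMAnN 3 this filtering is requested from DIAMOND itself rather than done afterwards; the function name `keep_near_best`, the use of bit scores, and the 1% threshold are all assumptions made for illustration.

```python
def keep_near_best(hits, top_percent):
    """Keep hits whose score is within top_percent of the best score.

    hits: list of (target_name, score) tuples for one read.
    top_percent: allowed drop from the best score, in percent
                 (illustrative value; not necessarily HUMAnN's default).
    """
    if not hits:
        return []
    best = max(score for _, score in hits)
    cutoff = best * (1 - top_percent / 100.0)
    return [(target, score) for target, score in hits if score >= cutoff]


# Hypothetical hits for a single read.
hits = [("protein_A", 200.0), ("protein_B", 199.0), ("protein_C", 150.0)]
kept = keep_near_best(hits, top_percent=1.0)
print(kept)
```

Here `protein_C` is dropped because its score falls well below the best hit, leaving two nearly equivalent targets that would share the read almost evenly.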