Alignment post-processing

AyaB · April 8, 2022, 6:02pm

Hi, I'm trying to understand what happens when a read gets mapped to more than one database entry in the translated search. In the HUMAnN2 paper I found this: “If a read has two or more high-quality alignments to distinct database sequences, the read’s single count is divided across the corresponding sequences in proportion to squared alignment identity”.
Can you please clarify what this means?
Also, as far as I can understand, this was not changed in HUMAnN 3. Could you please verify this?

Thanks in advance,

Aya

franzosa · June 29, 2022, 8:59pm

For example, if a read has exactly two equally good hits, then we assign 0.5 reads to the first hit and 0.5 reads to the second hit. If the hits are not equally good, then we use squared alignment identity to weight how the read is divided up. This logic has not changed in HUMAnN 3.
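To make the weighting concrete, here is a minimal Python sketch of the rule as described above. It is not HUMAnN's actual code: the function name `split_read_by_identity`, the gene labels, and the identity values are made up for illustration, and identities are assumed to be percentages.

```python
def split_read_by_identity(hits):
    """Divide one read across its hits in proportion to squared identity.

    hits: dict mapping target name -> percent identity (0-100).
    Returns a dict mapping target name -> fraction of the read (sums to 1).
    Illustrative only; names and input format are assumptions.
    """
    if not hits:
        return {}
    # Weight each hit by its squared fractional identity.
    weights = {target: (pid / 100.0) ** 2 for target, pid in hits.items()}
    total = sum(weights.values())
    # Normalize so the read still contributes a single count overall.
    return {target: w / total for target, w in weights.items()}


# Two equally good hits: the read is split 0.5 / 0.5.
equal = split_read_by_identity({"geneA": 95.0, "geneB": 95.0})

# Unequal hits: weights are 1.0 and 0.25, so the split is 0.8 / 0.2.
unequal = split_read_by_identity({"geneA": 100.0, "geneB": 50.0})
```

Squaring the identity favors the better hit more strongly than a linear weighting would: in the second case the split is 0.8 / 0.2 rather than roughly 0.67 / 0.33.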

What we did change in HUMAnN 3 was the number of “good hits” considered by default for each read. In HUMAnN 2 we considered all of the default hits returned by DIAMOND, which I believe maxed out with the top 20. In HUMAnN 3 we only ask DIAMOND to return hits that are within a certain % score of the best hit. This improved performance without adversely impacting accuracy, and in practice makes the weighting of hits more similar to my first example (i.e. divided up over a set of approximately equivalently good targets).
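The "within a certain % score of the best hit" idea can be sketched as below. This is an illustration, not HUMAnN's or DIAMOND's implementation: the function name `keep_near_best`, the 1% threshold, and the scores are all assumptions, since the thread does not state the actual cutoff. DIAMOND's `--top` option filters in a similar spirit, but the exact flag and value HUMAnN 3 passes are not given here.

```python
def keep_near_best(hits, top_percent):
    """Keep only hits whose score is within top_percent of the best score.

    hits: dict mapping target name -> alignment score (e.g. bit score).
    top_percent: allowed drop from the best score, in percent (assumed value).
    """
    if not hits:
        return {}
    best = max(hits.values())
    cutoff = best * (1 - top_percent / 100.0)
    return {target: score for target, score in hits.items() if score >= cutoff}


# Hypothetical scores for one read; with a 1% window the cutoff is about 198,
# so geneC (150) is dropped and only the near-equivalent hits remain.
scores = {"geneA": 200.0, "geneB": 199.0, "geneC": 150.0}
kept = keep_near_best(scores, top_percent=1.0)
```

Because only near-equivalent hits survive the filter, the read ends up divided almost evenly among them, which matches the first example above.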
