Genefamilies, PathAbundance, PathCoverage

Hi,
Based on the HUMAnN 3 manual's description of the three main output files, can I infer the following?

  1. The abundance in GeneFamilies indicates the number of times a gene is found?
  2. The Abundance in PathAbundance indicates the number of times a Pathway is found?
  3. A coverage between 0 and 1 indicates whether a complete or partial pathway was detected? It's clear in the manual that
    "A pathway with coverage = 1 is considered to be confidently detected (independent of its abundance), as this implies that all of its member reactions were also confidently detected. A pathway with coverage = 0 is considered to be less confidently detected (independent of its abundance), as this implies that some of its member reactions were not confidently detected."

Then what does a number like 0.85 or 0.50 mean in PathCoverage? Does it mean that 85% or 50% of the member reactions of a pathway were confidently detected?

Best wishes,
Sumeet

Re: 1 and 2 - Very close. The abundance values (in RPK, reads per kilobase) you get for gene families and pathways are designed to be proportional to the number of copies of those genes/pathways in the sample, so that they are useful for comparisons within and across samples. It is usually not possible to determine actual copy counts with normal shotgun sequencing (you would need to add some independent measurement, e.g. qPCR, to get that sort of quantitation).
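
To make the "proportional, not absolute" point concrete, here is a minimal Python sketch. It is a simplification, not HUMAnN's actual computation: the gene names, read counts, and gene lengths are made up, and the helper functions `rpk` and `relative_abundance` are hypothetical names used only for illustration.

```python
def rpk(read_count, gene_length_bp):
    """Reads per kilobase: read count divided by gene length in kilobases."""
    return read_count / (gene_length_bp / 1000.0)


def relative_abundance(rpk_by_feature):
    """Scale RPK values so they sum to 1 within one sample."""
    total = sum(rpk_by_feature.values())
    return {name: value / total for name, value in rpk_by_feature.items()}


# Hypothetical data: the same community sequenced at two depths.
# Sample B has 10x more reads, so its RPK values are 10x larger,
# even though the underlying gene copy ratios are identical.
sample_a = {"geneX": rpk(200, 1000), "geneY": rpk(300, 3000)}
sample_b = {"geneX": rpk(2000, 1000), "geneY": rpk(3000, 3000)}

relab_a = relative_abundance(sample_a)
relab_b = relative_abundance(sample_b)

for name in sample_a:
    print(name, sample_a[name], sample_b[name], relab_a[name], relab_b[name])
```

The raw RPK values differ between the two samples only because of sequencing depth, while the normalized values agree. That is why RPK is read as a relative signal rather than a copy count, and why tables are typically renormalized before comparing samples (for real HUMAnN output, the bundled `humann_renorm_table` utility is the usual route).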

Re: 3 - The coverage scores are mostly a holdover from the original HUMAnN (v1) where we were less confident about individual reaction abundances. They indicate (roughly) a probability that all the reactions needed to quantify a given pathway are real (as opposed to false positives). Intermediate coverage values suggest that some of the reactions were very confidently detected (>median abundance) while others were less confidently detected (<median abundance). That said, we’re a lot more confident in low-abundance reactions reported by modern HUMAnN, so these scores are probably very conservative. (We don’t use them much internally, and I suspect we’ll eventually drop them / replace them with a more useful new output.)

Thanks for the quick reply. That was really helpful.
So that means PathCoverage is not an indicator of whether a complete pathway, or only a fraction of it, was detected in a sample.