Issue on metacyc pathway coverage

jxu4 · April 29, 2021, 3:59pm

Dear bioBakery help forum,

Thank you for the wonderful tools.

I am working on analysis on Humann3 output on metacyc pathway coverage. I encountered some issue on the coverage. This is the coverage of pathway: HISTSYN-PWY (L-histidine biosynthesis). Please see the table below. The table shows that coverage of this pathway and the reads of corresponding EC gene families. I am confused on the difference between different samples. For example, the sample 3A1493 has the fewer reads of each gene families than those of sample2A1086. However, the coverage of 3A1493 is 1 and the coverage of 2A1086 is 0.95.

Could you please give some instructions on how to calculate the coverage?

Thanks,

Jun

franzosa · April 29, 2021, 4:28pm

You can read more about pathway coverage here (or in the HUMAnN 1 paper):

The short answer is that coverage is based on where reactions fall within the abundance distribution of their sample, and hence a pathway X could be covered more highly in a sample with lower read depth if the X reactions were proportionally more abundant there.

That said, the coverage calculations are mostly a holdover from HUMAnN 1 (where they were more critical) and are a target for update/replacement in the future. I would focus on the pathway abundances instead; if a pathway has abundance > 0 you can consider that it was fairly confidently covered/detected by modern HUMAnN.

jxu4 · May 4, 2021, 6:09pm

Thank you so much for your reply.

Jun

meenachakra · July 31, 2021, 5:13am

Thank you so much. This is very helpful. We are trying to set thresholds in our analysis - would you say the same for gene family (abundance > 0 → confidently covered/detected)? Thank you!

franzosa · August 2, 2021, 4:39pm

Yes, but for different reasons and with some caveats. A pathway being non-zero means that many reactions had to be non-zero (an abundance of evidence). The equivalent we use at the single-gene level is requiring >50% of sites to have been hit by a read (i.e. a single read hitting a gene is not sufficient to call the gene as present). This increases specificity relative to the “any hit = detected” approach, but it’s not perfect. For example, there are some gene families that are very hard to tell apart on the basis of short-read recruitment (e.g. an A-domain family + B-domain family vs. an AB-domain family).

meenachakra · August 2, 2021, 10:24pm

Thank you so much! This is very helpful.

Topic		Replies	Views
Genefamilies, PathAbundance, PathCoverage HUMAnN	2	1566	September 27, 2022
Pathway coverage issue from humann2 HUMAnN	1	601	December 28, 2021
HUMAnN pathway coverage: community total ≪ species coverage. Why can this happen? HUMAnN	2	131	August 27, 2025
What is about the new pathcoverage HUMAnN	3	369	January 24, 2023
Pathway abundance and cross-samples comparison HUMAnN	1	906	June 3, 2021

Issue on metacyc pathway coverage

Related topics