Inconsistent pathway stratification

Hi,

Could anyone please help me understand how pathways are stratified? I noticed that for pathways with no species assigned, pathabundance.tsv sometimes contains only the main entry, such as:

S1_pathabundance.tsv:PWY-5677: succinate fermentation to butanoate 114.7999908573

or, in some cases, it could have the main entry and the sub-entry “unclassified”, e.g.,

S2_pathabundance.tsv:ARGORNPROST-PWY 66.8159830865
S2_pathabundance.tsv:ARGORNPROST-PWY|unclassified 48.8755175661

What may have caused the difference? This is HUMAnN 3.0.0.


If you only see a total for a pathway (i.e. no stratifications), it means that the pathway was satisfied at the community level but not by any single detected species. For example, maybe the pathway requires reactions A + B + C + D and species 1 contributes A + B and species 2 contributes C + D.

In the second example you’re seeing stratification to unclassified, which means that all of the required reactions were detected by translated search, potentially coming from different species. The total is larger than the unclassified stratification because additional copies of the pathway, each incomplete within any single stratum, are being completed at the community level.

Returning to my example above, if we also saw one copy of A + B + C + D in the unclassified mapping, then there would be two complete copies at the community (total) level: the one from unclassified and another one from the combination of Species 1 and Species 2.
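To make the A + B + C + D example concrete, here is a minimal sketch in Python. It assumes a deliberately simplified scoring rule (pathway abundance = abundance of the least abundant required reaction), which is only a stand-in for HUMAnN's actual pathway quantification; the names `REQUIRED`, `pathway_abundance`, and `stratify` are hypothetical.

```python
# Toy model only: a pathway's abundance is taken to be the minimum abundance
# over its required reactions. HUMAnN's real calculation is more involved.
REQUIRED = ("A", "B", "C", "D")


def pathway_abundance(reactions):
    """Return the abundance of the limiting reaction (0 if any is missing)."""
    return min(reactions.get(r, 0.0) for r in REQUIRED)


def stratify(contributions):
    """contributions maps a stratum name to its {reaction: abundance} dict."""
    # Per-stratum values: a stratum only scores if it carries every reaction itself.
    per_stratum = {name: pathway_abundance(rx) for name, rx in contributions.items()}
    # Community total: pool reactions across all strata first, then score.
    pooled = {r: sum(rx.get(r, 0.0) for rx in contributions.values()) for r in REQUIRED}
    return pathway_abundance(pooled), per_stratum


contributions = {
    "species_1": {"A": 1.0, "B": 1.0},
    "species_2": {"C": 1.0, "D": 1.0},
    "unclassified": {"A": 1.0, "B": 1.0, "C": 1.0, "D": 1.0},
}
total, per_stratum = stratify(contributions)
print(total, per_stratum)
```

Under this toy rule the community total is 2 while only the unclassified stratum scores (1), which mirrors the total-plus-unclassified pattern in the second example.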


Hi Eric,

Thank you so much for your answers above! Could you please also help me with a related question?

I am using the humann_barplot function to visualize the contribution of species to community-level pathways. I have a few questions:

  1. What does the height of each stacked bar represent? Is it the sum of all stratified species-level pathway abundances (i.e., ignoring the unstratified part, the portion that is complete only at the community level and contributed by separate species), or is it the community-level abundance?

  2. If it is the first one, will pathways that cannot be stratified to any single species be zero in the barplot (even if they are high at the community level)? If it is the second one, how is the unstratified portion represented?

  3. Could you please provide an example of logstack scaling? The tutorial explains it as: "scaling the tops of the stacked bars in proportion to the log of community abundance, while scaling species contributions linearly within total bar height". I am not sure I understand the first part: does it mean that we divide the sum of all stratified portions by the community-level abundance and take log10?

Thank you so much!

Best,
Yue

The value represented by the top of the stacked bars is the sum of the stratified contributions. Any excess abundance from community synergy (i.e. from cases where community abundance > sum of stratified abundances) is not represented in the plot. Hence a pathway that is only quantified at the community level in a particular sample would not be represented on the plot.

I have been thinking about adding another feature to the plot corresponding to the excess community abundance (i.e. community abundance - sum of stratified abundances) to adjust for this, but have not implemented it yet. If there is interest in that feature it would be extra motivation to implement it. :)
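In the meantime, the excess can be computed directly from a pathabundance table. The following is a rough sketch, assuming the usual tab-separated layout with a `#`-prefixed header and `pathway|stratum` feature names; the function name `excess_community_abundance` is hypothetical and not part of HUMAnN.

```python
from collections import defaultdict


def excess_community_abundance(lines):
    """Return {pathway: community total - sum of stratified abundances}."""
    totals = {}
    stratified = defaultdict(float)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header
        feature, value = line.rsplit("\t", 1)
        if "|" in feature:
            # Stratified row, e.g. "ARGORNPROST-PWY|unclassified"
            pathway, _stratum = feature.split("|", 1)
            stratified[pathway] += float(value)
        else:
            # Community-level (total) row
            totals[feature] = float(value)
    return {p: t - stratified.get(p, 0.0) for p, t in totals.items()}


# Values taken from the two examples at the top of this thread.
example = [
    "# Pathway\tSample_Abundance",
    "PWY-5677: succinate fermentation to butanoate\t114.7999908573",
    "ARGORNPROST-PWY\t66.8159830865",
    "ARGORNPROST-PWY|unclassified\t48.8755175661",
]
excess = excess_community_abundance(example)
for pathway, value in excess.items():
    print(pathway, round(value, 4))
```

For PWY-5677 the whole total counts as excess, so under the current plotting behavior that pathway would not appear on the plot for that sample.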

The logstack scaling draws the stacked bars such that the top of the stack corresponds to the log-scaled sum of stratifications. Then it scales the individual bars in between the arbitrary minimum of the y-axis and the top of the stack so that each one occupies its linear share of the total stack height. The y-axis positions of the individual bars are not meaningful on their own; only the top of the stack and each bar's share of the stack height carry information (which is why working with log-scaled bar plots is generally not recommended, but it was the best way I could think of to convey both the stratifications and the totals when the latter were highly variable).

Here’s an example. I have a log-scaled y-axis with ticks at 0.001, 0.01, 0.1, and 1.0. I have a function with two stratifications S1 = 0.05 and S2 = 0.05 that sum to 0.1. This would be represented as a stack of bars starting at 0.001 (y-axis min) and ending at 0.1 (the sum of the stratified abundances). S1 would be represented by a bar spanning half of this height - i.e. from y = 0.001 to 0.01 - and S2 would be represented by a bar spanning the rest of the height - i.e. from y = 0.01 to 0.1. Note, however, that those y-axis spans (e.g. S1’s from y = 0.001 to 0.01) have nothing to do with the actual sizes of the stratified abundances: it’s just the geometry needed to make them appear as correct proportions of the total height.
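The geometry in this example can be reproduced with a few lines of Python. This is only a sketch of the scaling as described in this post, not the code used by humann_barplot; `logstack_spans` and its parameters are hypothetical names.

```python
import math


def logstack_spans(values, ymin):
    """Return (bottom, top) y-axis positions for each stacked bar.

    The top of the stack sits at the sum of `values` on a log10 axis; each bar
    takes a share of the log-scale height equal to its linear share of the sum.
    """
    total = sum(values)
    if total <= ymin:
        raise ValueError("sum of stratifications must exceed the y-axis minimum")
    log_bottom = math.log10(ymin)
    log_height = math.log10(total) - log_bottom
    spans = []
    cumulative = 0.0
    for v in values:
        start = log_bottom + log_height * cumulative / total
        cumulative += v
        end = log_bottom + log_height * cumulative / total
        spans.append((10 ** start, 10 ** end))
    return spans


# S1 = S2 = 0.05 with a y-axis minimum of 0.001, as in the example above.
spans = logstack_spans([0.05, 0.05], ymin=0.001)
for name, (bottom, top) in zip(["S1", "S2"], spans):
    print(f"{name}: {bottom:.3g} to {top:.3g}")
```

Each bar covers half of the stack's height on the log axis, which gives spans of roughly 0.001 to 0.01 for S1 and 0.01 to 0.1 for S2.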


Hi Eric,

Thank you so much for your explanation and examples! It is very clear! And yes, I would suggest including the synergy part in the barplot (maybe colored grey and placed at the bottom of the stacked bars), as it would help visualize 1) the total community pathway abundance across samples/groups, and 2) the proportion of community-level pathway abundance attributable to stratified species versus unstratified synergy.

Many thanks!

Gary