Humann3 humann_renorm_table output downstream analysis

Ivy-ops · October 10, 2024, 6:53pm

Hi Developer,

I have used humann_renorm_table to generate CPM normalized data. I would like to apply a distance measure to address two aspects: (1) assess beta diversity between groups and (2) identify differential pathways.
From what I understand, CPM represents count-per-million reads normalized data.
Could you advise on which distance metric would be most appropriate for downstream analysis, particularly for calculating distances on the CPM table and for conducting differential testing?

Thank you!

franzosa · October 11, 2024, 7:21pm

We favor the Bray-Curtis distance for this sort of data (and for microbiome beta diversity questions in general). Note that it will be cleaner to do your distance calculations on the community totals and not the (combined) stratified data.
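
For readers who want to see what the metric does, here is a minimal sketch of Bray-Curtis dissimilarity between two samples, written in plain Python. The function name `bray_curtis` and the sample values are hypothetical and only for illustration. In practice an existing implementation (for example in SciPy or vegan) would normally be used.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance profiles.

    u and v must list the same features in the same order
    (e.g. unstratified pathway CPM values for two samples).
    """
    if len(u) != len(v):
        raise ValueError("profiles must have the same number of features")
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    # Two all-zero profiles are treated as identical (an assumption of this sketch).
    return numerator / denominator if denominator else 0.0


# Hypothetical community-total values for two samples (PWY1, PWY2).
sample_a = [12.1, 25.2]
sample_b = [20.0, 5.0]
distance = bray_curtis(sample_a, sample_b)
```

The result is 0 for identical profiles and 1 for profiles that share no features.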

Ivy-ops · October 14, 2024, 7:19pm

Thank you, @franzosa. By “community totals,” are you referring to the combined pathways CPM table (the smaller matrix), rather than the larger matrix with individual components? For instance, pathway A would be the sum of A1, A2, A3, …, A10, and pathway B would be the sum of B1, B2, B3, …, B20, based on the HUMAnN3 output. So, when you mention community totals, do you mean the table containing pathway A and pathway B, rather than the table listing A1, A2, …, A10 and B1, B2, …, B20?

Also, which transformation would you recommend for differential pathway analysis?

Much appreciated!

franzosa · October 15, 2024, 3:36pm

I mean that when computing distances, you would want to work with the values like this:

PWY1   12.1
PWY2   25.2

And not the version with the stratifications included:

PWY1   12.1
PWY1|speciesA   10.1
PWY1|speciesB   2.0
PWY2   25.2
PWY2|speciesA   12.1
PWY2|speciesB   13.2

Or you could use just the stratified rows (i.e. the ones with | in them), but you don’t want to mix them since they represent separate compositions over the data.
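
A minimal sketch of separating the two kinds of rows, assuming the table has already been read into (feature name, value) pairs for one sample. The variable names are hypothetical, and the values are the ones from the example above.

```python
# Rows for a single sample, as (feature name, abundance) pairs.
rows = [
    ("PWY1", 12.1),
    ("PWY1|speciesA", 10.1),
    ("PWY1|speciesB", 2.0),
    ("PWY2", 25.2),
    ("PWY2|speciesA", 12.1),
    ("PWY2|speciesB", 13.2),
]

# Community totals: feature names without a "|" separator.
community_totals = {name: value for name, value in rows if "|" not in name}

# Stratified rows: feature names of the form "pathway|species".
stratified = {name: value for name, value in rows if "|" in name}
```

Distances would then be computed on `community_totals` or on `stratified`, but not on the two combined.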

We typically use a log transformation for microbiome features. If you want to include 0s in the modeling, then we replace them with half the smallest non-zero value (on a per-feature basis) before taking the log. FYI in MaAsLin v3 we are moving toward modeling abundance on only the non-zero values (with a log transform) and separate logistic modeling of the zeros (presence/absence).
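
The zero-replacement step described above can be sketched as follows. This is an illustration only, not the MaAsLin implementation: the function name is hypothetical, and the natural log is an assumption, since the base is not specified here.

```python
import math


def log_transform_feature(values):
    """Log-transform one feature's values across samples.

    Zeros are replaced with half the smallest non-zero value of that
    feature before the log is taken. Natural log is an assumption here.
    """
    nonzero = [v for v in values if v > 0]
    if not nonzero:
        raise ValueError("feature has no non-zero values to derive a pseudocount from")
    pseudocount = min(nonzero) / 2
    return [math.log(v if v > 0 else pseudocount) for v in values]


# Hypothetical CPM values for one pathway across four samples.
pwy1_across_samples = [12.1, 0.0, 4.0, 8.0]
transformed = log_transform_feature(pwy1_across_samples)
```

Because the pseudocount is derived per feature, each pathway is processed separately rather than using one global value for the whole table.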

Ivy-ops · October 15, 2024, 6:23pm

@franzosa Thank you so much!
I have another question. I noticed that each sample's column in my CPM table sums to 1,000,000. When I divide the CPM values by 1,000,000, the resulting table closely resembles a relative abundance (TSS-normalized) table, with each sample's column summing to 1. However, the humann_renorm_table --units cpm tutorial states that the output isn’t TSS-normalized. Can I treat the CPM/1,000,000 table as a proportion table? Thank you!

franzosa · October 15, 2024, 7:47pm

Yes, we tend to avoid those units because they are so tiny, but they are equivalent. If you want your units to sum to 1.0 you can also use the renorm script in “relab” (relative abundance) mode. Note that both of these are TSS: they just use different total sums (1 vs. 1e6).
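
To illustrate the equivalence, here is a minimal sketch of total-sum scaling with two different target totals. The `tss` helper and the input values are hypothetical. It is not the code behind humann_renorm_table.

```python
def tss(values, total=1.0):
    """Total-sum scale one sample's features so they sum to `total`."""
    column_sum = sum(values.values())
    if column_sum == 0:
        raise ValueError("cannot normalize a sample whose values sum to zero")
    return {name: value / column_sum * total for name, value in values.items()}


# Hypothetical unnormalized abundances for one sample.
raw = {"PWY1": 3.0, "PWY2": 9.0}

cpm = tss(raw, total=1e6)    # sums to 1,000,000
relab = tss(raw, total=1.0)  # sums to 1.0

# Dividing the CPM values by 1e6 gives the relative abundances.
cpm_as_proportion = {name: value / 1e6 for name, value in cpm.items()}
```

The two outputs differ only by the constant factor 1e6.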

Ivy-ops · October 17, 2024, 5:46pm

@franzosa Thank you!
