Humann_renorm_table: sum>1

Hello,

I used the humann_renorm_table function with HUMAnN 3.5 and noticed that, for both “relab” and “cpm” units, the sum is >1 and >1 million, respectively. Should this cause concern?
I will also note that I did include the special features (unmapped, unintegrated and ungrouped).

Example for path abundance normalization:

humann_renorm_table --input 44WMGS_pathabundance.tsv --output 44WMGS_pathabundance_relab.tsv --units relab --mode community --special y --update-snames


Thank you

The community totals for a freshly renorm’ed sample should always sum to 1 (or 1e6 in CPM units). The same is true for stratified gene families and gene family-like groups that you computed using the regroup utility (KOs, ECs, etc.). However, stratified pathway abundances are not guaranteed to sum to 1. This is because the pathway totals are not a linear combination of the pathway stratifications. Hence, while we force the pathway totals to sum to 1, and while we normalize the pathway stratifications against the unnormalized sum of pathway totals, this does not force the pathway stratifications to sum to 1.
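To make the pathway case concrete, here is a minimal Python sketch of the arithmetic described above. It is not HUMAnN's implementation: the pathway names, species names, and abundances are invented for illustration, and the only rule it applies is the one stated in this reply (divide both the totals and the stratifications by the unnormalized sum of the pathway totals).

```python
# Toy illustration only: names and numbers are made up, not real HUMAnN output.
# Unstratified pathway totals for one sample.
totals = {"PWY-A": 5.0, "PWY-B": 3.0}

# Stratified values. They deliberately do NOT add up to the totals, because
# pathway totals are not a linear combination of their stratifications.
stratified = {
    "PWY-A|species1": 2.0,
    "PWY-A|species2": 1.0,
    "PWY-B|species1": 2.0,
}

# Both layers are divided by the same denominator: the unnormalized
# sum of the pathway totals.
denominator = sum(totals.values())

relab_totals = {name: value / denominator for name, value in totals.items()}
relab_stratified = {name: value / denominator for name, value in stratified.items()}

total_sum = sum(relab_totals.values())
stratified_sum = sum(relab_stratified.values())

print("sum of totals:", total_sum)
print("sum of stratifications:", stratified_sum)
print("sum of whole column:", total_sum + stratified_sum)
```

In this toy case the totals sum to 1 while the stratifications sum to 0.625. Summing the whole column mixes both layers, so it comes out above 1 without landing on any fixed value.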

Hi,

I’m using HUMAnN 3.6, and when I use humann_renorm_table to normalize the gene families data (UniRef90) to CPM, the resulting sample sum is > 1 million.

I ran the function as follows:
humann_renorm_table --input sample_genefamilies.tsv --units cpm --output sample_cpm_genefamilies.tsv

The output file is different from the input file, but it doesn’t appear to be CPM normalized.

The files are too large to upload here, but I can share a link if you’d like.

Am I missing a step? Is this something you can help me with?

Thank you

The sum of each gene families column should be 2 million after normalizing - 1 million as summed over community totals and 1 million as summed over stratifications. If you’re getting an answer other than 2 million then something is off.

Hi,

Thank you for your response. The sample columns do not sum to 2 million either, but the values are fairly close to 2 million, with an average difference of -120656.9.

I misspoke above, apologies! The sum of the totals (i.e. ignoring stratified values) should be 1M. If you don’t have any special features in the file (e.g. UNMAPPED) then the sum of the stratifications should also be 1M. But if you do have special features in the file, then the stratified features will have the same sum as the non-stratified totals excluding special features. Here is an example using 10 instead of 1M as the target sum:

UNMAPPED           3 
feature1           4 
feature1|species1  2 
feature1|species2  2 
feature2           3 
feature2|species1  2 
feature2|species2  1

UNMAPPED + feature1 + feature2 = 10.
feature1 + feature2 = 7 (sum of unstratified totals)
feature1|* + feature2|* = 7 (sum of stratified features)
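If it helps to check this on your own table, the three sums above can be computed separately instead of summing the whole column. The following is a small Python sketch using the toy values from this example. The function name `summarize_column` and the `(feature, value)` input format are assumptions made for illustration and are not part of HUMAnN. It treats any feature containing `|` as stratified and UNMAPPED, UNINTEGRATED and UNGROUPED as the special features.

```python
# Hypothetical helper for illustration; not part of HUMAnN.
SPECIAL_FEATURES = {"UNMAPPED", "UNINTEGRATED", "UNGROUPED"}


def summarize_column(rows):
    """Split one sample column into the three sums discussed above.

    rows: iterable of (feature_name, value) pairs.
    """
    sums = {"all_totals": 0.0, "regular_totals": 0.0, "stratified": 0.0}
    for name, value in rows:
        if "|" in name:
            # Stratified row, e.g. "feature1|species1".
            sums["stratified"] += value
        else:
            # Unstratified total, including special features.
            sums["all_totals"] += value
            if name not in SPECIAL_FEATURES:
                sums["regular_totals"] += value
    return sums


# Toy column from the example above (target sum 10 instead of 1M).
toy_column = [
    ("UNMAPPED", 3),
    ("feature1", 4),
    ("feature1|species1", 2),
    ("feature1|species2", 2),
    ("feature2", 3),
    ("feature2|species1", 2),
    ("feature2|species2", 1),
]

result = summarize_column(toy_column)
print(result)
```

For the toy column this gives 10 for all totals, 7 for totals excluding special features, and 7 for the stratified rows. To run it on a real file you would read the feature name and one sample column from the TSV and pass the pairs in the same way.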