Is the renorm step affected by which samples are included together?

Hi developer,

Just want to confirm: the CPM is calculated as the RPK count output by humann2, divided by the library size of that sample, and then multiplied by a million, right?

So it doesn’t matter which samples are included when you are doing the renorm?

Or if I’m removing some samples after renorm, do I need to do the renorm again?

Thank you very much!

Angel

Correct: the CPM calculation is performed per sample (i.e., the properties of sample X do not affect the normalization of sample Y), so removing samples after the renorm does not require normalizing again. For gene families especially, it can be more computationally efficient to normalize the samples before merging them into a single table, because merging makes the table sparse, but the end results will be the same.

The actual calculation divides each RPK measurement by the sum of that sample's community-level features' RPK measurements (giving relative abundance units) and then multiplies by 1 million (giving CPM units).
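To make the arithmetic concrete, here is a minimal Python sketch of that per-sample calculation. It is an illustration only, not the humann2 implementation: the function name `renorm_cpm`, the example feature names, and the RPK values are all made up, and it assumes that stratified rows are the ones whose name contains `|` while community-level rows are the ones without it.

```python
def renorm_cpm(rpk):
    """Convert one sample's RPK values to CPM (hypothetical helper)."""
    # Assumption: community-level features have no "|" in their name;
    # stratified (per-species) rows do and are excluded from the total.
    total = sum(value for name, value in rpk.items() if "|" not in name)
    if total == 0:
        # Avoid dividing by zero for an empty sample.
        return {name: 0.0 for name in rpk}
    # Relative abundance is value / total; times 1 million gives CPM.
    return {name: value / total * 1_000_000 for name, value in rpk.items()}


# Made-up RPK values for two samples.
sample_x = {
    "geneA": 30.0,
    "geneA|s__Bug1": 20.0,
    "geneA|s__Bug2": 10.0,
    "geneB": 10.0,
}
sample_y = {"geneA": 5.0, "geneB": 15.0}

# Normalizing X alone or alongside Y gives the same values for X,
# because only X's own community-level total is used.
alone = renorm_cpm(sample_x)
together = {name: renorm_cpm(rpk) for name, rpk in {"X": sample_x, "Y": sample_y}.items()}
assert together["X"] == alone
```

Under these assumptions, sample X's community-level values sum to 1 million CPM whether or not sample Y is in the table, which is why dropping samples afterwards leaves the remaining values unchanged.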


Thank you for the clarification.