Should I first renorm gene families to CPM and then regroup to pathways, or the other way around?

When analyzing HUMAnN 3 output, should I first renormalize the gene families table to CPM and then regroup to pathways, or first regroup to pathways and then renormalize to CPM?
The HUMAnN 3 manual suggests regrouping to another functional category first and then renormalizing to CPM.

But the HUMAnN 3 tutorial suggests the opposite: renormalize to CPM first and then regroup to another functional category.

These end up producing similar results. The pathway abundance that HUMAnN computes for you is based on the unnormalized gene families, such that both the gene families and pathways are in units of RPKs, and these can be normalized to CPMs or relative abundance to adjust for sequencing depth.
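To make the depth adjustment concrete, here is a minimal sketch of CPM normalization in plain Python. The gene names and RPK values are made up for illustration, and `to_cpm` is a hypothetical helper, not HUMAnN's own code. It only shows the arithmetic: each value is divided by the sample total and scaled to one million.

```python
def to_cpm(rpk):
    """Scale a {feature: RPK} table so its values sum to one million."""
    total = sum(rpk.values())
    return {feature: value / total * 1e6 for feature, value in rpk.items()}


# Toy data (hypothetical): sample_b is sample_a sequenced twice as deeply.
sample_a = {"geneX": 10.0, "geneY": 30.0, "geneZ": 60.0}
sample_b = {feature: 2 * value for feature, value in sample_a.items()}

cpm_a = to_cpm(sample_a)
cpm_b = to_cpm(sample_b)

# After normalization the two samples are directly comparable:
# the twofold depth difference has been removed.
for feature in sample_a:
    print(feature, round(cpm_a[feature]), round(cpm_b[feature]))
```

The same arithmetic applies whether the table holds gene families or pathways, since both start out in RPK.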

I have seen arguments that, philosophically, it may be better to normalize once for sequencing depth “as early as possible” in a pipeline (for HUMAnN, that would be the gene family level). This can produce some surprising results if a gene contributes to more than one broader function, however. For example, if I normalize my gene abundance to sum to 100%, and then I sum genes according to their Pfam domain membership, the total Pfam abundance would exceed 100% (because the average gene contains >1 Pfam domain). These Pfam values would still be safe to analyze - they have been adjusted for sequencing depth at least once - but the fact that their totals vary across samples can look a little strange.
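The Pfam example can be sketched with toy numbers. Everything below is hypothetical (the gene names, the gene-to-Pfam mapping, and the helper functions `normalize` and `regroup`); it is not how HUMAnN implements regrouping, just the same arithmetic on a small scale. It compares the two orders of operations when some genes belong to more than one group.

```python
def normalize(table, total=100.0):
    """Rescale a {feature: abundance} table so its values sum to `total`."""
    s = sum(table.values())
    return {k: v / s * total for k, v in table.items()}


def regroup(genes, membership):
    """Sum gene abundances into groups; a gene counts fully toward each of its groups."""
    out = {}
    for gene, value in genes.items():
        for group in membership.get(gene, []):
            out[group] = out.get(group, 0.0) + value
    return out


# Hypothetical mapping: gene2 and gene3 each carry two Pfam domains.
membership = {
    "gene1": ["PF_A"],
    "gene2": ["PF_A", "PF_B"],
    "gene3": ["PF_B", "PF_C"],
}

# Hypothetical RPK values for two samples.
samples = {
    "sample1": {"gene1": 50.0, "gene2": 30.0, "gene3": 20.0},
    "sample2": {"gene1": 10.0, "gene2": 30.0, "gene3": 60.0},
}

results = {}
for name, rpk in samples.items():
    norm_then_regroup = regroup(normalize(rpk), membership)
    regroup_then_norm = normalize(regroup(rpk, membership))
    results[name] = (norm_then_regroup, regroup_then_norm)
    print(name,
          "normalize-then-regroup total:", round(sum(norm_then_regroup.values()), 2),
          "regroup-then-normalize total:", round(sum(regroup_then_norm.values()), 2))
```

In this sketch the normalize-then-regroup totals exceed 100% and differ between samples, while the regroup-then-normalize totals are always 100%. Within a single sample the two results differ only by a constant factor, so the relative ranking of groups inside a sample is the same either way; it is the between-sample scaling that changes.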

I have tried both methods and they produce different results, which puzzles me a lot.
Thanks for your reply! It’s really helpful.