Normalization with lots of unmapped humann reads

immunochem · November 28, 2023, 4:18pm

packageVersion("Maaslin2")
[1] '1.14.1'

Hello, I have a somewhat theoretical question about normalizing my data. I'm running HUMAnN 3 with a custom database containing protein sequences from only a few key enzymes of interest. As a result, the vast majority of the reads in the metagenomic samples remain unmapped, which is fine for my purposes. Typically I would renormalize the RPKs to relative abundance and run them in MaAsLin2. However, because almost all of the abundance ends up in UNMAPPED (>0.99 relative abundance for most samples), I have been leaving the HUMAnN results in RPK, dropping the UNMAPPED row, and then applying a CLR transformation before the MaAsLin2 analysis. Does anyone have input on whether this is statistically sound, or whether there is something I'm not considering? Thank you for your help!

franzosa · November 28, 2023, 4:49pm

If you want to talk about the relative composition among the small number of enzymes of interest across samples, then ignoring the unmapped fraction and computing relative abundance / CPMs / CLR values over the enzymes’ RPKs should all get the job done.
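As a rough illustration of those three options, here is a minimal Python sketch that operates on one sample's RPKs held in a plain dict. The feature names `enzA` and `enzB`, the helper names, and the pseudocount of 0.5 used to avoid taking the log of zero are all assumptions made for the example, not anything HUMAnN or MaAsLin2 prescribes.

```python
import math

def drop_unmapped(rpk):
    """Return a copy of the sample without the UNMAPPED row."""
    return {k: v for k, v in rpk.items() if k != "UNMAPPED"}

def relative_abundance(rpk):
    """Scale RPKs so they sum to 1 over the remaining features."""
    total = sum(rpk.values())
    if total <= 0:
        raise ValueError("sample has no mapped abundance")
    return {k: v / total for k, v in rpk.items()}

def cpm(rpk):
    """Copies per million: relative abundance scaled to sum to 1e6."""
    return {k: v * 1e6 for k, v in relative_abundance(rpk).items()}

def clr(rpk, pseudocount=0.5):
    """Centered log-ratio: log value minus the mean log value in the sample.

    The pseudocount is an assumed choice to handle zeros.
    """
    logs = {k: math.log(v + pseudocount) for k, v in rpk.items()}
    mean_log = sum(logs.values()) / len(logs)
    return {k: lv - mean_log for k, lv in logs.items()}

# Hypothetical sample: almost everything is UNMAPPED.
sample = {"UNMAPPED": 99000.0, "enzA": 30.0, "enzB": 10.0}
enzymes = drop_unmapped(sample)
relab_values = relative_abundance(enzymes)
cpm_values = cpm(enzymes)
clr_values = clr(enzymes)
```

All three outputs describe only the composition among the enzymes: CLR values sum to zero within a sample, and relative abundances sum to one, so neither carries information about how abundant the enzymes are overall.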

If you’re interested at all in the total abundance of these enzymes (as a fraction of the total sequencing depth), then you lose that signal with the above approaches. If you want to retain this signal, which is often the case when working with functions of interest rather than broad functional profiles, some approaches I’ve used in the past are 1) computing RPKMs using the total reads (not the mapped reads) for the M in the RPKM calculation or 2) doing some flavor of genome-size normalization (i.e. comparing the enzymes’ RPKs to RPKs of housekeeping genes). MUSiCC and MicrobeCensus are options for #2.
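Both approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `total_reads` is the sample's total read count supplied by you from elsewhere (it is not read from the RPK table here), the housekeeping-gene RPKs are assumed to come from a separate profiling step because the custom database contains only the enzymes, and taking the median of those RPKs is a simple stand-in for what MUSiCC or MicrobeCensus do, not a reimplementation of either tool. The function names are hypothetical.

```python
from statistics import median

def rpkm_total_depth(rpk, total_reads):
    """Divide RPKs by total sequencing depth in millions of reads.

    Uses total reads (mapped + unmapped) as the M in RPKM, so the
    result reflects abundance as a fraction of sequencing depth.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    per_million = total_reads / 1e6
    return {k: v / per_million for k, v in rpk.items() if k != "UNMAPPED"}

def relative_to_housekeeping(rpk, housekeeping_rpks):
    """Express enzyme RPKs relative to the median housekeeping-gene RPK.

    The median is an assumed summary; a ratio near 1 would mean roughly
    one copy per genome if the reference genes are single-copy.
    """
    reference = median(housekeeping_rpks)
    if reference <= 0:
        raise ValueError("housekeeping reference must be positive")
    return {k: v / reference for k, v in rpk.items() if k != "UNMAPPED"}

# Hypothetical sample with 2 million total reads.
sample = {"UNMAPPED": 99000.0, "enzA": 10.0, "enzB": 8.0}
rpkm = rpkm_total_depth(sample, total_reads=2_000_000)
per_genome = relative_to_housekeeping(sample, [2.0, 4.0, 6.0])
```

Unlike relative abundance or CLR computed over the enzymes alone, these values change when the enzymes as a group become more or less abundant in a sample, which is the signal described above.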

Here’s an example of work where we adjusted the abundances of some enzymes of interest according to the average genome size of HMP samples:
