Hi,
I am using HUMAnN v3.9 and have regrouped gene families to reactions using regroup_table. From my understanding, the output is in RPK units (Reads Per Kilobase).
I would like to perform differential abundance analysis with tools such as DESeq2 or edgeR, but these methods require raw count data, not normalized units like RPK.
- Is there a way to obtain count data from HUMAnN v3.9, or convert RPK values back to counts?
- Do you have recommendations for approaches to use with HUMAnN v3.9 output?
- I also noticed that HUMAnN v4 (alpha) can produce count data. Would you suggest using that instead if I want to apply DESeq2/edgeR?
Thanks for your help!
I think if you search the forum you’ll find some discussion on methods to back-calculate approximate counts from other forms of abundance data, but I wouldn’t really recommend doing that. Methods that want counts want TRUE counts, so you’re potentially misusing them (or at least not using them to their full potential) by approximating something that looks like a count.
We typically work with relative abundance units (e.g. RPKs sum-normalized to CPMs) and then analyze them using linear models in MaAsLin, as opposed to working with count-based models.
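To make the "sum-normalized to CPMs" step concrete, here is a minimal Python sketch of the arithmetic. In practice you would most likely run the `humann_renorm_table` script that ships with HUMAnN (with `--units cpm`) rather than roll your own. The function name `rpk_to_cpm`, the in-memory table layout, and the feature names are made up for illustration. The sketch assumes each sample's total is taken over community-level rows only (names without a `|`), and that stratified rows are scaled by that same total.

```python
def rpk_to_cpm(rows):
    """Sum-normalize RPK values to copies per million (CPM), per sample.

    rows: dict mapping feature name -> list of RPK values, one per sample.
    (Hypothetical layout for illustration; not HUMAnN's own data structure.)
    """
    # Assumption: only community-level rows (no "|" stratification suffix)
    # contribute to the per-sample total, so stratified rows are not
    # double-counted.
    community = {name: vals for name, vals in rows.items() if "|" not in name}
    n_samples = len(next(iter(rows.values())))
    totals = [sum(vals[j] for vals in community.values()) for j in range(n_samples)]

    # Scale every row (community and stratified) by its sample's total.
    return {
        name: [
            vals[j] / totals[j] * 1e6 if totals[j] > 0 else 0.0
            for j in range(n_samples)
        ]
        for name, vals in rows.items()
    }


# Toy table with two samples; feature names are invented.
example = {
    "RXN-A": [30.0, 10.0],
    "RXN-A|g__Bacteroides.s__Bacteroides_dorei": [20.0, 10.0],
    "RXN-B": [70.0, 30.0],
}
cpm = rpk_to_cpm(example)
for name, vals in cpm.items():
    print(name, [round(v, 1) for v in vals])
```

After this step the community-level rows of each sample sum to one million, which is the kind of relative abundance table we would hand to MaAsLin.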
HUMAnN 4 can output raw counts using a non-default normalization mode. We added this feature because it’s something that gets requested all the time, but it’s not a feature we use internally.