HUMAnN v4.0.0.alpha.1 Raw Counts

adam.mearig · May 19, 2026, 3:18pm

Hello,

I’m looking at the output of HUMAnN v4.0.0.alpha.1, run with the --count-normalization Counts flag so that I could pass the output to DESeq2. When I inspected the genefamilies file, I noticed that out of ~250,000 values, around 6,000 were decimals. Is this expected from the Counts output? If so, what is the best way to handle these for a counts-based program?
Similarly, the metaphlan_profile did not contain counts, only estimated_number_of_reads_from_the_clade. Is that an appropriate stand-in? Any thoughts appreciated.

Gsmith535 · May 20, 2026, 7:37pm

Not answering your question, but when using counts for reference sequences of various lengths (e.g., different genes from the same taxon, or the same gene from different taxa, both of which occur often in HUMAnN’s databases), consider trying to account for the sequence length variation. Does 10 reads aligned to a 1000 bp gene mean the same thing as 10 reads aligned to a 5000 bp gene?
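To put numbers on that question, here is a minimal sketch of length normalization as reads per kilobase (RPK). The function name `reads_per_kilobase` is made up for illustration and is not part of HUMAnN or DESeq2.

```python
def reads_per_kilobase(read_count: float, gene_length_bp: int) -> float:
    """Return reads per kilobase (RPK): reads divided by gene length in kb."""
    if gene_length_bp <= 0:
        raise ValueError("gene_length_bp must be positive")
    return read_count / (gene_length_bp / 1000)


# The same 10 reads on genes of different lengths.
short_gene = reads_per_kilobase(10, 1000)  # 10 reads over 1 kb
long_gene = reads_per_kilobase(10, 5000)   # 10 reads over 5 kb

print(short_gene, long_gene)  # 10.0 2.0
```

After normalization, the 1000 bp gene has five times the per-kilobase coverage of the 5000 bp gene, even though the raw read counts are identical.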

It may be that the devs considered a “count” to be reads normalized by gene length (N reads / X gene length). In a sense, this would be comparable to, say, 16S sequencing, where the reference sequence length is (relatively) uniform, so no normalization is needed for read counts to be comparable. If so, these values might serve as DESeq2’s “counts”.

Based on what I understand of HUMAnN 3, the abundance quantification is complex (for example, note the name “gene families” rather than simply “genes”), so it may not always use read counts directly. Can you tell whether the decimals fall within certain strata, or primarily in the unclassified fractions, for which there is a different algorithm for read alignment (see the humann2 docs and the thread “Have the basic work flow changed between HuManN version 2 and 3?”)?
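One way to check this is to tally the non-integer values per stratum. The sketch below assumes the genefamilies table follows the layout used by earlier HUMAnN versions: tab-separated, a header line starting with `#`, and stratified rows named `FEATURE|taxon` or `FEATURE|unclassified`. The feature IDs and the column name in the example are made up, and the helper names are hypothetical.

```python
from collections import Counter


def stratum_of(feature: str) -> str:
    """Classify a feature ID as 'total', 'unclassified', or 'taxon-stratified'."""
    if "|" not in feature:
        return "total"  # community-level row, no stratum suffix
    taxon = feature.split("|", 1)[1]
    return "unclassified" if taxon == "unclassified" else "taxon-stratified"


def tally_fractional(lines):
    """Count non-integer abundance values per stratum from TSV lines."""
    fractional = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header
        fields = line.rstrip("\n").split("\t")
        feature = fields[0]
        for value in fields[1:]:
            if not float(value).is_integer():
                fractional[stratum_of(feature)] += 1
    return fractional


# Toy input; real use would pass an open file handle instead.
example = [
    "# Gene Family\tsample1_Abundance-Counts",
    "UniRef90_A\t12.5",
    "UniRef90_A|g__Bacteroides.s__Bacteroides_vulgatus\t10",
    "UniRef90_A|unclassified\t2.5",
    "UNMAPPED\t100",
]
result = tally_fractional(example)
print(dict(result))
```

If most of the fractional values land in the `unclassified` stratum, that would point toward the translated-search step as their source.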

Note that DESeq2 was developed (Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2, Genome Biology) for differential expression (a gene in condition 1 vs. the same gene in condition 2) of RNA-seq data (a transcriptome of a single organism) from humans (i.e., Homo sapiens cells, not microbes). When comparing read counts of a gene only to itself, length doesn’t matter. DESeq2 appears to take gene length into account only conditionally (Analyzing RNA-seq data with DESeq2), though its documentation on this subject is rather unclear in general and focused on the same gene with distinct isoforms (mostly specific to eukaryotes).
I would not consider myself an RNA-seq or transcriptomics expert, but this, among other reasons, has always been a weakness of DESeq2 in my opinion, particularly for anything microbiome.

franzosa · May 28, 2026, 4:37pm

The default abundance units in HUMAnN 4 are CPMs, which are read counts normalized to gene length and sample depth. Previously HUMAnN reported RPKs, which are read counts normalized to gene length but NOT sample depth. If you ask HUMAnN 4 for counts, neither gene-length nor sample-depth normalization is performed. HOWEVER, because HUMAnN will still allow a read’s weight to be divided over multiple target sequences (i.e., in cases of ambiguous mapping), the “counts” are not always integers, especially in the context of translated search / unclassified abundances (where multiple-mapping is more common).
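As a toy illustration of the three units, the sketch below splits each read's weight evenly across the genes it maps to, then derives RPK and CPM from the resulting counts. The even split and the sum-to-one-million CPM scaling are simplifying assumptions made for this example; HUMAnN's actual weighting and normalization may differ. All gene names and function names are hypothetical.

```python
from collections import defaultdict


def fractional_counts(read_hits):
    """Give each read a total weight of 1, split evenly over its target genes."""
    counts = defaultdict(float)
    for genes in read_hits:
        for gene in genes:
            counts[gene] += 1 / len(genes)
    return dict(counts)


def to_rpk(counts, lengths_bp):
    """Normalize counts by gene length in kilobases (no depth normalization)."""
    return {g: c / (lengths_bp[g] / 1000) for g, c in counts.items()}


def to_cpm(rpk):
    """Scale RPK values so that the sample sums to one million (assumed form)."""
    total = sum(rpk.values())
    return {g: v / total * 1_000_000 for g, v in rpk.items()}


# Four reads; the third maps ambiguously to both genes.
read_hits = [["geneA"], ["geneA"], ["geneA", "geneB"], ["geneB"]]
lengths_bp = {"geneA": 1000, "geneB": 2000}

counts = fractional_counts(read_hits)  # geneA: 2.5, geneB: 1.5
rpk = to_rpk(counts, lengths_bp)       # geneA: 2.5, geneB: 0.75
cpm = to_cpm(rpk)

print(counts, rpk, cpm)
```

The single ambiguous read is enough to make both "counts" non-integer, while the total across genes still equals the number of reads (4).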

mguaita · June 22, 2026, 5:48pm

Hello, I have a question on this same topic. HUMAnN v4 (alpha) returns decimals when using the parameter --count-normalization Counts. Is there a recommended way to transform these pseudo-counts into integer data for use in count-based statistical models like ALDEx2 (edgeR seems not appropriate)? Or is it better to stick with adjustedCPMs for downstream analysis?

Turns out MaAsLin is not an option for my data design.

Thank you,
