The bioBakery help forum

Smoothing DNA-level features to avoid divide-by-zero errors


I’m looking to normalize paired RNA and DNA abundances from meta-samples analyzed with Humann3.

The Humann3 wiki suggests:

For low-abundance species, random sampling may lead to detection of transcripts for undetected genes. In these cases, we recommend smoothing DNA-level features to avoid divide-by-zero errors during normalization.

Are there commands/options to do this smoothing in Humann3, or do I perform this on my own? Perhaps it’s automatically done in the CPM normalization step? When I searched for this I couldn’t find it in the Humann2/3 wikis or tutorials, but I did see that Humann (legacy) included a Witten–Bell smoothed output file.


Sorry for the delayed reply here. This is actually an area where we’ve been doing some research recently to determine the best practices for combined RNA + DNA analysis (which haven’t made their way into HUMAnN yet).

If you’re looking to smooth zero values, my current suggestion is to do it per-feature: for each feature, replace zeroes with half of that feature’s smallest non-zero value for that measurement. (Smoothing features per-sample instead turns out to bake some unwanted signal of sequencing depth into the data.) This procedure is also a lot easier to implement than the Witten–Bell method.
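As a rough illustration of the per-feature approach, here is a minimal Python sketch. It is not part of HUMAnN; the function name `smooth_zeros_per_feature`, the dict-of-lists table layout (feature name mapped to one abundance per sample), and the choice to leave all-zero features untouched are all assumptions made for the example.

```python
def smooth_zeros_per_feature(table):
    """Replace zeros in each feature with half that feature's smallest non-zero value.

    `table` maps a feature name to a list of abundances, one per sample
    (hypothetical layout chosen for this sketch).
    """
    smoothed = {}
    for feature, values in table.items():
        nonzero = [v for v in values if v > 0]
        if not nonzero:
            # Assumption: a feature that is zero in every sample has no
            # non-zero minimum to borrow from, so it is left unchanged.
            smoothed[feature] = list(values)
            continue
        pseudo = min(nonzero) / 2.0
        smoothed[feature] = [v if v > 0 else pseudo for v in values]
    return smoothed


# Toy DNA-level abundances for three samples (made-up numbers).
dna = {
    "geneA": [4.0, 0.0, 10.0],
    "geneB": [0.0, 0.0, 0.0],
    "geneC": [2.0, 6.0, 0.0],
}
smoothed_dna = smooth_zeros_per_feature(dna)
```

Because the pseudo-value is computed within each feature across samples rather than within each sample across features, it does not depend on any one sample's sequencing depth. Features that are zero everywhere still need separate handling (for example, dropping them) before taking RNA/DNA ratios.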