The bioBakery help forum

Smoothing DNA-level features to avoid divide-by-zero errors

Hi,

I’m looking to normalize paired RNA and DNA abundances from meta-samples analyzed with Humann3.

The Humann3 wiki suggests:

For low-abundance species, random sampling may lead to detection of transcripts for undetected genes. In these cases, we recommend smoothing DNA-level features to avoid divide-by-zero errors during normalization.

Are there commands/options to do this smoothing in Humann3, or do I perform this on my own? Perhaps it’s automatically done in the CPM normalization step? When I searched for this I couldn’t find it in the Humann2/3 wikis or tutorials, but I did see that Humann (legacy) included a Witten–Bell smoothed output file.

Thanks!

Sorry for the delayed reply here. This is actually an area where we’ve been doing some research recently to determine the best practices for combined RNA + DNA analysis (which haven’t made their way into HUMAnN yet).

If you’re looking to smooth zero values, my current suggestion is to do it per-feature: replace each zero with half the smallest non-zero value observed for that feature in that measurement type (DNA or RNA). Smoothing per-sample instead turns out to bake some unwanted signal of sequencing depth into the data. This procedure is also a lot easier to implement than the Witten–Bell method.
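In case it helps, here is a minimal sketch of that per-feature smoothing in plain Python. It is not a HUMAnN command or option. The function name `smooth_zeros_per_feature`, the dict-of-lists layout (feature name mapped to abundances across samples), and the toy numbers are all made up for illustration. Leaving a feature untouched when it is zero in every sample is also an assumption on my part, since there is no non-zero value to take half of.

```python
def smooth_zeros_per_feature(table):
    """Replace zeros with half the smallest non-zero value of the same feature.

    `table` maps a feature name to its abundances across samples
    (hypothetical layout, one list position per sample).
    """
    smoothed = {}
    for feature, values in table.items():
        nonzero = [v for v in values if v > 0]
        if not nonzero:
            # Assumption: a feature that is zero in every sample has no
            # non-zero value to derive a pseudocount from, so leave it as-is.
            smoothed[feature] = list(values)
            continue
        pseudocount = min(nonzero) / 2.0
        smoothed[feature] = [v if v > 0 else pseudocount for v in values]
    return smoothed


# Toy DNA and RNA abundances for three samples (illustrative values only).
dna = {"geneA": [10.0, 0.0, 4.0], "geneB": [0.0, 0.0, 0.0]}
rna = {"geneA": [5.0, 3.0, 0.0]}

dna_smoothed = smooth_zeros_per_feature(dna)

# RNA/DNA ratio for geneA; the smoothed DNA values contain no zeros,
# so the division is safe.
ratios = [r / d for r, d in zip(rna["geneA"], dna_smoothed["geneA"])]
```

With the toy values above, the zero DNA value for `geneA` becomes 2.0 (half of 4.0), so the second sample gets an RNA/DNA ratio of 1.5 instead of a divide-by-zero error. The pseudocount is taken across samples for one feature, not across features within one sample, which is the per-feature rather than per-sample choice described above.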