Maaslin3: Low Diversity and Biomass

Hi,

I work with the respiratory microbiome, which differs from many simulated datasets. My samples often have extremely low alpha diversity, dominated by one or a few ASVs. These dominant ASVs have a very high prevalence.

A small proportion of some positive controls - and most negative controls - contain ASVs from biological samples, suggesting cross-contamination. The contaminating ASVs are primarily the high-abundance ones, so it may be sensible for prevalence cutoffs to account for abundance rather than being uniform across ASVs.

In low-diversity, dominant-taxa settings, log transformations for variance stabilization may not be optimal:

  • High relative abundances of dominant ASVs are not outliers but the norm, and may not need to be stabilized.
  • Biological associations for high-abundance ASVs (for example, with local immune markers) appear to be on an additive rather than a multiplicative scale.
  • In my experience, log-transformation amplifies signals in low-abundance taxa while dampening those from dominant taxa.
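
To make the last point concrete, here is a small arithmetic sketch. The relative abundances are made-up illustrative values, not data from my samples. It compares the same kind of change on the additive scale and on the log2 scale:

```python
import math

def log2_shift(before, after):
    """Change between two relative abundances on the log2 scale."""
    return math.log2(after) - math.log2(before)

# Hypothetical relative abundances (fractions of total reads), for illustration only.
rare_before, rare_after = 0.001, 0.002          # additive change of 0.001
dominant_before, dominant_after = 0.50, 0.60    # additive change of 0.10

rare_shift = log2_shift(rare_before, rare_after)
dominant_shift = log2_shift(dominant_before, dominant_after)

print(f"rare taxon:     additive {rare_after - rare_before:+.3f}, log2 {rare_shift:+.3f}")
print(f"dominant taxon: additive {dominant_after - dominant_before:+.3f}, log2 {dominant_shift:+.3f}")
```

The dominant taxon's additive change is 100 times larger, yet on the log2 scale the rare taxon's doubling is the larger shift (about 1.0 versus about 0.26).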

It would be nice to have the option to avoid the log transform entirely. A quasi-Poisson approach could be an alternative - relative abundances could be scaled by a large number and rounded to integers.
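
A minimal sketch of the scale-and-round step I have in mind, assuming an arbitrary scale factor of one million (the factor, the helper name, and the example profile are all hypothetical):

```python
def to_pseudo_counts(rel_abundances, scale=1_000_000):
    """Scale relative abundances by a fixed factor and round to integers.

    The scale factor is an arbitrary assumption; it is not tied to the
    actual sequencing depth of the sample.
    """
    return [round(p * scale) for p in rel_abundances]

# Hypothetical low-diversity sample: one dominant ASV and a few minor ones.
sample = [0.92, 0.0795, 0.0004996, 0.0000004]
counts = to_pseudo_counts(sample)

print(counts)
```

Note that anything below half a unit after scaling rounds to zero, and that the resulting integers do not carry the sampling variance of real read counts, so the choice of scale factor matters for any count model fitted afterwards.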

I’m considering forking the package and implementing this. Perhaps some of these reflections are of interest. I would greatly appreciate any input.

Best regards,
Anton

Hi Anton,

You can already skip the log transform entirely by choosing transform="NONE". It’s not clear to me that the quasi-Poisson approach of scaling by a large number and rounding to the nearest integer would actually be helpful. If you scale by the same large number for each sample and then fit a standard linear model, you’ll have the same model, just with the coefficients multiplied by that scaling factor. If you scale and then fit a quasi-Poisson model, you have the usual question of whether the model actually fits the data. Let me know if that answers some of your questions.
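
As a rough illustration of the scaling point, here is a plain-Python sketch with made-up numbers. It uses a simple one-covariate least-squares fit, not the MaAsLin 3 code path. Multiplying every abundance by the same constant multiplies the slope by that constant and leaves the t-statistic unchanged:

```python
import math

def ols_slope_and_t(x, y):
    """Least-squares slope of y on x (with intercept) and its t-statistic."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / std_err

# Hypothetical data: a binary covariate and relative abundances of one taxon.
group = [0, 0, 0, 1, 1, 1]
rel_abundance = [0.50, 0.55, 0.60, 0.70, 0.80, 0.75]

scale = 1_000_000  # arbitrary constant applied to every sample
scaled = [a * scale for a in rel_abundance]

slope_raw, t_raw = ols_slope_and_t(group, rel_abundance)
slope_scaled, t_scaled = ols_slope_and_t(group, scaled)

print(f"slope ratio (scaled / raw): {slope_scaled / slope_raw:.1f}")
print(f"t-statistics: raw {t_raw:.4f}, scaled {t_scaled:.4f}")
```

Rounding the scaled values to integers would perturb this only slightly when the scale factor is large, so the inference from the linear model would be essentially the same.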

Will