Maaslin3 transformation and normalization fix

Dear Will and Jacob,

I need to run maaslin3 with a robust CLR transformation instead of the built-in CLR. I transformed my data beforehand and wanted to pass in the matrix with transform = "NONE" and normalization = "NONE", but the call stops in the argument checks with an error saying this combination has not been validated. This can be reproduced with the maaslin3 example by adding these two parameters to the call:

# Read features table
taxa_table_name <- system.file("extdata", "HMP2_taxonomy.tsv", package =
                                 "maaslin3")
taxa_table <- read.csv(taxa_table_name, sep = '\t', row.names = 1)

# Read metadata table
metadata_name <- system.file("extdata", "HMP2_metadata.tsv", package =
                               "maaslin3")
metadata <- read.csv(metadata_name, sep = '\t', row.names = 1)

metadata$diagnosis <-
  factor(metadata$diagnosis, levels = c('nonIBD', 'UC', 'CD'))
metadata$dysbiosis_state <-
  factor(metadata$dysbiosis_state, levels = c('none', 'dysbiosis_UC',
                                              'dysbiosis_CD'))
metadata$antibiotics <-
  factor(metadata$antibiotics, levels = c('No', 'Yes'))

#Run MaAsLin3
fit_out <- maaslin3::maaslin3(input_data = taxa_table,
                              input_metadata = metadata,
                              output = 'output',
                              formula = '~ diagnosis + dysbiosis_state +
                                antibiotics + age + reads',
                              plot_summary_plot = FALSE,
                              plot_associations = FALSE,
                              transform = "NONE", 
                              normalization = "NONE")
#> 2025-11-04 16:25:23.891101 INFO::Writing function arguments to log file
#> 2025-11-04 16:25:23.900789 INFO::Verifying options selected are valid
#> Warning in maaslin_check_arguments(feature_specific_covariate,
#> feature_specific_covariate_name, : Be sure the data can be TSS normalized when
#> using warn_prevalence without normalization=TSS
#> Error in maaslin_check_arguments(feature_specific_covariate, feature_specific_covariate_name, : warn_prevalence has only been validated with transform = LOG. To bypass this check, run the maaslin steps individually, skipping `maaslin_check_arguments`.

unlink('output', recursive=TRUE)
logging::logReset()

Created on 2025-11-04 with reprex v2.1.1 

I tweaked a few functions to bypass the checks, but it feels like it would be better to fix this on your end, so that regression outputs are as expected when a user wants to pre-process the data themselves. What do you think?

Best
Giacomo

Hi,

If you run it with warn_prevalence=FALSE, it should work, no?
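
For reference, a minimal sketch of that suggestion, reusing the example data from the reprex above. The only change from the failing call is `warn_prevalence = FALSE`. Whether the run completes in your version of maaslin3 is an assumption to verify, and the `requireNamespace` guard is there only so the snippet does nothing when the package is not installed.

```r
# Sketch: same call as the reprex, with the prevalence warning switched off.
if (requireNamespace("maaslin3", quietly = TRUE)) {
  taxa_table <- read.csv(
    system.file("extdata", "HMP2_taxonomy.tsv", package = "maaslin3"),
    sep = "\t", row.names = 1)
  metadata <- read.csv(
    system.file("extdata", "HMP2_metadata.tsv", package = "maaslin3"),
    sep = "\t", row.names = 1)

  # Same reference levels as in the original example
  metadata$diagnosis <-
    factor(metadata$diagnosis, levels = c("nonIBD", "UC", "CD"))
  metadata$dysbiosis_state <-
    factor(metadata$dysbiosis_state,
           levels = c("none", "dysbiosis_UC", "dysbiosis_CD"))
  metadata$antibiotics <-
    factor(metadata$antibiotics, levels = c("No", "Yes"))

  fit_out <- maaslin3::maaslin3(
    input_data = taxa_table,   # replace with the pre-transformed matrix
    input_metadata = metadata,
    output = "output",
    formula = "~ diagnosis + dysbiosis_state + antibiotics + age + reads",
    plot_summary_plot = FALSE,
    plot_associations = FALSE,
    transform = "NONE",
    normalization = "NONE",
    warn_prevalence = FALSE)   # skips the check that raised the error
}
```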

Also, the default median comparison strategy in MaAsLin 3 with TSS normalization and LOG transformation accounts for compositionality like CLR does but in a more interpretable way. We keep the CLR option for legacy support, but our evaluations showed that TSS + LOG with the median abundance comparison is both theoretically and empirically preferable.

Will

Hi Will,

Thank you for your help.
Some people could still be interested in running a sensitivity analysis with maaslin3, and CLR is a common, divisive choice.

warn_prevalence = FALSE allows the CLR input but does not seem to take advantage of the main selling point of MaAsLin 3, namely the separate abundance and prevalence modeling. With this setting, I only get differential abundance estimates back, just as MaAsLin 2 modeled them. Is this correct?

More generally, although your benchmark showed that the log2(TSS) transform performs better, I feel that CLR could/should still be supported in the future for compatibility. Or is this transformation incompatible with the MaAsLin 3 methodology?

Giacomo

The separate modeling should still work, even with warn_prevalence = FALSE. That parameter just determines whether you want a warning when the prevalence effect is likely induced by an abundance effect. You can determine when that’s likely the case from the log2(TSS) results, but we never tested other transformations in this regard, so we’re not prepared to support the warning for them.
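
One way to check this on a finished run is to tabulate which model each result row came from. The sketch below assumes the run wrote `all_results.tsv` into the `output` directory and that the table has a `model` column separating abundance from prevalence associations. Both are assumptions about the output layout, so the code checks for them before using them.

```r
# Sketch: count result rows per model type in a completed run.
results_path <- file.path("output", "all_results.tsv")  # assumed location

if (file.exists(results_path)) {
  res <- read.delim(results_path)
  if ("model" %in% names(res)) {
    # More than one model type here means prevalence results were produced too
    print(table(res$model))
  } else {
    message("No 'model' column found; inspect names(res) instead.")
  }
} else {
  message("No results file at ", results_path)
}
```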

Regarding the more general point, there were two main methodological improvements in MaAsLin 3: the prevalence modeling and the median comparison. CLR is not incompatible with prevalence modeling, and arguably prevalence modeling improves CLR because you no longer need arbitrary pseudo-counts to replace the 0s in CLR. However, CLR is incompatible with the median comparison for the abundance modeling because the median comparison uses very particular properties of the log2(TSS) coefficients that don’t hold in general, and its whole point is to account for compositionality, which is what CLR is already doing.
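
To illustrate the pseudo-count point, here is a minimal base R sketch of a robust CLR, where each sample is centered on the mean log of its non-zero features and zeros are never replaced. The function name `robust_clr` is hypothetical and this is not MaAsLin 3's internal code. It assumes samples in rows and features in columns, and it leaves zeros as NA. How zeros should be encoded before passing the matrix to maaslin3 is a separate choice that this sketch does not settle.

```r
# Sketch of a robust CLR: no pseudo-counts, zeros are excluded from the centering.
robust_clr <- function(counts) {
  counts <- as.matrix(counts)
  stopifnot(is.numeric(counts), all(counts >= 0, na.rm = TRUE))

  log_counts <- log(counts)
  log_counts[which(counts == 0)] <- NA  # log(0) is -Inf; treat zeros as unobserved

  # Subtract each sample's mean log abundance over its non-zero features.
  # The row-mean vector is recycled down the columns, so row i is centered on mean i.
  log_counts - rowMeans(log_counts, na.rm = TRUE)
}

# Toy input: two samples (rows), three features (columns)
toy <- matrix(c(1, 0, exp(2),
                exp(1), exp(1), exp(1)),
              nrow = 2, byrow = TRUE)
toy_rclr <- robust_clr(toy)
print(toy_rclr)
```

A sample that contains only zeros comes out as all NA, since it has no non-zero features to center on.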

Hope that helps,

Will