Filtering of features by abundance

Hi,

I’m having issues with filtering by abundance when using different normalization methods. It seems Maaslin2 first normalizes the data and then performs the filtering. However, it is hard to choose an abundance cut-off on normalized data. It would make more sense to determine which features need to be filtered on the raw data, then normalize, and then apply the filter.

Here is an example using the test data provided by Maaslin2. Note that I’m only changing the normalization method; all other parameters are identical between the two runs.

library(Maaslin2)

input_data <- system.file('extdata','HMP2_taxonomy.tsv', package="Maaslin2")

input_metadata <- system.file('extdata','HMP2_metadata.tsv', package="Maaslin2")

Model_1 <- Maaslin2(input_data, 
                    input_metadata, 
                    "/ebio/abt3_projects/small_projects/jdelacuesta/scratchpad", 
                    normalization = "CLR",
                    fixed_effects = c('diagnosis', 'dysbiosisnonIBD','dysbiosisUC','dysbiosisCD', 'antibiotics', 'age'),
                    random_effects = c('site', 'subject'),
                    standardize = FALSE)

Model_2 <- Maaslin2(input_data, 
                    input_metadata, 
                    "/ebio/abt3_projects/small_projects/jdelacuesta/scratchpad", 
                    normalization = "NONE",
                    fixed_effects = c('diagnosis', 'dysbiosisnonIBD','dysbiosisUC','dysbiosisCD', 'antibiotics', 'age'),
                    random_effects = c('site', 'subject'),
                    standardize = FALSE)

Using the CLR transformation results in a different number of filtered features:

From Model_1

2020-03-12 15:58:04 INFO::Writing function arguments to log file
2020-03-12 15:58:04 INFO::Verifying options selected are valid
2020-03-12 15:58:04 INFO::Determining format of input files
2020-03-12 15:58:04 INFO::Input format is data samples as rows and metadata samples as rows
2020-03-12 15:58:04 INFO::Formula for random effects: expr ~ (1 | site) + (1 | subject)
2020-03-12 15:58:04 INFO::Formula for fixed effects: expr ~  diagnosis + dysbiosisnonIBD + dysbiosisUC + dysbiosisCD + antibiotics + age
2020-03-12 15:58:04 INFO::Running selected normalization method: CLR
2020-03-12 15:58:04 INFO::Filter data based on min abundance and min prevalence
2020-03-12 15:58:04 INFO::Total samples in data: 1595
2020-03-12 15:58:04 INFO::Min samples required with min abundance for a feature not to be filtered: 159.500000
2020-03-12 15:58:04 INFO::Total filtered features: 51

From Model_2

2020-03-12 15:58:30 INFO::Writing function arguments to log file
2020-03-12 15:58:30 INFO::Verifying options selected are valid
2020-03-12 15:58:30 INFO::Determining format of input files
2020-03-12 15:58:30 INFO::Input format is data samples as rows and metadata samples as rows
2020-03-12 15:58:30 INFO::Formula for random effects: expr ~ (1 | site) + (1 | subject)
2020-03-12 15:58:30 INFO::Formula for fixed effects: expr ~  diagnosis + dysbiosisnonIBD + dysbiosisUC + dysbiosisCD + antibiotics + age
2020-03-12 15:58:30 INFO::Running selected normalization method: NONE
2020-03-12 15:58:30 INFO::Filter data based on min abundance and min prevalence
2020-03-12 15:58:30 INFO::Total samples in data: 1595
2020-03-12 15:58:30 INFO::Min samples required with min abundance for a feature not to be filtered: 159.500000
2020-03-12 15:58:30 INFO::Total filtered features: 0
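
A likely reason for the difference is that CLR values are centered around zero within each sample, so a feature that is present (above 0) in the raw data can end up below 0 after the transformation. The base-R sketch below illustrates this on a small made-up table. It assumes a cut-off of 0 and a pseudocount of 1, and it is not Maaslin2's own code; the data and variable names are hypothetical.

```r
# Hypothetical counts: samples as rows, features as columns.
counts <- matrix(
  c(500, 20, 0,
    450, 35, 1,
    600, 10, 0,
    550, 25, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("sample", 1:4), c("featA", "featB", "featC"))
)

# CLR per sample: log of each value minus the mean log across features.
# A pseudocount of 1 is assumed here to avoid log(0).
log_counts <- log(counts + 1)
clr <- log_counts - rowMeans(log_counts)

# Number of samples per feature that exceed a cut-off of 0,
# on the raw scale and on the CLR scale.
n_above_raw <- colSums(counts > 0)
n_above_clr <- colSums(clr > 0)

print(rbind(raw = n_above_raw, clr = n_above_clr))
```

In this toy table, featB exceeds 0 in every raw sample but in fewer samples after CLR, so the same cut-off removes different features depending on the scale it is applied to.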

Hi,
Your suggestion makes perfect sense. Given that Maaslin2 currently does not implement this, might I suggest the following work-around?

  1. Determine which features need to be filtered. This can be done by reading the raw abundance table into R as a data.frame and selecting features according to some threshold (minimal abundance, for example).
  2. Normalize the full feature abundance table (by TSS, for example). Then subset the normalized table to the features selected in step 1.
  3. Provide the normalized, filtered table as input to Maaslin2, with normalization set to “NONE” and min_abundance set to -Inf, so that Maaslin2 itself performs no normalization and no abundance filtering.
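
The three steps could be sketched in base R roughly as follows. This is a minimal illustration under stated assumptions, not an official recipe: the toy table, the cut-off values, and the names `raw`, `keep`, `filtered`, and `"output_dir"` are all hypothetical, and the Maaslin2 call is left commented out.

```r
# Toy raw abundance table: samples as rows, features as columns (hypothetical).
raw <- data.frame(
  featA = c(120, 80, 95, 110, 70),
  featB = c(0, 0, 1, 0, 0),
  featC = c(30, 45, 0, 25, 50),
  row.names = paste0("sample", 1:5)
)

min_abundance  <- 5    # hypothetical cut-off on the raw scale
min_prevalence <- 0.5  # hypothetical fraction of samples that must exceed it

# Step 1: choose the features to keep, using the raw values.
prevalence <- colMeans(raw > min_abundance)
keep <- names(prevalence)[prevalence >= min_prevalence]

# Step 2: normalize all features by total sum scaling (TSS), then subset.
tss <- raw / rowSums(raw)
filtered <- tss[, keep, drop = FALSE]

# Step 3: hand the table to Maaslin2 with its own normalization and
# abundance filter switched off (call shown for orientation only).
# fit <- Maaslin2(filtered, input_metadata, "output_dir",
#                 normalization = "NONE", min_abundance = -Inf)
```

Normalizing before subsetting keeps each sample's proportions relative to its full total; subsetting first would rescale the remaining features to sum to 1, which is a different choice.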

Apologies that we don’t have more convenient solutions! Let me know if this doesn’t make sense, or if you’d need help implementing any of the steps in R.

Siyuan

Hi Siyuan,

Thanks for your reply. Indeed, that is exactly what I did: I transformed my raw data and filtered it before running Maaslin2. It is not complicated at all, but it took me a while to realize what was going on and what the solution was.

Cheers.

Has this been updated?

Hi @Negin,

The filtering functionality of Maaslin2 has not been changed. If you would like to change the ordering, I would suggest following what @sma recommended!

Cheers,
Jacob Nearing