The bioBakery help forum

Filtering of features by abundance

Hi,

I’m having trouble with abundance filtering when using different normalization methods. It seems Maaslin2 normalizes the data first and then performs the filtering. However, it is hard to choose an abundance cut-off on normalized data. It would make more sense to determine which features need to be filtered on the un-normalized data, then normalize, and then remove those features.
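To make the proposed order concrete, here is a minimal base-R sketch on a toy table. The data, the thresholds, and the CLR pseudocount of 1 are all assumptions for illustration; this is not Maaslin2's internal code.

```r
# Toy count table: samples as rows, features as columns (hypothetical values).
counts <- data.frame(
  taxon_a = c(10, 0, 5, 8),
  taxon_b = c(0, 0, 0, 1),
  taxon_c = c(3, 4, 0, 6)
)

min_abundance  <- 0    # threshold on the raw (un-normalized) scale
min_prevalence <- 0.5  # fraction of samples that must exceed the threshold

# 1. Decide which features to keep, using the raw values.
keep <- colSums(counts > min_abundance) > nrow(counts) * min_prevalence

# 2. Normalize all features (CLR with a pseudocount of 1, an assumption).
log_counts <- log(as.matrix(counts) + 1)
clr <- log_counts - rowMeans(log_counts)  # subtract each sample's mean log value

# 3. Drop the features flagged in step 1 from the normalized table.
clr_filtered <- clr[, keep, drop = FALSE]
print(colnames(clr_filtered))
```

Since Maaslin2 accepts data frames as input, a table pre-filtered this way could in principle be passed to it, with `min_abundance` and `min_prevalence` adjusted so the built-in filter removes nothing further. Which values achieve that under CLR would need checking.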

Here is an example using the test data provided with Maaslin2. Note that I’m only changing the normalization method between the two models; all other parameters are the same.

library(Maaslin2)

input_data <- system.file('extdata','HMP2_taxonomy.tsv', package="Maaslin2")

input_metadata <- system.file('extdata','HMP2_metadata.tsv', package="Maaslin2")

Model_1 <- Maaslin2(input_data, 
                    input_metadata, 
                    "/ebio/abt3_projects/small_projects/jdelacuesta/scratchpad", 
                    normalization = "CLR",
                    fixed_effects = c('diagnosis', 'dysbiosisnonIBD','dysbiosisUC','dysbiosisCD', 'antibiotics', 'age'),
                    random_effects = c('site', 'subject'),
                    standardize = FALSE)

Model_2 <- Maaslin2(input_data, 
                    input_metadata, 
                    "/ebio/abt3_projects/small_projects/jdelacuesta/scratchpad", 
                    normalization = "NONE",
                    fixed_effects = c('diagnosis', 'dysbiosisnonIBD','dysbiosisUC','dysbiosisCD', 'antibiotics', 'age'),
                    random_effects = c('site', 'subject'),
                    standardize = FALSE)

Using the CLR transformation results in a different number of filtered features:

From Model_1

2020-03-12 15:58:04 INFO::Writing function arguments to log file
2020-03-12 15:58:04 INFO::Verifying options selected are valid
2020-03-12 15:58:04 INFO::Determining format of input files
2020-03-12 15:58:04 INFO::Input format is data samples as rows and metadata samples as rows
2020-03-12 15:58:04 INFO::Formula for random effects: expr ~ (1 | site) + (1 | subject)
2020-03-12 15:58:04 INFO::Formula for fixed effects: expr ~  diagnosis + dysbiosisnonIBD + dysbiosisUC + dysbiosisCD + antibiotics + age
2020-03-12 15:58:04 INFO::Running selected normalization method: CLR
2020-03-12 15:58:04 INFO::Filter data based on min abundance and min prevalence
2020-03-12 15:58:04 INFO::Total samples in data: 1595
2020-03-12 15:58:04 INFO::Min samples required with min abundance for a feature not to be filtered: 159.500000
2020-03-12 15:58:04 INFO::Total filtered features: 51

From Model_2

2020-03-12 15:58:30 INFO::Writing function arguments to log file
2020-03-12 15:58:30 INFO::Verifying options selected are valid
2020-03-12 15:58:30 INFO::Determining format of input files
2020-03-12 15:58:30 INFO::Input format is data samples as rows and metadata samples as rows
2020-03-12 15:58:30 INFO::Formula for random effects: expr ~ (1 | site) + (1 | subject)
2020-03-12 15:58:30 INFO::Formula for fixed effects: expr ~  diagnosis + dysbiosisnonIBD + dysbiosisUC + dysbiosisCD + antibiotics + age
2020-03-12 15:58:30 INFO::Running selected normalization method: NONE
2020-03-12 15:58:30 INFO::Filter data based on min abundance and min prevalence
2020-03-12 15:58:30 INFO::Total samples in data: 1595
2020-03-12 15:58:30 INFO::Min samples required with min abundance for a feature not to be filtered: 159.500000
2020-03-12 15:58:30 INFO::Total filtered features: 0
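
In both logs the minimum sample count is 159.5, which is 10% of the 1595 samples, so the prevalence requirement is the same and only the scale of the values differs. The sketch below shows, on made-up counts, how the same threshold of 0 can keep every feature on raw data but drop some after CLR. It assumes the filter keeps a feature when its value is strictly greater than the abundance threshold in more than the required number of samples, and that CLR uses a pseudocount of 1; both are assumptions about the behaviour, not confirmed from the Maaslin2 source.

```r
# Hypothetical counts: samples as rows, features as columns.
counts <- matrix(
  c(100, 1, 1,
    200, 2, 0,
    150, 1, 2,
    120, 3, 1),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("taxon_a", "taxon_b", "taxon_c"))
)

min_abundance <- 0
min_samples   <- nrow(counts) * 0.1  # 10% prevalence

# Assumed filter rule: keep a feature if enough samples exceed the threshold.
passes <- function(x) colSums(x > min_abundance) > min_samples

# CLR with a pseudocount of 1 (assumption): log values centered per sample.
log_counts <- log(counts + 1)
clr <- log_counts - rowMeans(log_counts)

raw_kept <- passes(counts)
clr_kept <- passes(clr)

cat("Filtered on raw data:", sum(!raw_kept), "\n")
cat("Filtered on CLR data:", sum(!clr_kept), "\n")
```

On the raw scale every feature is above 0 in at least one sample, so nothing is filtered. After CLR the low-abundance features sit below each sample's mean log value, so they are negative everywhere and fail a threshold of 0.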