Filtering of features by abundance

jacodela · March 24, 2020, 1:43pm

Hi,

I’m having issues with filtering by abundance when using different normalization methods. It seems Maaslin2 first normalizes the data and then performs the filtering; however, it is hard to choose an abundance cut-off for normalized data. It would make more sense to first determine which features need to be filtered, then normalize, and then filter.

Here is an example with the test data provided by Maaslin2. Note that I’m only changing the normalization method; all other parameters are left at their defaults.

library(Maaslin2)

input_data <- system.file('extdata', 'HMP2_taxonomy.tsv', package = "Maaslin2")

input_metadata <- system.file('extdata', 'HMP2_metadata.tsv', package = "Maaslin2")

Model_1 <- Maaslin2(input_data, 
                    input_metadata, 
                    "/ebio/abt3_projects/small_projects/jdelacuesta/scratchpad", 
                    normalization = "CLR",
                    fixed_effects = c('diagnosis', 'dysbiosisnonIBD','dysbiosisUC','dysbiosisCD', 'antibiotics', 'age'),
                    random_effects = c('site', 'subject'),
                    standardize = FALSE)

Model_2 <- Maaslin2(input_data, 
                    input_metadata, 
                    "/ebio/abt3_projects/small_projects/jdelacuesta/scratchpad", 
                    normalization = "NONE",
                    fixed_effects = c('diagnosis', 'dysbiosisnonIBD','dysbiosisUC','dysbiosisCD', 'antibiotics', 'age'),
                    random_effects = c('site', 'subject'),
                    standardize = FALSE)

Using the CLR transformation results in a different number of filtered features:

From Model_1

2020-03-12 15:58:04 INFO::Writing function arguments to log file
2020-03-12 15:58:04 INFO::Verifying options selected are valid
2020-03-12 15:58:04 INFO::Determining format of input files
2020-03-12 15:58:04 INFO::Input format is data samples as rows and metadata samples as rows
2020-03-12 15:58:04 INFO::Formula for random effects: expr ~ (1 | site) + (1 | subject)
2020-03-12 15:58:04 INFO::Formula for fixed effects: expr ~  diagnosis + dysbiosisnonIBD + dysbiosisUC + dysbiosisCD + antibiotics + age
2020-03-12 15:58:04 INFO::Running selected normalization method: CLR
2020-03-12 15:58:04 INFO::Filter data based on min abundance and min prevalence
2020-03-12 15:58:04 INFO::Total samples in data: 1595
2020-03-12 15:58:04 INFO::Min samples required with min abundance for a feature not to be filtered: 159.500000
2020-03-12 15:58:04 INFO::Total filtered features: 51

From Model_2

2020-03-12 15:58:30 INFO::Writing function arguments to log file
2020-03-12 15:58:30 INFO::Verifying options selected are valid
2020-03-12 15:58:30 INFO::Determining format of input files
2020-03-12 15:58:30 INFO::Input format is data samples as rows and metadata samples as rows
2020-03-12 15:58:30 INFO::Formula for random effects: expr ~ (1 | site) + (1 | subject)
2020-03-12 15:58:30 INFO::Formula for fixed effects: expr ~  diagnosis + dysbiosisnonIBD + dysbiosisUC + dysbiosisCD + antibiotics + age
2020-03-12 15:58:30 INFO::Running selected normalization method: NONE
2020-03-12 15:58:30 INFO::Filter data based on min abundance and min prevalence
2020-03-12 15:58:30 INFO::Total samples in data: 1595
2020-03-12 15:58:30 INFO::Min samples required with min abundance for a feature not to be filtered: 159.500000
2020-03-12 15:58:30 INFO::Total filtered features: 0
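
One plausible reason for the difference can be illustrated with a small base-R sketch. This is not Maaslin2's internal code: the toy table, the pseudocount of 1, and the "greater than 0" cut-off are all assumptions made for illustration. CLR values are centered at zero within each sample, so a cut-off chosen with raw abundances in mind no longer selects the same features.

```r
# Toy counts: samples as rows, features as columns (hypothetical values).
counts <- matrix(
  c(100,  5,  0,
     80,  0, 20,
     60, 10, 30),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("s1", "s2", "s3"), c("f1", "f2", "f3"))
)

# Simple CLR: log of each value minus the mean log value of its sample.
# The pseudocount of 1 is an assumption used here only to avoid log(0).
log_counts <- log(counts + 1)
clr <- log_counts - rowMeans(log_counts)

# Samples per feature that pass a "greater than 0" cut-off,
# on the raw scale and on the CLR scale.
raw_pass <- colSums(counts > 0)
clr_pass <- colSums(clr > 0)

print(raw_pass)
print(clr_pass)
```

In this toy table, f2 is nonzero in two samples on the raw scale but is never above zero after CLR, so the same cut-off would treat it very differently depending on when filtering happens.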

sma · April 6, 2020, 2:39am

Hi,
Your suggestion makes perfect sense. Given that Maaslin2 currently does not implement this, might I suggest the following work-around?

1. Determine which features need to be filtered. This can be done by reading the abundance table into R as a data.frame and then selecting features according to some threshold (minimum abundance, for example).
2. Normalize the feature abundance table (by TSS, for example). Then subset the normalized table to the features selected in step 1.
3. Provide the normalized, filtered table as input to Maaslin2. Then set normalization to “NONE” and min_abundance to -Inf, which effectively disables normalization and filtering within Maaslin2.
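
The steps above can be sketched in R roughly as follows. This is only a minimal illustration on a made-up table: the data values, the thresholds, and the names abundance, keep, tss and tss_filtered are hypothetical, and the filtering rule is a simple stand-in rather than Maaslin2's own logic. The Maaslin2 call is left commented out because it needs a matching metadata table; my_metadata, "output_dir" and the fixed effect are placeholders.

```r
# Toy abundance table: samples as rows, features as columns (hypothetical values).
abundance <- data.frame(
  taxon_A = c(120, 80,  0, 200),
  taxon_B = c(  0,  1,  0,   0),
  taxon_C = c( 30,  0, 45,  60),
  row.names = c("s1", "s2", "s3", "s4")
)

# Step 1: decide which features to keep, using the raw abundances.
# Both thresholds are illustrative and should be chosen for your own data.
min_abundance  <- 10   # a value must exceed this to count as "present"
min_prevalence <- 0.5  # fraction of samples in which the feature must be present
prevalence <- colMeans(abundance > min_abundance)
keep <- names(prevalence)[prevalence >= min_prevalence]

# Step 2: normalize by total sum scaling (TSS) using all features,
# then subset the normalized table to the features chosen in step 1.
tss <- as.data.frame(sweep(as.matrix(abundance), 1, rowSums(abundance), "/"))
tss_filtered <- tss[, keep, drop = FALSE]

print(keep)
print(tss_filtered)

# Step 3 (not run here): pass the prepared table to Maaslin2 with its own
# normalization and abundance filtering switched off, as suggested above.
# library(Maaslin2)
# fit <- Maaslin2(
#   input_data     = tss_filtered,
#   input_metadata = my_metadata,     # placeholder: data.frame with the same sample names
#   output         = "output_dir",    # placeholder output folder
#   normalization  = "NONE",
#   min_abundance  = -Inf,
#   fixed_effects  = c("diagnosis")   # placeholder
# )
```

Normalizing before subsetting means the proportions are still computed relative to each sample's full total, not only to the features that survive the filter.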

Apologies that we don’t have more convenient solutions! Let me know if this doesn’t make sense, or if you’d need help implementing any of the steps in R.

Siyuan

jacodela · April 6, 2020, 7:39am

Hi Siyuan,

Thanks for your reply. Indeed, that was exactly what I did: I transformed my raw data and filtered before running Maaslin2. This is not complicated at all, but it took me a while to realize what was going on and what the solution was.

Cheers.

Negin · May 25, 2023, 9:29pm

Has this been updated?

nearinj · May 26, 2023, 7:41pm

Hi @Negin,

The filtering functionality of Maaslin has not been changed. If you would like to change the ordering, I would suggest following what @sma recommended!

Cheers,
Jacob Nearing
