Inconsistency in results in MaAsLin3 when using TSS and input as % relative abundance

Hello,
I am using MaAsLin3 (v0.99.11) in R. My microbiome data are already TSS-normalized (MetaPhlAn 4.1), so what I have is relative abundance as percentages, not proportions.

I wanted to check whether applying TSS and/or supplying the input as proportions gives the same output, as I expected it would. So I tested the tool under 4 scenarios:

1- the input as %s:
Normalization “NONE”, Transformation “LOG”, minimum Prevalence “0.1”, minimum abundance “0.1” and zero threshold “0.01”

2- the input as proportions:
Normalization “NONE”, Transformation “LOG”, minimum Prevalence “0.1”, minimum abundance “0.001” and zero threshold “0.0001”

3- the input as %s:
Normalization “TSS”, Transformation “LOG”, minimum Prevalence “0.1”, minimum abundance “0.001” and zero threshold “0.0001”

4- the input as proportions :
Normalization “TSS”, Transformation “LOG”, minimum Prevalence “0.1”, minimum abundance “0.001” and zero threshold “0.0001”

If my understanding is correct, all of these scenarios should give the same results. However, when using TSS with % input, the filtered feature data and the results are inconsistent with the other scenarios.

I traced this issue and found that in scenario 3 (TSS with % relative abundance input), the zero_threshold of 0.0001 does not appear to be applied in the filtered_data output: some features keep very low abundances, below the selected threshold. This does not happen when TSS is not used, or when TSS is used with proportion input.

Here is an example for this discrepancy (data represent 1 subject):
Phocaeicola_vulgatus:
Original % relative abundance = 0.00172%
relative abundance from Filtered_data output file:
Scenario 1 (no TSS used, % input) = NA
Scenario 2 (no TSS used, proportion input) = NA
Scenario 3 (TSS used, % input) = 1.72e-5
Scenario 4 (TSS used, proportion input) = NA

Shouldn’t this low-abundance feature be treated as zero when TSS is used with % RA input?

It’s worth noting that all other relative abundance values match across the 4 scenarios. The only difference is for values below the zero threshold: with TSS and % RA input, these low values are kept instead of being flagged as NA.

Because of these discrepancies (TSS with % input compared to the other scenarios):
the N_not_zero counts differ in the output results files
the null hypothesis values differ
(for instance, the null value with no TSS or with TSS and proportion input = -0.102230701455317, while with TSS and % input = -0.195164587365035)

This resulted in inconsistent output: some features are significant when no TSS is used or when TSS is used with proportion input, but not when TSS is used with % input, and vice versa.

Here is my R code:

# Scenario 1: % input, no normalization
set.seed(205)
out_T1 <- maaslin3(input_data = RA_input_perc,
                   input_metadata = meta_data,
                   output = 'output/1. NONE LOG 0.1 0.1 0.01',
                   formula = '~ predictor_var',
                   normalization = 'NONE',
                   transform = "LOG",
                   correction = 'BH',
                   min_prevalence = 0.1,
                   min_abundance = 0.1,
                   zero_threshold = 0.01,
                   augment = TRUE,
                   standardize = TRUE,
                   max_significance = 0.1,
                   warn_prevalence = FALSE,
                   max_pngs = 250,
                   cores = 4)

# Scenario 2: proportion input, no normalization
set.seed(205)
out_T2 <- maaslin3(input_data = RA_input_prop,
                   input_metadata = meta_data,
                   output = 'output/2. PROP NONE LOG 0.1 0.001 0.0001',
                   formula = '~ predictor_var',
                   normalization = 'NONE',
                   transform = "LOG",
                   correction = 'BH',
                   min_prevalence = 0.1,
                   min_abundance = 0.001,
                   zero_threshold = 0.0001,
                   augment = TRUE,
                   standardize = TRUE,
                   max_significance = 0.1,
                   warn_prevalence = FALSE,
                   max_pngs = 250,
                   cores = 4)

# Scenario 3: % input, TSS normalization
set.seed(205)
out_T3 <- maaslin3(input_data = RA_input_perc,
                   input_metadata = meta_data,
                   output = 'output/3. TSS LOG 0.1 0.001 0.0001',
                   formula = '~ predictor_var',
                   normalization = 'TSS',
                   transform = "LOG",
                   correction = 'BH',
                   min_prevalence = 0.1,
                   min_abundance = 0.001,
                   zero_threshold = 0.0001,
                   augment = TRUE,
                   standardize = TRUE,
                   max_significance = 0.1,
                   warn_prevalence = FALSE,
                   max_pngs = 250,
                   cores = 4)

# Scenario 4: proportion input, TSS normalization
set.seed(205)
out_T4 <- maaslin3(input_data = RA_input_prop,
                   input_metadata = meta_data,
                   output = 'output/4. PROP TSS LOG 0.1 0.001 0.0001',
                   formula = '~ predictor_var',
                   normalization = 'TSS',
                   transform = "LOG",
                   correction = 'BH',
                   min_prevalence = 0.1,
                   min_abundance = 0.001,
                   zero_threshold = 0.0001,
                   augment = TRUE,
                   standardize = TRUE,
                   max_significance = 0.1,
                   warn_prevalence = FALSE,
                   max_pngs = 250,
                   cores = 4)

Is there anything wrong with my code, or am I missing something that caused this issue?

Thanks

UPDATE:
I think I understand now where this is coming from.

It could be the zero_threshold value I specified when using TSS with % input. I assumed this operation is done after normalization (originally I intended 0.01% as the zero threshold), and since the data are renormalized with TSS (divided by 100), I thought I should change 0.01 to 0.0001.

I was confused because min_abundance is done after normalization so I assumed that zero_threshold is also done after normalization. (I could not find anything on the order of zero_threshold in the tutorial page)

If the zero_threshold operation is done before normalization, then I should keep it as 0.01.

Please correct me if I am wrong with this.

Thanks again,

Hi,

The order of pre-processing steps is maaslin_normalize (when the zero_threshold is applied if you look at ?maaslin_normalize) and then maaslin_filter (when the min_abundance threshold is applied if you look at ?maaslin_filter). This was chosen because we expect the most frequent use case to be people filtering out features with low relative abundances, but those relative abundances must be computed by maaslin_normalize first. By contrast, zero_threshold needs to be applied in the normalization step because if we’re going to declare some features to have zero abundance, we need to do that before normalizing. This is so that, for example, the relative abundances sum to 1 after normalization, rather than something less than 1 if we declared some of them “actually zero” after they had already been converted to relative abundances.
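To make the ordering argument concrete, here is a small base-R sketch. It does not call MaAsLin3: `apply_zero_threshold` and `tss` are hypothetical helpers written only for illustration, the sample values are made up, and it assumes that values at or below the threshold are the ones treated as zero.

```r
# Hypothetical helpers for illustration only; these are not MaAsLin3 internals.
apply_zero_threshold <- function(x, threshold) {
  # Assumption: values at or below the threshold are treated as zero.
  x[x <= threshold] <- 0
  x
}

# Total-sum scaling: divide each value by the sample total.
tss <- function(x) x / sum(x)

# One made-up sample in percent (sums to 100).
sample_pct <- c(taxon_a = 60, taxon_b = 39.995, taxon_c = 0.005)

# Order A: threshold on the input scale, then normalize.
order_a <- tss(apply_zero_threshold(sample_pct, threshold = 0.01))

# Order B: normalize first, then threshold on the proportion scale.
order_b <- apply_zero_threshold(tss(sample_pct), threshold = 0.0001)

sum(order_a)  # equals 1 (up to floating point): remaining taxa are rescaled
sum(order_b)  # below 1: the zeroed taxon leaves a gap
```

In this toy sample, thresholding first lets the remaining taxa be rescaled to sum to 1, while thresholding after normalization leaves the sample summing to slightly less than 1.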

This should then explain why your case 3 is different: the zero_threshold is applied on the scale of the inputs rather than after normalization (all the others effectively apply a 0.01% threshold, whereas case 3 applies a 0.0001% threshold).
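As a rough check of this explanation against the Phocaeicola_vulgatus value from the question, the following base-R sketch compares the input value with the zero_threshold in each scenario. `is_zeroed` is a hypothetical helper, not a MaAsLin3 function, and it assumes that values at or below the threshold are treated as zero and that the comparison happens on the scale of the input table.

```r
# Hypothetical helper for illustration only; not a MaAsLin3 function.
# Assumption: a value at or below zero_threshold is treated as zero,
# and the comparison is made on the scale of the input table.
is_zeroed <- function(input_value, zero_threshold) {
  input_value <= zero_threshold
}

pct  <- 0.00172     # Phocaeicola_vulgatus in the % input table
prop <- pct / 100   # the same value in the proportion input table

scenarios <- data.frame(
  scenario       = 1:4,
  normalization  = c("NONE", "NONE", "TSS", "TSS"),
  input_scale    = c("percent", "proportion", "percent", "proportion"),
  input_value    = c(pct, prop, pct, prop),
  zero_threshold = c(0.01, 0.0001, 0.0001, 0.0001)
)

# Express each threshold in percent so the scenarios can be compared directly.
scenarios$threshold_as_percent <- ifelse(
  scenarios$input_scale == "percent",
  scenarios$zero_threshold,
  scenarios$zero_threshold * 100
)

scenarios$zeroed <- is_zeroed(scenarios$input_value, scenarios$zero_threshold)
scenarios
```

Under these assumptions, only scenario 3 keeps the value, which matches the filtered_data output reported in the question. Setting zero_threshold = 0.01 for the % input with TSS would put that scenario on the same 0.01% footing as the others.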

Will