Hello,
I am using MaAsLin3 (v0.99.11) in R. My microbiome data come from MetaPhlAn 4.1 and are already TSS-normalized, so what I have is relative abundance expressed as percentages (%), not as proportions.
I wanted to check whether applying TSS and/or supplying the input as proportions gives the same output, as I would expect. So I tested the tool in 4 different scenarios:
1- Input as percentages:
Normalization “NONE”, transformation “LOG”, minimum prevalence “0.1”, minimum abundance “0.1” and zero threshold “0.01”
2- Input as proportions:
Normalization “NONE”, transformation “LOG”, minimum prevalence “0.1”, minimum abundance “0.001” and zero threshold “0.0001”
3- Input as percentages:
Normalization “TSS”, transformation “LOG”, minimum prevalence “0.1”, minimum abundance “0.001” and zero threshold “0.0001”
4- Input as proportions:
Normalization “TSS”, transformation “LOG”, minimum prevalence “0.1”, minimum abundance “0.001” and zero threshold “0.0001”
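To make the relationship between the percent and proportion cutoffs concrete, here is a minimal base-R sketch with invented values (only 0.00172 comes from my data). It assumes that min_abundance compares with "greater than" and that zero_threshold treats values at or below the cutoff as zero; it only checks that the cutoffs 0.1 / 0.01 on the percent scale select the same entries as 0.001 / 0.0001 on the proportion scale when no normalization is involved.

```r
# Invented percent abundances for one sample; only 0.00172 is from my data.
perc <- c(taxon_A = 0.00172, taxon_B = 0.05, taxon_C = 0.5,
          taxon_D = 12.3, taxon_E = 87.14828)
prop <- perc / 100  # the same sample on the proportion scale

# Entries above the minimum-abundance cutoff on each scale
above_min_perc <- perc > 0.1
above_min_prop <- prop > 0.001

# Entries at or below the zero threshold on each scale
zero_perc <- perc <= 0.01
zero_prop <- prop <= 0.0001

# Both comparisons return TRUE for these values
identical(above_min_perc, above_min_prop)
identical(zero_perc, zero_prop)
```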
If my understanding is correct, all four scenarios should give the same results. However, when using TSS with % input, the filtered feature data and the results are inconsistent with the other scenarios.
I traced this and found that in scenario 3 (TSS with % relative abundance input), the zero_threshold of “0.0001” does not appear to have been applied in the filtered_data output: some features keep abundances below the selected threshold, unlike the runs without TSS or with TSS on proportion input.
Here is an example of this discrepancy (data from 1 subject):
Phocaeicola_vulgatus:
Original % relative abundance = 0.00172%
relative abundance from Filtered_data output file:
Scenario 1 (no TSS used, % input) = NA
Scenario 2 (no TSS used, proportion input) = NA
Scenario 3 (TSS used, % input) = 1.72e-5
Scenario 4 (TSS used, proportion input) = NA
Shouldn’t this low-abundance feature be treated as zero when TSS is used with % RA input?
It’s worth noting that all other relative abundance values match across the 4 scenarios. The only differences are values below the zero threshold: with TSS and % RA input these low values are kept instead of being flagged as NA.
Because of this discrepancy (TSS with % input compared to the other scenarios):
the N_not_zero values differ in the output results files
the null hypothesis values differ
(for instance, the null value without TSS or with TSS on proportion input = -0.102230701455317, while with TSS on % input = -0.195164587365035)
As a result, the output is inconsistent: some features are significant without TSS (or with TSS on proportion input) but not with TSS on % input, and vice versa.
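For anyone who wants to check the same thing, here is a minimal sketch of how the NA patterns of two filtered-data tables can be compared. The two small data frames and their values are invented for illustration (they are not my data); in practice they would be read from each run's filtered_data file.

```r
# Invented filtered-data tables (samples in rows, features in columns);
# NA marks an abundance that was treated as zero.
filtered_no_tss <- data.frame(
  feature_1 = c(0.02, NA, 0.30),
  feature_2 = c(NA, NA, 0.10),
  row.names = c("S1", "S2", "S3")
)
filtered_tss_perc <- data.frame(
  feature_1 = c(0.02, 1.7e-5, 0.30),
  feature_2 = c(NA, 2.5e-5, 0.10),
  row.names = c("S1", "S2", "S3")
)

# Number of non-missing entries per feature in each run
n_not_zero <- rbind(
  no_tss   = colSums(!is.na(filtered_no_tss)),
  tss_perc = colSums(!is.na(filtered_tss_perc))
)
n_not_zero

# Cells that are NA in one run but not in the other
mismatch <- which(is.na(filtered_no_tss) != is.na(filtered_tss_perc),
                  arr.ind = TRUE)
mismatch
```

In this toy case both mismatching cells belong to sample S2, which mirrors what I saw: the extra non-NA values are the very small abundances.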
Here is my R code:
# Scenario 1: % input, no normalization
set.seed(205)
out_T1 <- maaslin3(input_data = RA_input_perc,
                   input_metadata = meta_data,
                   output = 'output/1. NONE LOG 0.1 0.1 0.01',
                   formula = '~ predictor_var',
                   normalization = 'NONE',
                   transform = 'LOG',
                   correction = 'BH',
                   min_prevalence = 0.1,
                   min_abundance = 0.1,
                   zero_threshold = 0.01,
                   augment = TRUE,
                   standardize = TRUE,
                   max_significance = 0.1,
                   warn_prevalence = FALSE,
                   max_pngs = 250,
                   cores = 4)
# Scenario 2: proportion input, no normalization
# (min_abundance and zero_threshold set to match the scenario description
# and the output folder name)
set.seed(205)
out_T2 <- maaslin3(input_data = RA_input_prop,
                   input_metadata = meta_data,
                   output = 'output/2. PROP NONE LOG 0.1 0.001 0.0001',
                   formula = '~ predictor_var',
                   normalization = 'NONE',
                   transform = 'LOG',
                   correction = 'BH',
                   min_prevalence = 0.1,
                   min_abundance = 0.001,
                   zero_threshold = 0.0001,
                   augment = TRUE,
                   standardize = TRUE,
                   max_significance = 0.1,
                   warn_prevalence = FALSE,
                   max_pngs = 250,
                   cores = 4)
# Scenario 3: % input, TSS normalization
set.seed(205)
out_T3 <- maaslin3(input_data = RA_input_perc,
                   input_metadata = meta_data,
                   output = 'output/3. TSS LOG 0.1 0.001 0.0001',
                   formula = '~ predictor_var',
                   normalization = 'TSS',
                   transform = 'LOG',
                   correction = 'BH',
                   min_prevalence = 0.1,
                   min_abundance = 0.001,
                   zero_threshold = 0.0001,
                   augment = TRUE,
                   standardize = TRUE,
                   max_significance = 0.1,
                   warn_prevalence = FALSE,
                   max_pngs = 250,
                   cores = 4)
# Scenario 4: proportion input, TSS normalization
set.seed(205)
out_T4 <- maaslin3(input_data = RA_input_prop,
                   input_metadata = meta_data,
                   output = 'output/4. PROP TSS LOG 0.1 0.001 0.0001',
                   formula = '~ predictor_var',
                   normalization = 'TSS',
                   transform = 'LOG',
                   correction = 'BH',
                   min_prevalence = 0.1,
                   min_abundance = 0.001,
                   zero_threshold = 0.0001,
                   augment = TRUE,
                   standardize = TRUE,
                   max_significance = 0.1,
                   warn_prevalence = FALSE,
                   max_pngs = 250,
                   cores = 4)
Is there anything wrong with my code, or am I missing something that caused this issue?
Thanks
UPDATE:
I think I understand now where this is coming from.
It could be the zero_threshold value I specified when using TSS with % input. I assumed this operation is done after normalization (originally I was treating 0.01% as the zero threshold), and since the data are renormalized with TSS (divided by 100), I thought I should change 0.01 to 0.0001.
I was confused because min_abundance is applied after normalization, so I assumed that zero_threshold is too. (I could not find anything about the order of the zero_threshold step in the tutorial page.)
If the zero_threshold operation is done before normalization, then I should keep it as 0.01.
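To illustrate what I think is happening, here is a minimal base-R sketch of the two possible orders of operations. This is not MaAsLin3's actual code: the helpers `tss` and `zero_out` are hypothetical, the sample is a toy one (only the 0.00172 value is from my data), and I am assuming that values at or below the threshold are the ones treated as zero.

```r
# One toy sample in percent. Only the first value is from my data;
# the other two are invented so that the sample sums to 100.
perc <- c(Phocaeicola_vulgatus = 0.00172, taxon_B = 59.99828, taxon_C = 40)

# Hypothetical helpers for illustration; these are not MaAsLin3 functions.
tss <- function(x) x / sum(x, na.rm = TRUE)
zero_out <- function(x, threshold) {
  # Assumption: values at or below the threshold become missing
  x[!is.na(x) & x <= threshold] <- NA
  x
}

# Order A: threshold applied to the raw % input, then TSS
threshold_then_tss <- tss(zero_out(perc, 0.0001))

# Order B: TSS first, then threshold on the normalized values
tss_then_threshold <- zero_out(tss(perc), 0.0001)

# Order A again, but with the threshold given on the percent scale
threshold_perc_then_tss <- tss(zero_out(perc, 0.01))

threshold_then_tss["Phocaeicola_vulgatus"]       # about 1.72e-05, as in scenario 3
tss_then_threshold["Phocaeicola_vulgatus"]       # NA
threshold_perc_then_tss["Phocaeicola_vulgatus"]  # NA
```

Order A with a threshold of 0.0001 reproduces the value I see in scenario 3, which is why I suspect the threshold is applied to the input before normalization. I have not confirmed this against the package source.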
Please correct me if I am wrong about this.
Thanks again,