Maaslin2 vs MMUPHin

Hi, I am a new user of the Maaslin2 and MMUPHin packages. I'm not sure which one to use, and I'm also confused about why their results differ.
I have 16S amplicon sequencing data from human subjects: more than 300 samples, sequenced in 4 different runs (a few months apart, with varying quality). I would also like to reduce the confounding effect of the subject's age.
I am interested in a variable ("var") that has 3 levels. MMUPHin seems unable to handle more than two levels, so for now I have started with 2.
My count data has been filtered to keep only taxa that are present in at least 5% of the samples.
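Here is a minimal base-R sketch of that data preparation: reducing "var" to two levels and then applying the 5% prevalence filter. The object names (`counts`, `meta`, `counts_filt`) and the toy data are hypothetical stand-ins for the post's `otu_table` and `metadata`. Dropping level "e1" is also an arbitrary choice made for illustration.

```r
# Hypothetical toy inputs standing in for otu_table and metadata:
# a taxa x samples count matrix and one metadata row per sample.
set.seed(42)
n_samples <- 40
counts <- matrix(rpois(10 * n_samples, lambda = 3), nrow = 10,
                 dimnames = list(paste0("taxon", 1:10), paste0("S", 1:n_samples)))
counts["taxon10", ] <- 0  # a taxon that is never observed
meta <- data.frame(var = factor(rep(c("e1", "e2", "e3"), length.out = n_samples)),
                   row.names = colnames(counts))

# Keep two levels of "var" (dropping "e1" is an arbitrary choice here) and keep
# the count table's samples in exactly the same order as the metadata rows.
meta2   <- droplevels(meta[meta$var != "e1", , drop = FALSE])
counts2 <- counts[, rownames(meta2), drop = FALSE]

# Prevalence filter on the subset: keep taxa with a non-zero count in >= 5% of samples.
prevalence  <- rowMeans(counts2 > 0)
counts_filt <- counts2[prevalence >= 0.05, , drop = FALSE]
dim(counts_filt)
```

The sketch filters after subsetting because a taxon can be present only in the samples that were dropped.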
For Maaslin2 I run:

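```r
library(Maaslin2)

fit_data_TSS_AST <- Maaslin2(
  input_data     = df_input_data,
  input_metadata = df_input_metadata,
  min_prevalence = 0.05,
  normalization  = "TSS",
  transform      = "AST",
  output         = "Maaslin2_TSS_AST_var_Age",
  random_effects = "Plate",            # sequencing run as a random intercept
  fixed_effects  = c("var", "Age"),    # exposure of interest, adjusted for age
  reference      = "var,e3"            # e3 is the reference level of var
)
```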

For MMUPHin:

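```r
library(MMUPHin)

# otu_table: features (rows) x samples (columns); its column names should
# match the row names of `metadata`.
fit_lm_meta <- lm_meta(feature_abd = otu_table,
                       batch = "Plate",
                       exposure = "var",
                       covariates = "Age",
                       data = metadata,
                       control = list(verbose = FALSE))
```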

Variance explained by run (Plate) before MMUPHin batch correction:

```
           Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)
Plate       3     4.347 1.44898  4.3819 0.05193  0.001 ***
Residuals 240    79.361 0.33067         0.94807
Total     243    83.708                 1.00000
```

After:

```
           Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)
Plate       3     3.338 1.11259  3.3757 0.04049  0.001 ***
Residuals 240    79.102 0.32959         0.95951
Total     243    82.439                 1.00000
```
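Judging from the F.Model and SumsOfSqs column names, these tables look like PERMANOVA output from `vegan::adonis`, though that is an assumption. In such a table, R2 is the Plate sum of squares divided by the total sum of squares. Plate therefore accounts for about 5.2% of the variation in the dissimilarity matrix before correction and about 4.0% after. The base-R sketch below is a simplified one-way PERMANOVA on Bray-Curtis dissimilarities, written for intuition only. It is not vegan's implementation. The helper names and the simulated data are hypothetical.

```r
# Bray-Curtis dissimilarity between samples (columns) of a taxa x samples matrix.
bray_curtis <- function(x) {
  n <- ncol(x)
  d <- matrix(0, n, n, dimnames = list(colnames(x), colnames(x)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- sum(abs(x[, i] - x[, j])) / sum(x[, i] + x[, j])
    }
  }
  d
}

# One-way PERMANOVA: pseudo-F, R2 and a permutation p-value for a grouping factor.
permanova_1way <- function(d, group, n_perm = 999) {
  group <- factor(group)
  N <- length(group)
  a <- nlevels(group)
  d2 <- d^2
  ss_total <- sum(d2) / (2 * N)  # the full symmetric matrix counts every pair twice
  stat <- function(g) {
    ss_within <- sum(sapply(split(seq_len(N), g), function(idx) {
      sum(d2[idx, idx]) / (2 * length(idx))
    }))
    ss_between <- ss_total - ss_within
    c(F = (ss_between / (a - 1)) / (ss_within / (N - a)),
      R2 = ss_between / ss_total)
  }
  obs <- stat(group)
  perm_F <- replicate(n_perm, stat(sample(group))[["F"]])
  list(F = obs[["F"]], R2 = obs[["R2"]],
       p = (sum(perm_F >= obs[["F"]]) + 1) / (n_perm + 1))
}

# Hypothetical data: 4 plates x 10 samples; plate PL1 inflates two taxa.
set.seed(1)
plate <- factor(rep(paste0("PL", 1:4), each = 10))
x <- matrix(rgamma(20 * 40, shape = 1), nrow = 20,
            dimnames = list(paste0("taxon", 1:20), paste0("S", 1:40)))
x[1:2, plate == "PL1"] <- x[1:2, plate == "PL1"] * 3
x <- sweep(x, 2, colSums(x), "/")  # TSS: each sample sums to 1

res <- permanova_1way(bray_curtis(x), plate, n_perm = 199)
res$R2  # share of the dissimilarity variation attributed to plate

# The R2 column in the tables above is SS(Plate) / SS(Total):
round(4.347 / 83.708, 5)  # 0.05193 (before)
round(3.338 / 82.439, 5)  # 0.04049 (after)
```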

Results from Maaslin2 (significant results, q-value < 0.05):

```
feature                           metadata value coef                 stderr               N   N.not.0 pval                 qval
X64f3a773c250e16004e1546f7a504082 Age      Age   -0.0021747113575296  0.000428073175218923 244 38      7.57669310129663E-07 0.00140168822373988
X943b6182fabfe1d061b8b5a78dd03ba2 var      e2    0.0032969691011505   0.000719395197448784 244 14      7.41129764561043E-06 0.00685545032218964
X94b51e4d3cd9c11f4295019c56c54fff Age      Age   -0.00441241349809916 0.00102088071466314  244 36      2.26589633545076E-05 0.0137368573040787
d485d060694e34bea416f2718b78916d  var      e2    0.00251710609189474  0.000591242182044747 244 16      0.0000297013130899   0.0137368573040787
X414ea75fb1462d10153c7bed625b4b99 Age      Age   -0.00549980227032218 0.00135891392207732  244 85      6.99837060011561E-05 0.0161837320127673
X4d6fe682a4dd9aad8decfab830a193e0 var      e2    0.0305958073812541   0.0074457626693776   244 172     5.4686780910475E-05  0.0161837320127673
dfa833b266bd2993b86feab3617b34c3  var      e2    0.0856153563649985   0.020827311368041    244 113     5.43566241897518E-05 0.0161837320127673
X85e5f4133e5ea47da3e24828bdf463f5 Age      Age   -0.0202145237636325  0.00495487202855318  244 145     6.14798474737681E-05 0.0161837320127673
X6630583dd78eb092ece5e17515eb301d Age      Age   -0.00287220450181886 0.000738457915721947 244 88      0.00013031433581666  0.0267868356956468
bf37c87f8841a9f1a2cd20db4fa18b74  Age      Age   -0.0123296167389853  0.00320981656251117  244 165     0.000157323495297528 0.0271737761445602
d6b10cc94394ef8ebb3cde7e168bad0b  var      e2    0.00517067322095915  0.00134861733482618  244 13      0.00016157380410279  0.0271737761445602
X6251bd9ebf43fae466939ab366f6e547 Age      Age   -0.0134484525431645  0.00356032198500978  244 103     0.000199970386710597 0.0308287679512171
X5e044f34a0a5ffd168ee1f5855fb99ad Age      Age   -0.00358554288115779 0.000956653862164803 244 13      0.000223645512868537 0.0318264768312918
X6343c15ef6a0b28bb8d019ebbcd0a55a var      e2    0.014802943061795    0.0039906998959427   244 24      0.00025819175795474  0.0341181965868764
e15b6ef1cd643dff3f0649b7baba06e8  Age      Age   -0.00767302431211445 0.00209517285329667  244 130     0.000307588745151869 0.0379359452353972
X31168bbdcf24a70a8e892927acedee65 var      e2    0.00503120048542711  0.00139140009405133  244 13      0.000365582859122471 0.0422705180860357
```

Results from MMUPHin (the meta_fits table sorted by q-value: the first 3 rows, plus 4d6fe682a4dd9aad8decfab830a193e0, which is at position 115):

```
feature                          exposure coef                stderr              pval                 k tau2                 stderr.tau2          pval.tau2         I2               H2               weight_PL1       weight_PL2       weight_PL3       weight_PL4       pval.bonf          qval.fdr
dfa833b266bd2993b86feab3617b34c3 e2       0.0854174143263161  0.0189210818739646  6.34949707183327E-06 4 0.000194254244816285 0.00115934198844055  0.378899260384403 13.3664774631044 1.15428759066575 28.4064769389096 29.4447803088867 13.243601959127  28.9051407930767 0.0441671016316722 0.0441671016316722
4e8b08e013947a5b90af66139033012c e2       0.00479073978786018 0.00117065042118702 4.2697856742547E-05  2 0                    4.41470455066627E-06 0.485339051662415 0                1                NA               67.4637738719965 32.5362261280035 NA               0.297006291501157  0.0644543394153818
da96578ff1ca89aed029675b4c825780 e2       0.00897873545336833 0.00211717311298607 2.22617816563329E-05 2 0                    1.2990452355072E-05  0.840594251974998 0                1                NA               57.7518968829216 NA               42.2481031170784 0.154852953201452  0.0644543394153818
4d6fe682a4dd9aad8decfab830a193e0 e2       0.0313240749329559  0.0115236982582148  0.0065631980311945   4 0.00036263338927189  0.000433668319768228 0.026427971376799 68.6693789864841 3.19176565178393 26.9692047069443 24.998413065039  22.0729197113666 25.9594625166501 1                  0.399936086505716
```

Each pipeline gives very few significant results, and only one feature is significant in both (dfa833b266bd2993b86feab3617b34c3).
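Before comparing the two result tables, the feature IDs have to match. In the Maaslin2 output, IDs that start with a digit carry an extra leading "X" (compare X4d6fe682... with 4d6fe682... in the MMUPHin table), which is the prefix R's `make.names()` adds. The base-R sketch below copies the `var` rows from the two tables above, strips that prefix, and intersects the significant sets. The data-frame names are hypothetical.

```r
# var rows copied from the two tables in this post.
maaslin <- data.frame(
  feature = c("X943b6182fabfe1d061b8b5a78dd03ba2", "d485d060694e34bea416f2718b78916d",
              "X4d6fe682a4dd9aad8decfab830a193e0", "dfa833b266bd2993b86feab3617b34c3",
              "d6b10cc94394ef8ebb3cde7e168bad0b", "X6343c15ef6a0b28bb8d019ebbcd0a55a",
              "X31168bbdcf24a70a8e892927acedee65"),
  qval = c(0.00685545032218964, 0.0137368573040787, 0.0161837320127673,
           0.0161837320127673, 0.0271737761445602, 0.0341181965868764,
           0.0422705180860357),
  stringsAsFactors = FALSE
)
mmuphin <- data.frame(
  feature = c("dfa833b266bd2993b86feab3617b34c3", "4e8b08e013947a5b90af66139033012c",
              "da96578ff1ca89aed029675b4c825780", "4d6fe682a4dd9aad8decfab830a193e0"),
  qval.fdr = c(0.0441671016316722, 0.0644543394153818, 0.0644543394153818,
               0.399936086505716),
  stringsAsFactors = FALSE
)

# Remove the "X" that make.names() puts in front of names starting with a digit
# (only safe because these hash IDs never genuinely start with "X").
maaslin$feature <- sub("^X(?=[0-9])", "", maaslin$feature, perl = TRUE)

both <- merge(maaslin, mmuphin, by = "feature")  # features present in both tables
both
shared_hits <- intersect(maaslin$feature[maaslin$qval < 0.05],
                         mmuphin$feature[mmuphin$qval.fdr < 0.05])
shared_hits  # "dfa833b266bd2993b86feab3617b34c3"
```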

Questions:

  1. Are the commands correct for removing the confounding effects of Age and Plate?
  2. Why is 4d6fe682a4dd9aad8decfab830a193e0 significant in Maaslin2 but not in MMUPHin? Is it related to the weight of each run? And does MMUPHin deal better with the differences between runs?
  3. MMUPHin's default normalization is "TSS" and its default transformation is "AST", and I used the same for Maaslin2 so that the results would be comparable. However, Maaslin2 seems to advise TMM or CSS for count data. MMUPHin should be fine with count data, so are these normalization and transformation methods OK to use here?

Thank you very much in advance!!

MMUPHin is intended for meta-analyses, where the batch variable usually identifies something like different studies conducted by different research groups. Unless your experimental conditions changed dramatically across the 4 runs/plates, you probably want to use MaAsLin2 here.

From a modeling perspective, a variable used as a random effect in MaAsLin2 and the same variable used as the batch variable in MMUPHin are not handled in exactly the same way, so it makes sense to me that you're getting consistent but not exactly equal results.
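Here is some intuition for that difference, as a hedged sketch.

**How MMUPHin pools the plates.** MMUPHin's `lm_meta` fits the model within each plate and then pools the per-plate coefficients with a random-effects meta-analysis. The weight_PL1 to weight_PL4 columns are those per-plate weights, in percent; they sum to 100.

**What the table shows for 4d6fe682a4dd9aad8decfab830a193e0.**
- The pooled coefficient is close to Maaslin2's (0.0313 vs 0.0306).
- The standard error is larger (0.0115 vs 0.0074).
- The table reports I2 of about 68.7% and pval.tau2 of about 0.026, so the plates disagree about the size of the effect.

**Why the two models diverge.** A random intercept for Plate in MaAsLin2 shifts each plate's baseline but assumes one common `var` effect. The meta-analysis instead adds the between-plate variance tau2 to every plate's sampling variance, and that widens the pooled standard error.

**About the sketch.** The base-R function below uses the closed-form DerSimonian-Laird estimator of tau2. As far as I know, MMUPHin defaults to REML via metafor instead, so its numbers will differ. The per-plate inputs are made up.

```r
# DerSimonian-Laird random-effects meta-analysis of per-batch estimates.
dl_meta <- function(beta, se) {
  k <- length(beta)
  w <- 1 / se^2                                 # inverse-variance weights
  beta_fe <- sum(w * beta) / sum(w)             # fixed-effect estimate, used for Q
  Q <- sum(w * (beta - beta_fe)^2)              # Cochran's Q heterogeneity statistic
  tau2 <- max(0, (Q - (k - 1)) / (sum(w) - sum(w^2) / sum(w)))
  w_re <- 1 / (se^2 + tau2)                     # between-batch variance added to each batch
  coef <- sum(w_re * beta) / sum(w_re)
  stderr <- sqrt(1 / sum(w_re))
  list(coef = coef, stderr = stderr, tau2 = tau2,
       I2 = if (Q > 0) 100 * max(0, (Q - (k - 1)) / Q) else 0,
       pval = 2 * pnorm(-abs(coef / stderr)),
       weight_pct = 100 * w_re / sum(w_re))
}

# Hypothetical per-plate estimates for one feature (not taken from the post).
se <- rep(0.012, 4)
consistent   <- dl_meta(beta = c(0.030, 0.031, 0.029, 0.030), se = se)
inconsistent <- dl_meta(beta = c(0.075, 0.005, 0.040, 0.000), se = se)

c(consistent$coef, inconsistent$coef)       # same pooled effect (0.03)
c(consistent$stderr, inconsistent$stderr)   # heterogeneity widens the SE
c(consistent$pval, inconsistent$pval)
```

Both sets of estimates average 0.03. Only the plate-to-plate disagreement differs, and that alone moves the pooled p-value from clearly significant to non-significant.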

MaAsLin2’s default TSS + LOG normalization and transform is probably the best starting point for 16S data.
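To make the normalization and transform options in this thread concrete, here is a small base-R sketch of what TSS followed by AST, or TSS followed by a log transform, does to a count table. It illustrates the idea and is not MaAsLin2's or MMUPHin's actual code. The function names are hypothetical, and the zero-handling rule in the log transform is an assumption. In MaAsLin2 itself these choices are made with `normalization = "TSS"` plus `transform = "LOG"` or `transform = "AST"`.

```r
# Total-sum scaling: each sample (column) is divided by its total count.
tss <- function(counts) sweep(counts, 2, colSums(counts), "/")

# Arcsine square root transform of proportions.
ast <- function(p) asin(sqrt(p))

# log2 with a pseudocount for zeros. Replacing zeros by half of the feature's
# smallest non-zero value is an assumption made for this sketch.
log2_pseudo <- function(p) {
  t(apply(p, 1, function(v) {
    v[v == 0] <- min(v[v > 0]) / 2  # needs at least one non-zero value per feature
    log2(v)
  }))
}

# Hypothetical counts, 3 taxa x 3 samples. matrix() fills column by column,
# so each line below is one sample (S1 depth 100, S2 1000, S3 200).
counts <- matrix(c( 0,  10,  90,
                    5,   5, 990,
                   20,   0, 180),
                 nrow = 3,
                 dimnames = list(c("taxonA", "taxonB", "taxonC"),
                                 c("S1", "S2", "S3")))

rel <- tss(counts)           # relative abundances; columns sum to 1
round(ast(rel), 3)
round(log2_pseudo(rel), 3)
```

Both transforms start from the same relative abundances. AST needs no pseudocount, whereas any log transform has to decide how to handle zeros.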

Thank you, Andrew, for your clear reply.
One more question:
In the thread "Choosing analysis method for maaslin2", it is mentioned that:

  • Among the normalization approaches implemented in MaAsLin 2, TMM and CSS only work on counts and they also return normalized counts unlike TSS and CLR. Therefore, if your input is count, you can use the above two normalizations (i.e., TMM, CSS, or NONE (in case the data is already normalized)) without a further transformation (i.e. transform = 'NONE').

I understand from this that with counts it would be better to use TMM or CSS with no further transformation? However, you suggest that TSS with LOG should be OK for 16S count data as well?
Thank you for helping out!

It's not that you "should" use TMM or CSS, but rather that you "can". The effects of the different choices are reviewed in the results section of the paper (search for "TMM"; Figs S2-S5) and in the tutorial. In general, though, we recommend converting counts to relative abundances without rarefaction before running MaAsLin2. That's usually the simplest and most effective approach.
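Here is what "relative abundance without rarefaction" means in practice, sketched in base R with made-up counts. Relative abundances use every read, whereas rarefying subsamples each sample down to the shallowest depth and discards the rest. The last line shows that applying TSS to data that are already relative abundances leaves them unchanged. So, assuming MaAsLin2's TSS is a plain division by each sample's total, converting beforehand and keeping `normalization = "TSS"` give the same input. The helper names are hypothetical.

```r
# Total-sum scaling: divide each sample (column) by its total.
tss <- function(counts) sweep(counts, 2, colSums(counts), "/")

# Rarefaction: subsample one sample's reads to `depth` without replacement.
rarefy_sample <- function(x, depth) {
  reads <- rep(seq_along(x), times = x)             # one entry per read, labelled by taxon
  kept <- reads[sample.int(length(reads), depth)]
  tabulate(kept, nbins = length(x))
}

# Hypothetical counts: 5 taxa x 4 samples; S4 was sequenced ~10x deeper.
set.seed(7)
counts <- matrix(rpois(20, lambda = c(200, 50, 20, 5, 1)), nrow = 5,
                 dimnames = list(paste0("taxon", 1:5), paste0("S", 1:4)))
counts[, "S4"] <- counts[, "S4"] * 10

rel <- tss(counts)                                     # uses all reads
depth <- min(colSums(counts))
rare <- apply(counts, 2, rarefy_sample, depth = depth) # keeps `depth` reads per sample
rownames(rare) <- rownames(counts)

sum(counts) - sum(rare)     # reads discarded by rarefying
all.equal(tss(rel), rel)    # TSS on relative abundances changes nothing
```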


Thanks for the explanation!
Have a nice day.
