SparseDOSSA effect size question

ChYC · September 23, 2021, 1:45pm

Hello, I’m writing to ask for advice on how to control the effect size in the simulations from “Large-scale Benchmarking of Microbial Multivariable Association Methods”, which uses SparseDOSSA to simulate data.
My problem is as follows. When computing Spearman’s or Pearson’s correlation between the spiked-in metadata (a continuous variable) and the spiked-in feature, I always get a really low association estimate (around 0.1 to 0.2), no matter how high I set the “effectSize” parameter. I have set effectSize anywhere from 1 to 5000 (either as a number, --effectSize 50, or as a string, --effectSize "50"), but the correlation didn’t increase at all. If possible, could you please advise me on how to get a larger effect size, so that Spearman’s rho or Pearson’s r can reach 0.5? Thank you very much in advance!

mishort · September 23, 2021, 5:45pm

Hi,
I can definitely help with that. Would you mind sharing the code you used to generate the simulated datasets with the spiked-in data? That will help me troubleshoot.
Thanks so much,
Meg

ChYC · September 23, 2021, 6:17pm

Thank you very much! Below is my code.

Rscript Codes/Load_Simulator.R --metadataType MVB --nSubjects 300 --nPerSubject 2 --RandomEffect TRUE --nMicrobes 200 --spikeMicrobes 0.1 --nMetadata 4 --spikeMetadata 0.5 --effectSize 50 --workingDirectory "SparseDossa/" --nCores 4 --nIterations 250 --rSeed 101

I’ve tried many values for “effectSize”, from 1 to 5000, but the correlation estimate between metadata_TP (a continuous variable) and feature_TP barely changed (staying at 0.1 to 0.2). So I’m wondering how to increase the effect size so that I can get a correlation of around 0.5. Thank you!

mishort · September 27, 2021, 3:50pm

Hello,
If I’m not mistaken, the code you sent runs a script called Load_Simulator.R, which likely calls the sparseDOSSA function from within R using the arguments you’ve given here. It’s hard for me to troubleshoot with just this code, since I can’t see “under the hood” how these arguments are used by sparseDOSSA.
Another thing I’ll mention is that we recently published an updated version, “SparseDOSSA2”, which I’d recommend if you are just starting out with these simulations. If switching to SparseDOSSA2 is too much overhead, feel free to send me your Load_Simulator.R script and I can try to troubleshoot what’s happening here.
Best,
Meg

ChYC · September 27, 2021, 4:09pm

Hi Meg,

Thank you so much for your reply! In fact, “Load_Simulator.R” is identical to the one your lab provides on GitHub. You can find it here: maaslin2_benchmark/Load_Simulator.R at master · biobakery/maaslin2_benchmark · GitHub. The code I pasted in my last post follows the examples here: GitHub - biobakery/maaslin2_benchmark: Large-scale Benchmarking of Microbial Multivariable Association Methods.

I’m aware of SparseDOSSA2 and would like to try it. But since this maaslin2_benchmark simulation pipeline currently works really well for me (except for the effect size issue), I’d like to continue with it. If possible, could you please look into this issue again? Thank you very much for your great help!

mishort · September 27, 2021, 5:44pm

Ah, thanks for pointing me towards the maaslin2 benchmarking code. I’ll troubleshoot in the next few days and let you know what I find.
Best,
Meg

himel.mallick · September 30, 2021, 4:55pm

Thanks, @mishort, for taking the first pass at this, and thanks, @franzosa, for the additional insight.

@ChYC Thanks for your interest in MaAsLin 2 benchmarking and for flagging the issue. I did a quick-and-dirty investigation on my end (see attached).
SparseDOSSA_MVB.R.txt (4.6 KB)
SparseDOSSA_UVA.R.txt (3.4 KB)

In one investigation (univariate continuous, no repeated measures; SparseDOSSA_UVA.R), introducing zero-inflation dramatically reduces the Spearman correlation, from about 0.45 to about 0.2. You can reproduce this by running the code and toggling the noZeroInflate parameter in the SparseDOSSA call.
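The way zeros absorb much of the spike can be illustrated with a small toy simulation. This is not SparseDOSSA's model: it is a hypothetical sketch that mixes one part Gaussian noise with b parts metadata (echoing the b/(b+1) description later in this thread), exponentiates the result to get a positive, abundance-like value, and then sets a random 50% of samples to zero independently of the metadata. Spearman's rho is computed by hand with tie-averaged ranks so that only the Python standard library is needed; `toy_feature`, `spearman` and the parameter values are illustrative assumptions.

```python
import math
import random


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values (e.g. many zeros).
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i..j
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def toy_feature(metadata, b, zero_prob, rng):
    """Hypothetical toy, not SparseDOSSA: mix 1 part noise with b parts
    metadata, exponentiate, then zero out samples at random."""
    feature = []
    for m in metadata:
        latent = (rng.gauss(0, 1) + b * m) / (b + 1)
        value = math.exp(latent)       # positive, abundance-like value
        if rng.random() < zero_prob:   # zeros are independent of metadata
            value = 0.0
        feature.append(value)
    return feature


rng = random.Random(101)
metadata = [rng.gauss(0, 1) for _ in range(2000)]
rho_dense = spearman(metadata, toy_feature(metadata, b=1, zero_prob=0.0, rng=rng))
rho_sparse = spearman(metadata, toy_feature(metadata, b=1, zero_prob=0.5, rng=rng))
print(f"no zeros: {rho_dense:.2f}  50% zeros: {rho_sparse:.2f}")
```

In this toy the correlation without zeros should land near 0.7 and fall to roughly 0.2 once half the samples are zeroed (exact values depend on the seed). The zeros all tie at the bottom rank, so half of the samples carry no information about the metadata ordering, which is the same qualitative effect described above.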

When I repeated the investigation for the more complex scenario (multivariable continuous, repeated measures, same as your example, @ChYC; SparseDOSSA_MVB.R), I got a correlation of about 0.4 without zero-inflation, but not with zero-inflation (where it dropped to around 0.26, as you noted, @ChYC).

In my mind, this is a combination of two things. (i) SparseDOSSA 1 is designed to introduce weak-to-moderate “relative” effect sizes (this is resolved in SparseDOSSA 2; tagging you, @sma, in case you want to chime in), so that the counts of the modified feature are not dominated by the values of the target metadata but remain distributed similarly to real data. In other words, the effect size is always introduced as b/(b+1), where b is the user-defined effect size. (ii) This effect size shrinks dramatically in the presence of zeros. That makes sense: a lot of the “spike” is “absorbed” when low-abundance features are turned into zero counts, so a zero-inflated feature is not expected to show a high correlation with the spiked-in variable.

To summarize, you might be able to get a correlation of around 0.5 by turning zero-inflation off, but it would be difficult to introduce an arbitrarily large effect size because of the way SparseDOSSA 1 handles the spike-in. Does that make sense?

sma · September 30, 2021, 5:27pm

Hi all -

Himel is right: SparseDOSSA v1 effectively implements a “b/(b+1)” association during spike-in. It mixes one part of the original null microbe with b parts of a spiked-in artificial variable that is associated with the metadata.

This approach keeps the marginal mean and variance of the spiked-in microbe approximately the same as those of the original null. The downside is that the spike-in strength is capped beyond a certain effect size. The intuition: introducing a very strong association with the metadata would naturally require changing, e.g., the original variance. We noticed this in our simulations too.
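The saturation of the b/(b+1) weighting can be seen with a line of arithmetic. This sketch assumes only the weighting stated in this thread (one part original feature, b parts spiked-in component); it does not reproduce SparseDOSSA 1's code, and `spike_weight` is a hypothetical helper name.

```python
def spike_weight(b: float) -> float:
    """Share of the mixture contributed by the spiked-in component when one
    part of the original (null) feature is mixed with b parts of it."""
    return b / (b + 1)


# Effect sizes spanning the range tried in this thread (1 to 5000).
for b in [1, 5, 50, 500, 5000]:
    print(f"b = {b:>5}: spike weight = {spike_weight(b):.4f}")
```

This prints weights of 0.5000, 0.8333, 0.9804, 0.9980 and 0.9998: raising b from 50 to 5000 moves the spiked-in share by less than two percentage points, which is consistent with the observed correlation not responding to larger effectSize values.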

In v2 we wanted to change this behavior, so the model applies the effect “b” directly instead of “b/(b+1)”. So if you would like stronger associations than v1 allows, I’m afraid you’ll have to adopt the v2 implementation.

ChYC · October 1, 2021, 2:33pm

Thank you @himel.mallick and @sma for your detailed answers! They are really helpful and clarified my questions. And thank you @mishort for helping me with this issue in the first place! I’ll proceed with the suggestions (e.g., setting zero-inflation to false or switching to SparseDOSSA2).
Thank you all once again!
