Hi! I’m trying to simulate microbial data with my own specified correlation matrix. I was thinking that I could simulate data using the “Stool” template, fit the data, overwrite the appropriate matrix, and then use the fitted data to re-simulate some data, now with my specified correlation matrix. My first question is, will this work or is there an easier way to do what I’m hoping to accomplish? If so, what’s the appropriate matrix I need to overwrite? From what I understand, the correlation between microbes is controlled by Omega. However, Omega isn’t bound between -1 and 1 so I’m guessing it’s a covariance matrix. My guess would be that Corr_star is the one I’m looking for. But what’s the difference between sigma, Sigma, Omega, and Corr_star?
Hi - thanks for the interests!
You shouldn’t need to fit the data again. Just access the
Stoolfit object, which is stored internally in SparseDOSSA. You can do this with e.g.
Stool_mod <- SparseDOSSA2:::Stool.
You can then simply modify with your own correlation matrix. The one you’re looking for is
Sigmais the correlation matrix and
Omegais its inverse (i.e. the precision matrix).
sigmais per-feature standard error.
Corr_staris something SparseDOSSA uses internally when estimating
Omega. So you’d do something like
Stool_mod$EM_fit$fit$Sigma <- My_own_corr
Stool_mod$EM_fit$fit$Omega <- solve(My_own_corr)
You can then simulate data with
SparseDOSSA2(template = Stool_mod, new_features = FALSE, ...). The
new_features = FALSEsetting is important to ensure the new correlation matrix is propagated.
Thank you so much for the quick and clear response! That makes a lot of sense now.
As for #3, is there a way to specify the number of features without specifying new_features = TRUE? It seems the default is 332 if new_features = FALSE but I’d like 1,000 features instead.
This can be done but needs to call SparseDOSSA’s internal functions. To generate “new” features that “look like” the true 332 stool features, SparseDOSSA fits a three-dimensional density estimator on the pi0, mu, and sigma values of each feature. This fit is stored internally in the
To use this for your case, I’d simulate 1,000 features, and use their pi0, mu, and sigma to replace those currently stored in
Stool, just as the replacement for
- First, simulate 1,000 new futures:
Stool <- Stool_mod <- SparseDOSSA2:::Stool feature_param_template <- cbind("pi0" = Stool$EM_fit$fit$pi0, "mu" = Stool$EM_fit$fit$mu, "sigma" = Stool$EM_fit$fit$sigma) feature_param <- SparseDOSSA2:::generate_featureParam( F_fit = Stool$F_fit, param_original = feature_param_template, n_feature = 1000)
- Use the simulated
sigmato replace those stored in
Stool_mod$EM_fit$fit$pi0 <- feature_param[, "pi0"] Stool_mod$EM_fit$fit$mu <- feature_param[, "mu"] Stool_mod$EM_fit$fit$sigma <- feature_param[, "sigma"] Stool_mod$EM_fit$fit$Sigma <- My_own_corr # note the dimensionality must agree here Stool_mod$EM_fit$fit$Omega <- solve(My_own_corr)
Stool_mod can then be used for simulation just as above. I’ll consider adding an exported function for this in the next update.
Thank you! I was able to get it running very easily. Much appreciated!