The bioBakery help forum

SparseDOSSA2 correlation structure

Hi! I’m trying to simulate microbial data with my own specified correlation matrix. I was thinking that I could simulate data using the “Stool” template, fit the data, overwrite the appropriate matrix, and then use the fitted data to re-simulate some data, now with my specified correlation matrix. My first question is, will this work or is there an easier way to do what I’m hoping to accomplish? If so, what’s the appropriate matrix I need to overwrite? From what I understand, the correlation between microbes is controlled by Omega. However, Omega isn’t bound between -1 and 1 so I’m guessing it’s a covariance matrix. My guess would be that Corr_star is the one I’m looking for. But what’s the difference between sigma, Sigma, Omega, and Corr_star?

Hi - thanks for the interests!

  1. You shouldn’t need to fit the data again. Just access the Stool fit object, which is stored internally in SparseDOSSA. You can do this with e.g. Stool_mod <- SparseDOSSA2:::Stool.

  2. You can then simply modify with your own correlation matrix. The one you’re looking for is Sigma (and correspondingly, Omega). Sigma is the correlation matrix and Omega is its inverse (i.e. the precision matrix). sigma is per-feature standard error. Corr_star is something SparseDOSSA uses internally when estimating Sigma and Omega. So you’d do something like
    a. Stool_mod$EM_fit$fit$Sigma <- My_own_corr
    b. Stool_mod$EM_fit$fit$Omega <- solve(My_own_corr)

  3. You can then simulate data with SparseDOSSA2(template = Stool_mod, new_features = FALSE, ...). The new_features = FALSE setting is important to ensure the new correlation matrix is propagated.

Thank you so much for the quick and clear response! That makes a lot of sense now.

As for #3, is there a way to specify the number of features without specifying new_features = TRUE? It seems the default is 332 if new_features = FALSE but I’d like 1,000 features instead.

This can be done but needs to call SparseDOSSA’s internal functions. To generate “new” features that “look like” the true 332 stool features, SparseDOSSA fits a three-dimensional density estimator on the pi0, mu, and sigma values of each feature. This fit is stored internally in the F_fit component.

To use this for your case, I’d simulate 1,000 features, and use their pi0, mu, and sigma to replace those currently stored in Stool, just as the replacement for Sigma and Omega above.

  1. First, simulate 1,000 new futures:
Stool <- 
  Stool_mod <- 
feature_param_template <- cbind("pi0" = Stool$EM_fit$fit$pi0,
                                "mu" = Stool$EM_fit$fit$mu,
                                "sigma" = Stool$EM_fit$fit$sigma)
feature_param <- 
    F_fit = Stool$F_fit,
    param_original = feature_param_template,
    n_feature = 1000)
  1. Use the simulated pi0, mu, sigma to replace those stored in Stool_mod:
Stool_mod$EM_fit$fit$pi0 <- feature_param[, "pi0"]
Stool_mod$EM_fit$fit$mu <- feature_param[, "mu"]
Stool_mod$EM_fit$fit$sigma <- feature_param[, "sigma"]
Stool_mod$EM_fit$fit$Sigma <- My_own_corr # note the dimensionality must agree here
Stool_mod$EM_fit$fit$Omega <- solve(My_own_corr)

Stool_mod can then be used for simulation just as above. I’ll consider adding an exported function for this in the next update.

Thank you! I was able to get it running very easily. Much appreciated!