SparseDOSSA2 correlation structure

Hi! I’m trying to simulate microbial data with my own specified correlation matrix. I was thinking that I could simulate data using the “Stool” template, fit the data, overwrite the appropriate matrix, and then use the fitted data to re-simulate some data, now with my specified correlation matrix. My first question is, will this work or is there an easier way to do what I’m hoping to accomplish? If so, what’s the appropriate matrix I need to overwrite? From what I understand, the correlation between microbes is controlled by Omega. However, Omega isn’t bound between -1 and 1 so I’m guessing it’s a covariance matrix. My guess would be that Corr_star is the one I’m looking for. But what’s the difference between sigma, Sigma, Omega, and Corr_star?

Hi - thanks for the interests!

1. You shouldn’t need to fit the data again. Just access the `Stool` fit object, which is stored internally in SparseDOSSA. You can do this with e.g. `Stool_mod <- SparseDOSSA2:::Stool`.

2. You can then simply modify with your own correlation matrix. The one you’re looking for is `Sigma` (and correspondingly, `Omega`). `Sigma` is the correlation matrix and `Omega` is its inverse (i.e. the precision matrix). `sigma` is per-feature standard error. `Corr_star` is something SparseDOSSA uses internally when estimating `Sigma` and `Omega`. So you’d do something like
a. `Stool_mod\$EM_fit\$fit\$Sigma <- My_own_corr`
b. `Stool_mod\$EM_fit\$fit\$Omega <- solve(My_own_corr)`

3. You can then simulate data with `SparseDOSSA2(template = Stool_mod, new_features = FALSE, ...)`. The `new_features = FALSE` setting is important to ensure the new correlation matrix is propagated.

Thank you so much for the quick and clear response! That makes a lot of sense now.

As for #3, is there a way to specify the number of features without specifying new_features = TRUE? It seems the default is 332 if new_features = FALSE but I’d like 1,000 features instead.

This can be done but needs to call SparseDOSSA’s internal functions. To generate “new” features that “look like” the true 332 stool features, SparseDOSSA fits a three-dimensional density estimator on the pi0, mu, and sigma values of each feature. This fit is stored internally in the `F_fit` component.

To use this for your case, I’d simulate 1,000 features, and use their pi0, mu, and sigma to replace those currently stored in `Stool`, just as the replacement for `Sigma` and `Omega` above.

1. First, simulate 1,000 new futures:
``````Stool <-
Stool_mod <-
SparseDOSSA2:::Stool
feature_param_template <- cbind("pi0" = Stool\$EM_fit\$fit\$pi0,
"mu" = Stool\$EM_fit\$fit\$mu,
"sigma" = Stool\$EM_fit\$fit\$sigma)
feature_param <-
SparseDOSSA2:::generate_featureParam(
F_fit = Stool\$F_fit,
param_original = feature_param_template,
n_feature = 1000)
``````
1. Use the simulated `pi0`, `mu`, `sigma` to replace those stored in `Stool_mod`:
``````Stool_mod\$EM_fit\$fit\$pi0 <- feature_param[, "pi0"]
Stool_mod\$EM_fit\$fit\$mu <- feature_param[, "mu"]
Stool_mod\$EM_fit\$fit\$sigma <- feature_param[, "sigma"]
Stool_mod\$EM_fit\$fit\$Sigma <- My_own_corr # note the dimensionality must agree here
Stool_mod\$EM_fit\$fit\$Omega <- solve(My_own_corr)
``````

`Stool_mod` can then be used for simulation just as above. I’ll consider adding an exported function for this in the next update.

Thank you! I was able to get it running very easily. Much appreciated!