I’ve been using SparseDOSSA2 to generate simulated data from real datasets. Each sample in my OTU (Operational Taxonomic Unit) table has a corresponding response value.
I want to know: if I fit the SparseDOSSA2 model to my OTU table, will the sample order change during simulation? In other words, will the simulated samples still correspond to the original responses?
The sample order will not be preserved. If you use the same metadata to generate the simulated data, the case/control grouping can at least be preserved. But even in that case the residuals are not preserved, so I wouldn’t consider the order to be the same.
I actually ran into a similar problem. Just to clarify your response with an example: suppose I would like to simulate 50 samples, and in my metadata matrix I specify the first 30 as control and the remaining 20 as case, then spike in an association between a feature and this metadata matrix. For the 50 generated samples, can I understand that the first 30 are control and the remaining 20 are case? Also, can you explain what you mean by the residuals not being preserved? Thanks!
Hi - yes, that is the correct way to understand this.
By residual, I mean the remaining biological variation in the real-world samples that does not correspond to the case/control difference. For example, some samples within the control group can be more Firmicutes-enriched, while others are Bacteroidetes-enriched. Such variation is perturbed during SparseDOSSA simulation. So the 30 simulated “control” samples, while they do correspond to the control group, will not match the exact 30 control samples in the real-world data.
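If it helps, here is a toy Python sketch of that distinction. To be clear, this is not SparseDOSSA's model or API: the `simulate` function, the `effect` and `baseline` values, and the Gaussian noise are all made up for illustration. It only shows the general idea that the group label follows the position in the metadata, while the within-group variation is drawn fresh each time.

```python
import random
from statistics import mean


def simulate(metadata, effect=2.0, baseline=5.0, seed=None):
    """Toy simulator for one feature's log-abundance per sample.

    Output position i is generated from metadata[i], so group labels
    keep their positions. The noise term (the "residual") is drawn
    fresh and does not come from any real sample.
    """
    rng = random.Random(seed)
    return [baseline + effect * group + rng.gauss(0.0, 1.0) for group in metadata]


# First 30 samples are control (0), remaining 20 are case (1).
metadata = [0] * 30 + [1] * 20

run_a = simulate(metadata, seed=1)
run_b = simulate(metadata, seed=2)

# The group structure follows the metadata order in both runs.
print("control mean:", round(mean(run_a[:30]), 2))
print("case mean:   ", round(mean(run_a[30:]), 2))

# The per-sample values differ between runs, so sample 1 in one run
# is not "the same sample" as sample 1 in another run or in real data.
print("identical runs:", run_a == run_b)
```

In both runs the last 20 values are shifted by the spiked-in effect, but the individual values within each group are unrelated across runs. That is the sense in which the grouping is preserved and the residuals are not.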