I really like what your package has to offer but I’m having a difficult time understanding my output. I expect to get one dataset but it seems like I’m getting three distinct count datasets and metadata within SyntheticMicrobiome-Count.pcl.
Here are my questions:
When I specify number_metadata = 2, I get 8 rows of metadata. Why is there double the number of continuous metadata? I see that the function says there will be, but I don’t know why.
The second chunk is the ‘null community.’ What do you mean by this? If I don’t want any outliers or correlations, do I just grab the ‘null community’ and ignore everything else?
What is the outlier chunk? What does this mean? Outlier Swap: Feature_Outlier_137 Sample: 45
If I’m looking for correlations with my metadata, do I just grab the “feature spiked” chunk? Or do I need the null community AND the feature spiked chunk?
Also, is there a minimum number of samples? Because if I try to run 10 samples with 50 microbes, I get an error.
Thanks,
M
Also, is there a paper on sparseDOSSA that I’ve missed? I can’t find it.
Indices of bugs correlated with others: 263; 41; 165; 215; 42; 248; 47; 54; 69; 22
Indices of the bugs each correlated bug is correlated with: 16; 23; 192; 32; 169; 49; 179; 207; 239; 268
I would assume that means microbes 263 & 16 are correlated. I’m having trouble seeing how they’re correlated, and it’s likely related to my confusion about the output. I’ve tried running cor() on microbes 263 & 16 within the null community and the bugToBug community, for both counts and normalized counts. I also tried cbind(null, bugToBug) and then cor(), which doesn’t really make sense to me since I’m supposed to have 50 samples and that would be 100. I also tried SparCC with all these variations.
This output is supposed to be intuitive so I’m sure I’m just making a really obvious mistake. Any advice would be greatly appreciated.
Unfortunately, I have not. I was told it might be helpful to email the author directly. However, I have switched to working on another part of my research for now so I haven’t done anything further with simulations. I’ve followed your question. Hopefully we’ll find out soon!
Sorry for the much belated response! I’m not one of the original developers but help maintain the package. Will try to answer your questions to the best of my knowledge:
When I specify number_metadata = 2, I get 8 rows of metadata. Why is there double the number of continuous metadata? I see that the function says there will be, but I don’t know why.
I’m not sure either. My only guess is that it is so there will be equal numbers of continuous and categorical metadata. So for number_metadata = 2 you get two continuous metadata, one binary metadata, and one quaternary metadata.
The second chunk is the ‘null community.’ What do you mean by this? If I don’t want any outliers or correlations, do I just grab the ‘null community’ and ignore everything else?
If you are referring to the rows with “log normal” in them, these are the abundances before any outliers are introduced into the distribution, and without any correlation between microbial features or between microbial features and metadata. The “null” refers to no association. The chunk with outliers is designed to better approximate the over-dispersed distribution of microbiome data, so if you are looking for null data, I’d look into either the log normal chunk or the outliers chunk.
What is the outlier chunk? What does this mean? Outlier Swap: Feature_Outlier_137 Sample: 45
See above. The “Swap” refers to SparseDOSSA’s mechanism of swapping values to generate outliers. Essentially, the row means that sample 45’s value for feature 137 was changed from its value in the log normal chunk to generate an outlier.
If I’m looking for correlations with my metadata, do I just grab the “feature spiked” chunk? Or do I need the null community AND the feature spiked chunk?
Correct. Just the “feature spiked” chunk. SparseDOSSA internally goes log normal matrix -> generate outliers -> spike in metadata association.
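If it helps to pull a single chunk out of the file, here is a rough sketch in Python of one way to group rows by their label. It is not part of SparseDOSSA. It assumes the file is tab-delimited with the row label in the first column, and that rows in the same chunk share a label that differs only by a trailing index (as in “Feature_Outlier_137”). The other labels in the example are made up for illustration, so check them against your own file.

```python
import re
from collections import defaultdict


def chunk_name(row_label):
    """Drop a trailing index, e.g. 'Feature_Outlier_137' -> 'Feature_Outlier'."""
    return re.sub(r"_?\d+$", "", row_label)


def split_chunks(lines):
    """Group tab-delimited rows by chunk name; the first field is the row label."""
    chunks = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if not fields[0]:
            continue  # skip blank lines
        chunks[chunk_name(fields[0])].append(fields)
    return dict(chunks)


# Tiny made-up example; only the 'Feature_Outlier_' label style appears in this thread.
example = [
    "Metadata1\t0.3\t1.2",
    "Feature_Lognormal_1\t10\t0",
    "Feature_Lognormal_2\t0\t4",
    "Feature_Outlier_1\t10\t0",
    "Feature_Outlier_2\t0\t400",
]
chunks = split_chunks(example)
print(sorted(chunks))
```

With a real file you could pass the open file handle for SyntheticMicrobiome-Count.pcl to split_chunks instead of the example list, then keep only the chunk you need. Any header row of sample IDs would end up in its own group.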
On your second post, about why bug-bug correlation does not produce correlations as strong as specified: I believe this has to do with the zero inflation of microbiome data. Imagine two features with ~90% zeros. Then at least 80% of their values must be zero at the same time (0.9 + 0.9 - 1 = 0.8). In this case their correlation would tend to be close to zero. One way to bypass this is to set the noZeroInflate flag to TRUE when running SparseDOSSA, so that features are not zero-inflated. One might argue the results won’t be realistic microbiome data anymore, but I cannot think of an alternative solution.
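To make the zero-inflation argument a bit more concrete, here is a small simulation sketch in Python (the same check could be done in R with cor()). It is not SparseDOSSA code. The log-normal draws, the 0.9 correlation on the log scale, and the independent zeroing of 90% of each feature are all assumptions chosen for illustration.

```python
import math
import random


def min_joint_zero_fraction(p_zero_a, p_zero_b):
    """Smallest possible fraction of samples where both features are zero."""
    return max(0.0, p_zero_a + p_zero_b - 1.0)


def pearson(x, y):
    """Plain Pearson correlation, no external dependencies."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_samples=5000, rho=0.9, p_zero=0.9, seed=0):
    """Return (correlation before zeroing, correlation after zeroing)."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n_samples):
        # Two log-normal features whose logs are correlated with coefficient rho.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        x.append(math.exp(z1))
        y.append(math.exp(z2))
    before = pearson(x, y)
    # Assumption: zeros are placed independently in each feature.
    x_zi = [v if rng.random() >= p_zero else 0.0 for v in x]
    y_zi = [v if rng.random() >= p_zero else 0.0 for v in y]
    after = pearson(x_zi, y_zi)
    return before, after


before, after = simulate()
print(min_joint_zero_fraction(0.9, 0.9))
print(round(before, 3), round(after, 3))
```

Under these assumptions, the correlation after zeroing is typically far below the one before, because independently placed zeros break most of the pairing between the two features. If the zeros were instead placed in the same samples for both features, the drop would be much smaller, so the exact effect depends on how the zeros are generated.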