Hi -
Sorry for the much belated response! I’m not one of the original developers, but I help maintain the package. I’ll try to answer your questions to the best of my knowledge:
- When I specify number_metadata = 2, I get 8 rows of metadata. Why is there double the number of continuous metadata? I see that the function says this will happen, but I don’t know why.
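I’m not sure either. My only guess is that it keeps the numbers of continuous and categorical metadata equal. So for each unit of number_metadata you get two continuous metadata, one binary metadata, and one quaternary metadata; with number_metadata = 2 that is four continuous, two binary, and two quaternary metadata, i.e. the 8 rows you see.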
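As a quick arithmetic check: reading the guess as two continuous, one binary, and one quaternary metadata per unit of number_metadata reproduces the 8 rows from the question. The helper below is hypothetical (not a SparseDOSSA function) and only encodes that guessed layout, so treat it as a sanity check rather than a description of the package internals.

```python
from typing import Dict


def metadata_layout(number_metadata: int) -> Dict[str, int]:
    """Hypothetical helper: metadata row counts under the guessed layout of
    two continuous, one binary and one quaternary variable per unit."""
    if number_metadata < 0:
        raise ValueError("number_metadata must be non-negative")
    layout = {
        "continuous": 2 * number_metadata,  # doubled, as observed
        "binary": number_metadata,
        "quaternary": number_metadata,
    }
    # Continuous and categorical (binary + quaternary) counts come out equal.
    layout["total"] = sum(layout.values())
    return layout


print(metadata_layout(2))
# {'continuous': 4, 'binary': 2, 'quaternary': 2, 'total': 8}
```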
- The second chunk is the ‘null community.’ What do you mean by this? If I don’t want any outliers or correlations, do I just grab the ‘null community’ and ignore everything else?
If you are referring to the rows with “log normal” in them, these are the abundances generated without introducing outliers into the distribution, and without any correlation between microbial features or between microbial features and metadata. The “null” refers to the absence of any association. The chunk with outliers is designed to better approximate the over-dispersed distribution of real microbiome data, so if you are looking for null data, I’d use either the log normal chunk or the outliers chunk.
- What is the outlier chunk? What does this mean? Outlier Swap: Feature_Outlier_137 Sample: 45
See above. The “Swap” refers to SparseDOSSA’s mechanism of swapping values to generate outliers. Essentially, that row means feature 137 in sample 45 was changed from its value in the log normal chunk to create an outlier.
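If you want to list which feature/sample pairs were altered, a small parser over those rows is enough. This is a hypothetical helper, not part of SparseDOSSA, and it assumes the row text looks like the example in the question (`Outlier Swap: Feature_Outlier_137 Sample: 45`); other versions of the output may word it differently.

```python
import re
from typing import Iterable, List, NamedTuple, Optional


class OutlierSwap(NamedTuple):
    feature: str  # e.g. "Feature_Outlier_137"
    sample: int   # e.g. 45


# Assumed row format, taken from the example in the question.
_SWAP_RE = re.compile(r"Outlier Swap:\s*(\S+)\s+Sample:\s*(\d+)")


def parse_outlier_row(line: str) -> Optional[OutlierSwap]:
    """Return the feature/sample pair from an outlier-swap row, or None."""
    match = _SWAP_RE.search(line)
    if match is None:
        return None
    return OutlierSwap(feature=match.group(1), sample=int(match.group(2)))


def collect_swaps(lines: Iterable[str]) -> List[OutlierSwap]:
    """Collect every outlier swap recorded in a sequence of output rows."""
    swaps = []
    for line in lines:
        swap = parse_outlier_row(line)
        if swap is not None:
            swaps.append(swap)
    return swaps


print(parse_outlier_row("Outlier Swap: Feature_Outlier_137 Sample: 45"))
# OutlierSwap(feature='Feature_Outlier_137', sample=45)
```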
- If I’m looking for correlations with my metadata, do I just grab the “feature spiked” chunk? Or do I need the null community AND the feature spiked chunk?
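Correct: just the “feature spiked” chunk. Internally, SparseDOSSA goes log normal matrix -> generate outliers -> spike in metadata associations, so the “feature spiked” chunk is built on top of the earlier steps and already contains the outliers.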
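To make that ordering concrete, here is a toy NumPy sketch of the three stages. It is not SparseDOSSA’s code: the log-normal parameters, the outlier rule (multiplying a few entries by 50), and the spike rule (scaling feature 0 by `exp(1.5 * metadata)`) are all made up for illustration. What it shows is that each stage starts from a copy of the previous one.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_samples = 20, 500

# Stage 1 ("log normal" chunk): independent log-normal abundances with no
# feature-feature or feature-metadata association (the null community).
log_normal = rng.lognormal(mean=0.0, sigma=1.0, size=(n_features, n_samples))
metadata = rng.normal(size=n_samples)  # one hypothetical continuous metadatum

# Stage 2 ("outlier" chunk): start from a copy of stage 1 and inflate a few
# randomly chosen entries (hypothetical outlier rule, not SparseDOSSA's swap).
with_outliers = log_normal.copy()
for _ in range(5):
    f = rng.integers(n_features)
    s = rng.integers(n_samples)
    with_outliers[f, s] *= 50.0

# Stage 3 ("feature spiked" chunk): start from a copy of stage 2 and make
# feature 0 depend on the metadata (hypothetical spike rule).
spiked = with_outliers.copy()
spiked[0] *= np.exp(1.5 * metadata)

null_r = np.corrcoef(np.log(log_normal[0]), metadata)[0, 1]
spiked_r = np.corrcoef(np.log(spiked[0]), metadata)[0, 1]
print(f"feature 0 vs metadata: null r = {null_r:.2f}, spiked r = {spiked_r:.2f}")
```

Because stage 3 is built from a copy of stage 2, every outlier survives into the spiked matrix. That is why the spiked chunk on its own is the one to use for association testing.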
On your second post, about why bug-bug correlation does not produce correlations as strong as specified: I believe this has to do with the zero-inflatedness of microbiome data. Imagine two features with ~90% zeros each. Then at least 80% of samples must have both features at zero, and at most 10% can have both features nonzero. Unless the zeros are placed in a way that follows the specified correlation, this zero pattern dominates and the observed correlation ends up close to zero. One way you can bypass this is to set the noZeroInflate flag to TRUE when running SparseDOSSA, so that features are not zero-inflated. One might argue the results won’t be realistic microbiome data anymore, but I cannot think of an alternative solution.
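A quick NumPy simulation illustrates the zero-inflation effect. The settings are toy values chosen for illustration: two log-normal features whose underlying normals have correlation 0.9, then ~90% of each feature zeroed out. Zeroing the two features independently of each other is an assumption made for the sketch, not a claim about how SparseDOSSA places its zeros.

```python
import numpy as np

rng = np.random.default_rng(1)
n, rho, p_nonzero = 100_000, 0.9, 0.1

# Two log-normal features whose underlying normals have correlation rho.
cov = [[1.0, rho], [rho, 1.0]]
z = rng.multivariate_normal([0.0, 0.0], cov, size=n)
dense = np.exp(z)  # shape (n, 2), no zeros

# Zero-inflate each feature independently: keep ~10% of values, zero the rest.
mask = rng.random((n, 2)) < p_nonzero
sparse = dense * mask

dense_corr = np.corrcoef(dense[:, 0], dense[:, 1])[0, 1]
sparse_corr = np.corrcoef(sparse[:, 0], sparse[:, 1])[0, 1]
shared_zero = np.mean((sparse[:, 0] == 0) & (sparse[:, 1] == 0))
print(f"dense r = {dense_corr:.2f}, zero-inflated r = {sparse_corr:.2f}, "
      f"shared zeros = {shared_zero:.2f}")
```

With these settings the dense correlation should come out around 0.85 and the zero-inflated one only around 0.05, while roughly 81% of samples are zero in both features. Only about 1% of samples have both features nonzero, so almost none of the specified correlation survives.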
Hope these help!