The bioBakery help forum

Understanding output

Hi,

I really like what your package has to offer but I’m having a difficult time understanding my output. I expect to get one dataset but it seems like I’m getting three distinct count datasets and metadata within SyntheticMicrobiome-Count.pcl.

Here are my questions:

  1. When I specify number_metadata = 2, I get 8 rows of metadata. Why is the number of continuous metadata doubled? The function documentation says this will happen, but I don’t understand why.

  2. The second chunk is the ‘null community.’ What do you mean by this? If I don’t want any outliers or correlations, do I just grab the ‘null community’ and ignore everything else?

  3. What is the outlier chunk? What does a line like “Outlier Swap: Feature_Outlier_137 Sample: 45” mean?

  4. If I’m looking for correlations with my metadata, do I just grab the “feature spiked” chunk? Or do I need the null community AND the feature spiked chunk?

Also, is there a minimum number of samples? If I try to run 10 samples with 50 microbes, I get an error.

Thanks,
M

Also, is there a paper on sparseDOSSA that I’ve missed? I can’t find it.

I’ve been trying to figure some of this out. I want correlated microbes, so I run:

sparseDOSSA::sparseDOSSA(seed = 3,
                         runBugBug = TRUE,
                         bugBugCorr = "0.9",
                         bugs_to_spike = 10,
                         association_type = "linear",
                         minOccurence = 30,
                         number_metadata = 1)

The parameter file says this:

Indices of bugs correlated with others: 263; 41; 165; 215; 42; 248; 47; 54; 69; 22
Indices of the bugs each correlated bug is correlated with: 16; 23; 192; 32; 169; 49; 179; 207; 239; 268

I would assume that means microbes 263 & 16 are correlated. I’m having trouble seeing how they’re correlated, and that is likely related to my confusion about the output. I’ve tried running cor() on microbes 263 & 16 within the null community and within the bugToBug community, for both counts and normalized counts. I also tried cbind(null, bugToBug) followed by cor(), which doesn’t really make sense to me: I’m supposed to have 50 samples, and the combined matrix would have 100. I also tried SparCC with all of these variations.
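
For reference, the kind of check described above can be sketched in base R. This is only an illustration under stated assumptions: the matrix `counts` is simulated here and is not sparseDOSSA output, the size of 300 features is made up, and the names `counts`, `rel_abund`, `i`, and `j` are hypothetical. It also assumes features are in rows and samples in columns.

```r
# Hypothetical stand-in for one block of the output:
# 300 features (rows) x 50 samples (columns). Simulated, NOT sparseDOSSA output.
set.seed(3)
n_features <- 300
n_samples <- 50
counts <- matrix(rpois(n_features * n_samples, lambda = 20),
                 nrow = n_features, ncol = n_samples)

# First entries of the two index lists in the parameter file.
i <- 263
j <- 16

# Pearson correlation across samples between the two features, on raw counts.
r_counts <- cor(counts[i, ], counts[j, ])

# Same check on relative abundances (each sample scaled to sum to 1).
rel_abund <- sweep(counts, 2, colSums(counts), "/")
r_rel <- cor(rel_abund[i, ], rel_abund[j, ])

# Joining two 50-sample blocks column-wise gives 100 columns, so the
# combined matrix no longer matches the 50 samples that were requested.
combined <- cbind(counts, counts)
ncol(combined)
```

With real output, `counts` would be replaced by whichever block is being inspected, and the row order would need to match the indices reported in the parameter file.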

This output is supposed to be intuitive so I’m sure I’m just making a really obvious mistake. Any advice would be greatly appreciated.