The bioBakery help forum

Simulating datasets for analyzing microbial co-occurrence network tools

Dear all @sagunmaharjann,
I am testing a newly developed tool for studying microbial co-occurrence, and I need simulated datasets with a known ground truth to benchmark its performance against established methods such as MDINE and SPIEC-EASI.
Luckily, I found the SparseDOSSA R package and tried it myself:
n.microbes <- 200
n.samples <- 100
n.metadata <- 2
# Simulate counts with spiked-in bug-bug (microbe-microbe) associations
sparseDOSSA::sparseDOSSA(number_features = n.microbes,
                         number_samples = n.samples,
                         number_metadata = n.metadata,
                         runBugBug = TRUE,
                         bugBugCorr = "0.2",
                         bugs_to_spike = 5)

I got the output described in the tutorial; my output is attached: SyntheticMicrobiome-Counts.csv (415.9 KB)

Where can I find the ground truth for the co-occurrence pattern? Also, the rows in the SyntheticMicrobiome-Counts file carry different labels:

Feature_Lognormal_1: rows with this kind of label represent the counts.
What do labels such as a_5 and d_1_1 mean?

Finally, which set of rows should I use to test the tools?



Sorry for the late response! Please see my response to the other post. In your case, I believe the rows whose labels contain “BugToBugAssociations” are the ones of interest.
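A minimal R sketch of pulling those rows out of the counts table is shown here. It assumes one feature per row, with the feature name in the first column and one sample per remaining column; the toy table, its row names and the helper `select_bugbug_rows` are hypothetical stand-ins, not part of SparseDOSSA, so please adjust them to your actual output file.

```r
# Hypothetical helper (not part of sparseDOSSA): keep only rows whose feature
# name contains "BugToBugAssociations" and return them as a numeric matrix.
# ASSUMPTION: feature names sit in the first column, samples in the others.
select_bugbug_rows <- function(counts) {
  keep <- grepl("BugToBugAssociations", counts[[1]], fixed = TRUE)
  mat <- as.matrix(counts[keep, -1, drop = FALSE])
  storage.mode(mat) <- "numeric"  # metadata rows can make columns non-numeric
  rownames(mat) <- counts[[1]][keep]
  mat
}

# Toy stand-in for the counts file (hypothetical row names). With a real run,
# something like the following, adapted to your file name and separator:
#   counts <- read.delim("path/to/counts-file", check.names = FALSE,
#                        stringsAsFactors = FALSE)
counts <- data.frame(
  Feature = c("Feature_Lognormal_1",
              "Feature_Lognormal_BugToBugAssociations_a_2_d_1_2",
              "Feature_Lognormal_BugToBugAssociations_a_2_d_1_3"),
  Sample1 = c(12, 7, 3),
  Sample2 = c(0, 9, 4),
  stringsAsFactors = FALSE
)
bugbug <- select_bugbug_rows(counts)
print(bugbug)
```

The result is a features × samples matrix; many co-occurrence tools expect samples in rows, so `t(bugbug)` may be needed before passing it on.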

  • Microbe pairs with true associations are specified in the “SyntheticMicrobiomeParameterFile.txt” file. Look for rows that start with “Indices of bugs correlated with others” and “Indices of the bugs each correlated bug is correlated with”, towards the bottom of the file. They indicate which feature pairs are the ground truth.

  • _a_5_d_1_1 reads, in sequence: a total of 5 pairs of bug-bug associations were spiked in (a_5); this is the 1st synthetic dataset being simulated (d_1); and this row corresponds to the 1st microbial feature (the trailing 1). You only need the last number, since it is the feature index; the first two numbers should not differ within a single SparseDOSSA run.
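Combining the two points above, one way to turn the parameter file into a ground-truth network is sketched in R. The exact layout of those two lines is not shown in this thread, so the sketch assumes (a) the indices follow the label on the same line, (b) the i-th index on the first line pairs with the i-th index on the second, and (c) these indices match the trailing feature index in the row labels. `extract_indices`, `ground_truth_pairs` and `feature_index` are hypothetical helpers and the inputs are made up; please check each assumption against your own files.

```r
# Hypothetical helpers. ASSUMPTIONS: indices follow each label on the same line,
# the two lines pair up positionally, and indices match the row-label suffix.
extract_indices <- function(lines, label) {
  hit <- lines[startsWith(lines, label)]
  if (length(hit) == 0) stop("label not found: ", label)
  rest <- substring(hit[1], nchar(label) + 1)  # text after the label
  as.integer(regmatches(rest, gregexpr("[0-9]+", rest))[[1]])
}

ground_truth_pairs <- function(lines) {
  from <- extract_indices(lines, "Indices of bugs correlated with others")
  to <- extract_indices(lines, "Indices of the bugs each correlated bug is correlated with")
  stopifnot(length(from) == length(to))
  data.frame(from = from, to = to)
}

# Trailing feature index of a row label ending in "_a_<n>_d_<n>_<index>"; NA if no match.
feature_index <- function(labels) {
  m <- regmatches(labels, regexec("_a_[0-9]+_d_[0-9]+_([0-9]+)$", labels))
  vapply(m, function(p) if (length(p) == 2) as.integer(p[2]) else NA_integer_,
         integer(1))
}

# Made-up inputs; with a real run:
#   lines <- readLines("SyntheticMicrobiomeParameterFile.txt")
lines <- c("Number of features: 200",
           "Indices of bugs correlated with others: 1, 3",
           "Indices of the bugs each correlated bug is correlated with: 2, 4")
row_labels <- paste0("Feature_Lognormal_BugToBugAssociations_a_2_d_1_", 1:4)

truth <- ground_truth_pairs(lines)
idx <- feature_index(row_labels)

# Symmetric 0/1 adjacency matrix of true edges over the bug-bug rows.
adj <- matrix(0L, length(idx), length(idx), dimnames = list(row_labels, row_labels))
for (k in seq_len(nrow(truth))) {
  i <- match(truth$from[k], idx)
  j <- match(truth$to[k], idx)
  if (!is.na(i) && !is.na(j)) adj[i, j] <- adj[j, i] <- 1L
}
print(sum(adj) / 2)  # number of true edges
```

The resulting `adj` can then be compared with each method's inferred adjacency matrix, for example by counting true-positive and false-positive edges.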

Hope this helps!


Thank you so much for the reply; the explanation makes sense now. I have just one more question. I checked the “Indices of bugs correlated with others” and “Indices of the bugs each correlated bug is correlated with” rows and retrieved their corresponding correlation values. However, all of these correlations are one-to-one (one bug with one other bug). Can we generate one-to-many correlations, for example one bug correlated with two or three other bugs, to increase the complexity?