What does `spikein.mt` do? Hoping to specify which OTUs to correlate with metadata


I’m looking for functionality that lets me tell the methods which OTUs to correlate with metadata. I looked into the code, and although I’m unsure what `spikein.mt` does, it looks (suspiciously, or maybe optimistically) like an object that tells the downstream code which features to use for the microbe-metadata correlation. Could you confirm this? A pointer to any documentation of the input parameters to `sparseDOSSA()` would also help a lot.

This param: https://github.com/biobakery/sparseDOSSA/blob/master/R/synthetic_datasets_script.R#L31

Used here: https://github.com/biobakery/sparseDOSSA/blob/7fe45761bd7a57659718687a33f875535322bfb9/R/synthetic_datasets_script_helper_functions.R#L853

And used here: https://github.com/biobakery/sparseDOSSA/blob/7fe45761bd7a57659718687a33f875535322bfb9/R/synthetic_datasets_script_helper_functions.R#L1568


Hi -

You’re right. That parameter lets you specify exactly which OTUs are correlated with which metadata. Take the following example:

```
  feature metadata strength
1      13        5       10
2      14      1;4   -1;-10
3      20        3      -10
4      21        1       -1
5      24        4      -10
6      34        2       -1
```

That is, `spikein.mt` should be a three-column data.frame. `feature` is the index of the OTU to be associated. `metadata` holds the indices of the metadata columns to associate with that OTU, separated by “;”. `strength` holds the effect size for each of those metadata columns, also separated by “;” and in the same order. For example, row 2 associates OTU 14 with metadata column 1 at strength -1 and with metadata column 4 at strength -10. Note that this parameter only makes sense when you provide your own metadata through `UserMetadata`. It also bypasses parameters such as `percent_spiked` and `spikeStrength`.
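To make the format concrete, here is a minimal Python sketch (SparseDOSSA itself is an R package, and this is not its code) that expands the example table into one (feature, metadata, strength) triple per association and checks that each row lists as many strengths as metadata indices. The helper name `expand_spikein` is hypothetical; it only illustrates the layout described above.

```python
# Hypothetical helper, not part of SparseDOSSA: it only illustrates the
# spikein.mt layout (three columns, ";"-separated metadata and strength lists).
def expand_spikein(rows):
    """Expand spikein.mt-style rows into (feature, metadata, strength) triples.

    Each row is (feature, metadata, strength), where `metadata` and
    `strength` are ";"-separated strings with the same number of entries.
    """
    triples = []
    for feature, metadata, strength in rows:
        meta_idx = [int(m) for m in str(metadata).split(";")]
        effects = [float(s) for s in str(strength).split(";")]
        if len(meta_idx) != len(effects):
            raise ValueError(
                f"feature {feature}: {len(meta_idx)} metadata indices "
                f"but {len(effects)} strengths"
            )
        # Pair each metadata index with the effect size in the same position.
        for m, s in zip(meta_idx, effects):
            triples.append((int(feature), m, s))
    return triples


# The example table from this thread.
spikein_rows = [
    (13, "5", "10"),
    (14, "1;4", "-1;-10"),
    (20, "3", "-10"),
    (21, "1", "-1"),
    (24, "4", "-10"),
    (34, "2", "-1"),
]

for triple in expand_spikein(spikein_rows):
    print(triple)
```

The six rows expand to seven associations, because OTU 14 is tied to two metadata columns.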


Thanks for the response. I read the code and learned more about how it works. I wish it were documented; it doesn’t seem to be yet!

I am curious about one more thing, if you are able to answer it. It’s related to microbe-microbe correlation. Is it possible to specify which microbes should be correlated, or are they selected at random?

They are selected at random, and at this point we don’t have a `spikein.mt` equivalent for specifying which features are correlated with each other. It’s a bit hacky, but you can use `spikein.mt` to associate a pair of features with the same metadata variable. In fact, this is in principle how SparseDOSSA specifies correlations between feature pairs internally.
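As a rough illustration of that workaround, the Python sketch below (not SparseDOSSA code; `pair_spikein_rows` is a hypothetical name) turns a list of feature pairs into rows in the `spikein.mt` layout. It assumes you supply your own metadata through `UserMetadata` with one column reserved per pair, so that separate pairs are not tied to one another through a shared variable. A feature that appears in more than one pair gets its metadata indices and strengths joined with “;” in a single row. The rows would still need to be turned into an R data.frame before being passed to `sparseDOSSA()`, and the sketch does not check whether a given strength yields a noticeable feature-feature correlation.

```python
# Hypothetical helper, not part of SparseDOSSA: builds spikein.mt-style rows
# that tie both features of each pair to the same metadata column, which is
# the workaround for inducing feature-feature correlations.
def pair_spikein_rows(pairs, metadata_indices, strength):
    """Return (feature, metadata, strength) rows, one metadata column per pair.

    pairs            -- list of (feature_a, feature_b) OTU indices
    metadata_indices -- one metadata column index per pair
    strength         -- effect size applied to both features of a pair
    """
    if len(pairs) != len(metadata_indices):
        raise ValueError("need exactly one metadata column per feature pair")
    # feature -> list of (metadata index, strength); dicts keep insertion order.
    per_feature = {}
    for (a, b), m in zip(pairs, metadata_indices):
        for feature in (a, b):
            per_feature.setdefault(feature, []).append((m, strength))
    rows = []
    for feature, assoc in per_feature.items():
        # Join multiple associations with ";" in matching order.
        metadata = ";".join(str(m) for m, _ in assoc)
        strengths = ";".join(str(s) for _, s in assoc)
        rows.append((feature, metadata, strengths))
    return rows


rows = pair_spikein_rows([(3, 8), (8, 15)], metadata_indices=[1, 2], strength=5)
for row in rows:
    print(row)
```

Here OTU 8 belongs to both pairs, so its row lists two metadata columns (`"1;2"`) with matching strengths (`"5;5"`).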