Hello, I’m analysing 16S data from 20 CRC mice in R, and I plan to use adjust_batch() from MMUPHin to correct for technical bias between runs. My matrix is sparse (in practice, only a few genera and samples are available), so I first applied cmultRepl() from zCompositions (GBM method) to impute zeros, which returns proportions. Then I applied adjust_batch() with zero_inflation = FALSE, as follows:
tmp <- cmultRepl(abd_gene_level, z.warning = 1, z.delete = FALSE)
fit_adjust_batch <- adjust_batch(feature_abd = t(tmp),
                                 batch = "expr_run",
                                 covariates = "condition",
                                 data = metad,
                                 control = list(zero_inflation = FALSE,
                                                diagnostic_plot = "mmuphin_diag.pdf",
                                                verbose = TRUE))
Then I plan to apply a CLR transformation for downstream analyses.
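For reference, the CLR step I have in mind is roughly the following. This is only a minimal base-R sketch: clr_transform() is a hypothetical helper name (not a function from MMUPHin or zCompositions), the toy matrix is made up for illustration, and I am assuming the input is a features × samples matrix with strictly positive values.

```r
# Hypothetical helper (not part of any package): centred log-ratio (CLR)
# transform of a features x samples matrix with strictly positive entries.
clr_transform <- function(abd) {
  stopifnot(is.matrix(abd), all(abd > 0))
  log_abd <- log(abd)
  # Subtract each sample's (column's) mean log abundance from its values
  sweep(log_abd, 2, colMeans(log_abd), FUN = "-")
}

# Toy proportions for illustration only: 3 genera (rows) x 2 samples (columns)
toy <- matrix(c(0.5, 0.3, 0.2,
                0.1, 0.6, 0.3),
              nrow = 3,
              dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))

toy_clr <- clr_transform(toy)
```

On my real data I would call this on the adjusted matrix returned by adjust_batch() (fit_adjust_batch$feature_abd_adj, if I read the documentation correctly), assuming it contains no zeros after the imputation step.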
My questions are:
- Is it a valid approach to impute zeros first with cmultRepl() and then apply adjust_batch() with zero_inflation = FALSE, or is it necessary to let the function handle zero-inflation internally to obtain proper batch correction estimates?
- Can this same workflow (cmultRepl() → adjust_batch() → CLR) be applied to functional abundance matrices, such as KO predictions from PICRUSt2, to correct for batch effects between sequencing runs?
Thank you in advance for your guidance.