MTX Model IBD Data Processing Pipeline

Hey folks, I am currently using the MTX Model to identify upregulated and downregulated genes in IBD metatranscriptomic data from the Inflammatory Bowel Disease Multiomics Database. I am relatively new to this area, so I would appreciate feedback on whether my approach is correct and whether I have missed any preprocessing errors.

So I first take the relative abundance MTX and MGX datasets, along with their corresponding patient metadata, and identify the upregulated and downregulated genes as follows:

library(MTXmodel)

# Paths to the RNA (MTX) table, sample metadata, and DNA (MGX) table
input_data <- 'mtx_data_RELAB.tsv'
input_metadata <- 'metadata.tsv'
input_dnadata <- 'mgx_data_RELAB.tsv'

# Model MTX abundance against IBD status, adjusting for the matched MGX abundance
fit_data <- MTXmodel(
  input_data, input_metadata, 'RESULTS',
  transform = 'LOG',
  fixed_effects = c('IBD_status'),
  random_effects = c('site', 'subject'),
  reference = 'IBD_status,nonIBD',
  normalization = 'NONE',
  standardize = FALSE,
  input_dnadata = input_dnadata
)

Could you please advise whether the relative abundance data from the MTX and MGX datasets requires additional preprocessing prior to its application in the MTX Model? I would greatly appreciate any advice or corrections you might have regarding my methodology.

Thank you for your assistance.

-Rodrigo

Hi Rodrigo,

MTXmodel matches the feature IDs in the MTX data to those in the MGX data when modeling, so you may want to double-check that the feature IDs match between the two tables. There are also some optional settings you may want to customize:

  1. Add more covariates with the '--fixed_effects' setting when needed
  2. Filter low-abundance or low-prevalence features by setting '--min_abundance' or '--min_prevalence'
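
As a rough illustration of the two checks above, here is a minimal base-R sketch. It assumes both tables have features as rows, with the feature ID in the first column, and samples as columns. The helper names `compare_feature_ids` and `preview_filter` are made up for this example and are not part of MTXmodel. The filter rule shown (a feature is kept if it exceeds `min_abundance` in at least a `min_prevalence` fraction of samples) is only an approximation for previewing thresholds; the exact rule MTXmodel applies internally may differ.

```r
# Hypothetical helper: compare feature IDs between the MTX (RNA) and MGX (DNA) tables.
compare_feature_ids <- function(mtx_ids, mgx_ids) {
  shared <- intersect(mtx_ids, mgx_ids)
  list(
    n_shared = length(shared),
    only_mtx = setdiff(mtx_ids, mgx_ids),  # RNA features with no DNA counterpart
    only_mgx = setdiff(mgx_ids, mtx_ids)   # DNA features with no RNA counterpart
  )
}

# Hypothetical helper: list features that would pass an abundance/prevalence filter.
# `abund` is a numeric matrix with features as rows and samples as columns.
preview_filter <- function(abund, min_abundance = 0, min_prevalence = 0.1) {
  # Prevalence = fraction of samples in which the feature exceeds min_abundance
  prevalence <- rowMeans(abund > min_abundance)
  rownames(abund)[prevalence >= min_prevalence]
}

# Toy data with made-up IDs and values
mtx_ids <- c("geneA", "geneB", "geneC")
mgx_ids <- c("geneB", "geneC", "geneD")
id_check <- compare_feature_ids(mtx_ids, mgx_ids)
print(id_check)

abund <- matrix(
  c(0.10, 0.20, 0.00, 0.30,
    0.00, 0.00, 0.00, 0.01,
    0.00, 0.00, 0.00, 0.00),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("s", 1:4))
)
kept <- preview_filter(abund, min_abundance = 0.001, min_prevalence = 0.5)
print(kept)

# With the real files (paths taken from the post above; orientation is an assumption):
# mtx <- read.delim("mtx_data_RELAB.tsv", row.names = 1, check.names = FALSE)
# mgx <- read.delim("mgx_data_RELAB.tsv", row.names = 1, check.names = FALSE)
# compare_feature_ids(rownames(mtx), rownames(mgx))
# preview_filter(as.matrix(mtx), min_abundance = 0.001, min_prevalence = 0.5)
```

If many IDs show up in `only_mtx`, it is worth checking whether the two tables use the same naming scheme before running the model. The threshold values here are placeholders, not recommendations.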

You can see more details about these in our tutorial:

Thanks!
Yancong