I’d like to run a model you ran in the MaAsLin2 preprint but I have a few questions about arranging the data and metadata files to replicate it.
I have longitudinal, paired metatranscriptome and metagenome data from individual patients. For each RNA and DNA sample I determined gene-level read counts and normalized them to TPM per sample. I've also grouped/stratified these TPM values per sample by functional pathway/taxonomy. Given that, I would essentially like to replicate this model described in the preprint:
Where “the log-transformed relative abundances of the whole-community and species-stratified RNA pathways are…modeled…while additionally adjusting the corresponding DNA pathways abundance as a continuous covariate to filter out the influence from gene copies”
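To check my reading of that sentence, here is a minimal Python sketch of the model as I understand it for a single pathway: log RNA abundance is the response, and log DNA abundance for the same pathway enters as one more continuous covariate next to the metadata. This is only my interpretation, not MaAsLin2's actual code. The function names, the toy values, and the single binary `group` covariate are all made up for illustration, and the sketch ignores random effects and zero handling.

```python
import math

def ols(X, y):
    """Least squares via the normal equations (X'X) b = X'y, solved by Gauss-Jordan."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for c in range(p):
        # Partial pivoting for numerical stability
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def fit_feature(rna, dna, covariate):
    """Regress log(RNA) on an intercept, one metadata covariate, and log(DNA)."""
    y = [math.log(v) for v in rna]
    X = [[1.0, c, math.log(d)] for c, d in zip(covariate, dna)]
    return ols(X, y)

# Hypothetical toy data for ONE pathway across six samples.
dna = [10.0, 20.0, 40.0, 80.0, 15.0, 30.0]   # DNA abundance of the pathway
group = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]       # made-up binary metadata variable
# RNA is simulated so that log(RNA) = 0.5 + 2*group + 1*log(DNA) exactly.
rna = [d * math.exp(0.5 + 2.0 * g) for d, g in zip(dna, group)]

coef = fit_feature(rna, dna, group)
print(coef)  # [intercept, group effect adjusted for DNA, DNA slope]
```

If that is the right idea, then my question is really how to get the DNA column for each pathway into the metadata so that every feature is adjusted by its own DNA abundance rather than by one shared covariate.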
I can’t figure out how to set up the input data and metadata files to accomplish this. Specifically, it’s unclear to me how both the RNA and DNA abundance values are incorporated into these files (the tutorial data set only uses one abundance value per feature), and what the appropriate way is to reference them in the metadata so that MaAsLin2 uses RNA as the “intercept”. Would you be able to share or describe template data/metadata files for this kind of RNA and DNA covariate analysis?
You say that when the above model is run on data grouped to the pathway level and/or stratified by taxonomy, “this considers a per-feature DNA covariate model, in which per-feature normalized transcript abundance is treated as a dependent variable, regressed on per-feature normalized DNA abundances along with other regressors in the model”. Is “per-feature normalized abundance” the same thing as using TPM values (calculated for each sample individually from the raw feature read counts in that sample, then summed/stratified to pathway/taxonomy levels per sample) as the feature abundance values?
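For reference, this is roughly how I computed those values, written as a minimal Python sketch. The gene names, lengths, counts, and pathway mapping are hypothetical toy values, and I am assuming the standard TPM definition (counts divided by gene length, rescaled so each sample sums to one million).

```python
from collections import defaultdict

def tpm(counts, lengths):
    """Per-sample TPM: length-normalize the counts, then scale to sum to 1e6."""
    rates = {g: counts[g] / lengths[g] for g in counts}
    total = sum(rates.values())
    return {g: r / total * 1e6 for g, r in rates.items()}

def sum_by_group(values, gene_to_group):
    """Sum gene-level values into pathway (or taxon-stratified) totals."""
    out = defaultdict(float)
    for g, v in values.items():
        out[gene_to_group[g]] += v
    return dict(out)

# Hypothetical toy data for ONE sample (RNA or DNA is handled the same way).
counts = {"geneA": 100, "geneB": 300, "geneC": 600}      # raw read counts
lengths = {"geneA": 1000, "geneB": 1000, "geneC": 2000}  # gene lengths in bp
gene_to_pathway = {"geneA": "PWY-1", "geneB": "PWY-1", "geneC": "PWY-2"}

sample_tpm = tpm(counts, lengths)
pathway_tpm = sum_by_group(sample_tpm, gene_to_pathway)
print(pathway_tpm)
```

Each RNA and DNA sample goes through this separately, so the pathway totals within a sample always sum to one million.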
I noticed that the results of the above model were not included in the preprint. Figure 5 shows the results of running MaAsLin2 on either RNA abundances or DNA abundances, but I couldn’t find the results of the model using RNA abundances with DNA abundance as a covariate. Should I take that to mean that running RNA and DNA separately and then comparing the results is more appropriate than using DNA as a covariate for transcription levels?
Thank you for any guidance you can provide!