Total RNA contribution for Metatranscriptomics Analysis with MaAsLin3

Hello,

I am learning to do Metatranscriptomics analysis without paired MGX. I read the paper on comparing DE methods (Statistical approaches for differential expression analysis in metatranscriptomics - PMC) and found it super helpful in terms of the steps necessary for a robust statistical comparison - seems like M3 is the way to go for my analysis. I ran my samples through HUMANn4 to get my gene family outputs (KO, GO, and Level4ECs). However, I’m not entirely sure how to get each

…species’ total RNA contributions as a model covariate.

from the HUMANn output. Would it just be # features from species A / # all features?

I read through the MTX model 3 tutorial on github but it uses DNA copy number / abundances to adjust the data - no mention on MTX data alone without MGX.. other than running MaAsLin3 on raw RNA abundances (which I don’t think is the same as M3 from the paper?).

Thanks!

Hi,

Model 3 means that you need both the metatranscriptomics scaled within each species (which you have) as well as a way of estimating the taxonomic abundance from the RNA (via a uniformly expressed housekeeper gene or something). I’m not sure if you have the second based on what you’ve described. In MaAsLin 3, you would then use the feature_specific_covariate field with the taxa table from maaslin3::preprocess_taxa_mtx - let me know if the examples in ?preprocess_taxa_mtx aren’t informative enough.

Will

Thanks for your reply Will. My pipeline was first QC (using Kneaddata) then metaphlan4 then HUMANn4 (which I used the output of metaphlan for). As such, my taxonomic abundance would’ve been estimated by metaphlan? If I don’t have a uniformly expressed housekeeper gene spiked in, is there any way to use model 3 still?

Sorry, is that to say you put the RNA sequencing through MetaPhlAn?

Correct. I ran metaphlan4 separately because if I remember correctly, HUMANn4 invokes an outdated version of metaphlan/chocophlan. So instead, I ran metaphlan4 separately and then passed those output files into humann4 using the --taxonomic_profiles flag.

Hmm, I’m not quite sure what to make of MPA4 on RNA sequencing - the normalization steps of MPA4 that allow you to go from read alignments to average coverage make pretty strong assumptions about any location on the genome being equally likely to be sequenced. This is quite different from RNA sequencing where you’d pick up really heavy “coverage” of highly expressed genes. If you can identify some housekeeper genes that are similarly expressed across most taxa but have enough sequencing variation to distinguish the taxa, I think that’s your best bet for estimating taxonomic abundance from the RNA sequencing data.