The bioBakery help forum

Metatranscriptomic data with MaAsLin2 & LEfSe

Hi all,

I’m currently running some metatranscriptomic data (without accompanying metagenomic data) through Humann2.
The study design is quite simple, disease state vs. healthy controls, and I would like to find out what taxons are differentially abundant (I understand that abundance is to be taken with a grain of salt in this case due to the nature of single copy marker genes), which gene families/pathways are differentially expressed, and look at any associations between taxonomy/function and clinical metadata.
My question is whether MaAsLin2 and LEfSe are suitable for statistical analysis on the Humann2 functional outputs for this type of data? I would normally use DESeq2 for RNA-Seq data if I was aligning to reference genome(s) but I do not think it would be suitable here as I will not have raw count data.

Any advice would be greatly appreciated,
All the best,

You could definitely use MaAsLin 2 for this. My suggestion would be to sum over genes within species from the HUMAnN gene families output to get an estimate of each species’ total RNA contributions. (Using MetaPhlAn to detect species from RNA in the first step of HUMAnN should still be fine, but the total gene RNA coverage will be a better estimate of RNA abundance than an average over marker RNA coverage, as MetaPhlAn reports.)

You could then sum-normalize those RNA abundances and directly analyze them to look for changes in species-level structure (noting that the changes will reflect a combination of species abundance and total expression).

If you analyze individual, stratified molecular functions, I would use the corresponding species’ total RNA as a covariate in your model. E.g. to compare gene1|species1 RNA values with a covariate of interest, include species1 total RNA abundance (as computed above) as an additional covariate. This will help to distinguish changes in expression of gene1|species1 specifically from changes in species1 abundance (which will affect all of its genes). In the absence of paired metagenomic data, we’ve found this approach to be much less prone to false positives than the alternative (i.e. not using the species covariate).