Subsetting 16S rRNA Data

Hello MaAsLin2 authors/users,

Does anyone have any recommendations on whether we should subset (rarefy) our 16S rRNA amplicon data to the same read depth before running MaAsLin2? I noticed that by default it runs its own normalization. However, I couldn't find a clear answer on how this feature should be used for 16S rRNA data.

Thanks, Jacob Nearing

Hi Jacob,

Thanks for the question! Rarefying the data is not standard practice in my own analyses. Plenty of researchers do use it, so it is not wrong if you want to. I really like the manuscript by McMurdie and Holmes on the practice.

I typically handle sequencing depth by first removing any samples below a given threshold; for 16S I normally choose 5,000 reads. Then I look at the distribution of read counts across all of my samples. If it varies a lot and you are concerned that it could affect your analysis, you can include sequencing depth as a covariate in your MaAsLin model to correct for these differences. MaAsLin does not perform a rarefying step on its own. The two things it does apply are a normalization (TSS by default in MaAsLin2) and a transformation of the data (LOG by default in MaAsLin2).
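
For anyone who wants to see those steps spelled out, here is a minimal Python sketch of the workflow described above: drop samples under a read threshold, record each remaining sample's depth so it can be used as a covariate, then apply TSS and a log transform. The function names and the toy count table are hypothetical and only for illustration. The pseudocount rule (zeros replaced by half the smallest nonzero value before taking log2) is an assumption made for this sketch, not a statement of what MaAsLin2 does internally.

```python
import math


def sample_depths(counts):
    """Total read count per sample."""
    return {sample: sum(taxa.values()) for sample, taxa in counts.items()}


def filter_by_depth(counts, min_reads=5000):
    """Keep only samples with at least min_reads total reads."""
    depths = sample_depths(counts)
    return {s: taxa for s, taxa in counts.items() if depths[s] >= min_reads}


def tss_normalize(counts):
    """Total sum scaling: divide each count by its sample's total."""
    normalized = {}
    for sample, taxa in counts.items():
        total = sum(taxa.values())
        if total == 0:
            raise ValueError(f"sample {sample!r} has no reads")
        normalized[sample] = {t: c / total for t, c in taxa.items()}
    return normalized


def log_transform(rel_abund):
    """log2 transform; zeros become half the smallest nonzero value (assumed rule)."""
    nonzero = [v for taxa in rel_abund.values() for v in taxa.values() if v > 0]
    if not nonzero:
        raise ValueError("no nonzero abundances to transform")
    pseudo = min(nonzero) / 2
    return {
        s: {t: math.log2(v if v > 0 else pseudo) for t, v in taxa.items()}
        for s, taxa in rel_abund.items()
    }


# Toy count table (hypothetical): sample -> taxon -> reads
counts = {
    "S1": {"A": 6000, "B": 4000},
    "S2": {"A": 100, "B": 200},   # only 300 reads, will be removed
    "S3": {"A": 0, "B": 8000},
}

kept = filter_by_depth(counts)            # S2 falls below the 5,000-read threshold
depth_covariate = sample_depths(kept)     # per-sample depth, usable as a covariate
transformed = log_transform(tss_normalize(kept))
```

In a real analysis the per-sample depth would simply be added as a column of the metadata table and listed among the model's covariates. The TSS and log steps are shown only to make the two operations concrete, since MaAsLin2 applies its own normalization and transformation by default.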

I hope that helped! Let us know if you have any additional questions.


Hi Kelsey,

Thanks for clearing that up and taking the time to answer my question. That makes a lot of sense and is generally how I go about my own analysis. I was mostly curious to see what others were doing: some tools are built explicitly to handle data in a compositionally aware manner, so rarefying is not recommended for them, while other tools strongly suggest the practice. It was not clear to me which was the case for MaAsLin2.

Jacob Nearing