I am currently analyzing 16S rRNA amplicon sequencing data (ASVs). Before running MaAsLin2, I have already rarefied my data to an even depth of 10,000 reads per sample.
My question concerns the normalization parameter for this specific input:
Since the read counts are already uniform across all samples due to rarefaction, should I set normalization = "NONE" to avoid double-normalization? Or is it still recommended to keep the default normalization = "TSS"?
I have read a few related forum posts on normalization, but I couldn’t reach a definitive judgment on which approach is mathematically or practically more appropriate for rarefied data in MaAsLin2.
Could you please clarify whether one is preferred over the other, and whether using “TSS” on already-rarefied data affects the statistics or just the interpretation of the estimates?
Thank you in advance for your time and assistance!
If you’ve already rarefied to an even depth, setting normalization to NONE vs. TSS should yield exactly the same coefficients (assuming you’re using the LOG transform). With every sample summing to 10,000, TSS just divides every count by the same constant, and on the log scale that is a constant additive shift of −log(10,000). That shift is absorbed entirely into the intercept rather than the per-metadatum coefficients, and the residuals are unchanged, so the standard errors and p-values for those coefficients should match as well. Still, it’s probably more straightforward to use TSS normalization so you can think of all the inputs as fractions of 1 (and, e.g., apply any prevalence/abundance filters with that in mind).
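A quick way to check this reasoning is a small simulation. The sketch below is in Python with NumPy rather than R and does not call MaAsLin2 at all. It fits an ordinary least-squares model (a stand-in for MaAsLin2's default linear model) to log2-transformed values from a simulated rarefied table, once on raw counts (as with NONE) and once after dividing by each sample's total (as with TSS). The simulated data, the binary metadatum `group`, and the zero-replacement rule in `log_transform` are assumptions for illustration only. Because log2(c / 10000) = log2(c) − log2(10000), the slopes should be identical, the intercepts should differ by log2(10000) ≈ 13.29, and the residuals should match.

```python
import numpy as np

DEPTH = 10_000  # rarefaction depth from the question


def log_transform(x):
    """log2 with zeros replaced by half the feature's smallest nonzero value.

    Assumption: stands in for MaAsLin2's LOG zero handling; any replacement
    that scales with the data behaves the same way in this comparison.
    """
    nonzero_min = x[x > 0].min()
    return np.log2(np.where(x == 0, nonzero_min / 2, x))


def fit_ols(y, covariate):
    """OLS fit of y ~ 1 + covariate; returns (coefficients, residuals)."""
    design = np.column_stack([np.ones_like(covariate), covariate])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef, y - design @ coef


rng = np.random.default_rng(0)
n_samples, n_features = 40, 5
group = np.repeat([0.0, 1.0], n_samples // 2)  # hypothetical binary metadatum

# Simulated rarefied table: each row sums to exactly DEPTH reads.
probs = rng.dirichlet(np.ones(n_features), size=n_samples)
counts = np.array([rng.multinomial(DEPTH, p) for p in probs], dtype=float)

# TSS divides each sample by its total, which is DEPTH for every sample here.
tss = counts / counts.sum(axis=1, keepdims=True)

slopes_none, slopes_tss, intercept_shift, resid_match = [], [], [], []
for j in range(n_features):
    (b0_n, b1_n), r_n = fit_ols(log_transform(counts[:, j]), group)  # "NONE"
    (b0_t, b1_t), r_t = fit_ols(log_transform(tss[:, j]), group)     # "TSS"
    slopes_none.append(b1_n)
    slopes_tss.append(b1_t)
    intercept_shift.append(b0_n - b0_t)
    resid_match.append(np.allclose(r_n, r_t))
    print(f"feature {j}: slope NONE={b1_n:.4f}  TSS={b1_t:.4f}  "
          f"intercept shift={b0_n - b0_t:.4f}")
```

If zeros were instead handled with a fixed pseudocount (e.g. adding 1 before taking the log), log(c + 1) and log(c / 10000 + 1) would no longer differ by a constant, and the two fits would not agree exactly. That is why the zero handling is flagged as an assumption above.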