Adjusting for covariates in HAllA

Dear all

I would like to run HAllA on two datasets (both consisting of quantitative variables). However, I also have qualitative covariates that I would like to adjust for before running HAllA. I read Lloyd-Price et al. 2019 and some other manuscripts, and they say that each dataset first needs to be “residualized” using the linear mixed-effects models from the differential abundance analysis. The HAllA tutorial also mentions that any covariates need to be regressed out using the lmer function from the lme4 package.

My question is: how do we take the lmer output (e.g. the residuals) and turn it into the -X and -Y inputs for HAllA? I am not sure which pieces of the lmer output we need to apply to the HAllA inputs. Specifically, the Lloyd-Price manuscript mentions that "Pearson’s residual values from the above linear mixed effects models were retained for use in [HAllA]," but how do these values get incorporated into the quantitative measurements in my -X and -Y datasets?

I am relatively new to HAllA, so I would appreciate any help with this. Thanks so much!



Hi Marc. You’re not so much “changing” the inputs as replacing them entirely with the residuals. Each row in the new X and Y inputs would be the residuals of an lmer model fitted to that feature in the original data. The purpose of the lmer model is to remove the effect of the covariates you want to account for from each feature. You can extract the residuals of an lmer model with the residuals() function in R after loading the lme4 package.
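To make the “replace, not change” idea concrete, here is a minimal sketch in Python with NumPy rather than R. It is a simplification under stated assumptions: it swaps lmer for ordinary least squares with fixed effects only (no random effects), and the `residualize` function and example data are hypothetical. Each feature row is fitted separately against the covariates, and the output matrix has exactly the same layout as the input, so it can stand in for the original table.

```python
import numpy as np


def residualize(features, covariates):
    """Replace each feature (row) with its residuals from a covariate-only fit.

    features:   array of shape (n_features, n_samples), one row per feature
    covariates: array of shape (n_samples, n_covariates); qualitative
                covariates must already be dummy-coded as 0/1 columns

    Simplification (assumption): ordinary least squares with fixed effects
    only. lmer would also estimate random effects such as subject.
    """
    n_samples = features.shape[1]
    # Prepend an intercept column, as an R model formula includes by default.
    design = np.column_stack([np.ones(n_samples), covariates])
    residuals = np.empty(features.shape, dtype=float)
    for i, row in enumerate(features):
        # One model per feature: feature ~ 1 + covariates
        coef, *_ = np.linalg.lstsq(design, row, rcond=None)
        residuals[i] = row - design @ coef
    return residuals


# Hypothetical example: 3 features, 20 samples, one two-level covariate
rng = np.random.default_rng(0)
group = np.repeat([0.0, 1.0], 10)              # dummy-coded covariate
X = rng.normal(size=(3, 20)) + 5.0 * group     # every feature shifted by group
X_resid = residualize(X, group[:, None])
print(X_resid.shape)  # (3, 20): same layout as X, ready to replace it
```

After this step the group shift is gone: within each group, every residualized feature averages to zero. With lmer the loop body would be one lmer() fit per row followed by residuals(), but the output has the same shape.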

Thanks Andrew for the response.

So if I have an -X dataset (with 100 taxonomic features as rows and samples as columns, for example) and a -Y dataset (with 100 metabolomic features as rows, for example), I’ll need to run lmer for every feature to account for how my fixed and random effects affect each row of -X and -Y (so 100 + 100 = 200 lmer fits in total)?


Hi Marc, yes, that’s correct.
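A sketch of the whole 100 + 100 pipeline, again in Python/NumPy with OLS standing in for lmer (assumption). Because every feature shares the same design matrix, `np.linalg.lstsq` can solve all of them in one call while still producing one coefficient vector per feature; that shortcut does not carry over to lmer, where each fit has to be run separately. The tab-delimited output assumes HAllA's usual input layout (features as rows, samples as columns, with a header row of sample IDs); the helper names, the corner label, and the data are hypothetical.

```python
import numpy as np


def residualize_all(features, design):
    """Residualize every row of a features x samples matrix on one design.

    lstsq accepts a 2-D right-hand side and returns one coefficient
    column per feature, so this is still a separate fit per feature.
    (This shortcut only works for OLS; lmer fits must be run one by one.)
    """
    coef, *_ = np.linalg.lstsq(design, features.T, rcond=None)
    return (features.T - design @ coef).T


def to_tsv(matrix, feature_names, sample_ids):
    """Tab-delimited text: a header of sample IDs, then one row per feature."""
    # The corner label "feature" is arbitrary (assumption).
    lines = ["\t".join(["feature", *sample_ids])]
    for name, row in zip(feature_names, matrix):
        lines.append("\t".join([name, *(f"{v:.6g}" for v in row)]))
    return "\n".join(lines) + "\n"


# Hypothetical stand-ins: 100 taxa and 100 metabolites over 12 samples
rng = np.random.default_rng(1)
n = 12
samples = [f"S{j}" for j in range(n)]
design = np.column_stack([np.ones(n), np.repeat([0.0, 1.0], n // 2)])
X = rng.lognormal(size=(100, n))
Y = rng.normal(size=(100, n))

X_res = residualize_all(X, design)  # 100 fits
Y_res = residualize_all(Y, design)  # 100 more: 200 in total
x_text = to_tsv(X_res, [f"taxon_{i}" for i in range(100)], samples)
y_text = to_tsv(Y_res, [f"metabolite_{i}" for i in range(100)], samples)
# Write x_text / y_text to files and pass those paths to HAllA.
```

Both tables keep the same sample columns in the same order, which is what lets HAllA pair each residualized taxon with each residualized metabolite.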

For the taxonomic features you might also consider using adjust_batch() from MMUPHin.

Hi Andrew. Thanks! That’s very helpful.

I had a quick look at the MMUPHin preprint on bioRxiv, and it says that “[it acts] as a framework for meta-analysis of microbial community studies using taxonomic, functional, or other abundance profiles,” suggesting that functional data (e.g. relative abundances from HUMAnN 3) can also be used as input to MMUPHin. I have some follow-up questions about this:

  1. So I can use MMUPHin for batch and covariate adjustment of both my taxonomic and functional profiles, right?

  2. I presume I do not need to calculate residuals for the MMUPHin-adjusted taxonomic/functional profiles (should I choose to use it)?

  3. If I adjust some datasets (taxonomic/functional) with one method (e.g. MMUPHin) and other datasets (e.g. metabolomic or other quantitative metadata measurements) with the residuals method, is mixing adjustment methods a concern? I feel that if I only had taxonomic and functional data, they could both be adjusted with MMUPHin without much issue. However, I am also dealing with other datasets that I would like to run pairwise HAllA with, so would adjusting these datasets individually with different methods cause a problem? I am not sure whether this concern is valid.

Thank you so much for your help!



  1. MMUPHin was designed to adjust taxonomic abundances. My instinct is that it wouldn’t be appropriate for the functional profiles, but I’m not sure. You might try both and inspect where they differ.
  2. Correct.
  3. Adjusting one with MMUPHin and the other with lmer would be fine. The point of the adjustment is simply to remove the effect of covariates you think might confound the associations you’re asking HAllA to find. Whatever method does this best in your judgement should be what you use. There’s no downside to adjusting X with one method and Y with another outside of adding some additional details to your analysis protocol.

Thanks for this!

I noticed over at the MMUPHin forum next door that someone else mentioned the tool is good for comparing across multiple studies, or multiple batches within a single study. In my case, I do not have multiple batches, and I am not sure whether “batch” is a required input to the command.

Anyway, this has become more of a thread about MMUPHin now, sorry about that!



Not a problem, I’ve been working on the MMUPHin documentation recently which is why it was on my mind to suggest it in the first place.

Ah, I may have been making some undue assumptions about the nature of the covariates you’re trying to adjust for, namely that they were distinct grouping factors such as separate studies. Yes, MMUPHin::adjust_batch() is primarily aimed at accounting for inter-study differences in meta-analysis, but it also has a covariates argument that lets you adjust for additional covariates on top of the batch variable.

Side note: the term “batch effects” can sometimes be used to refer to study effects. batch is a required argument in adjust_batch(), but it doesn’t necessarily mean literal separate experimental batches within a single study.

By the way, I conferred with some others in the group about using adjust_batch() with functional profiles. It turns out it has been done successfully before, but you need to make sure you’re using unstratified, compositional functional profiles.
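One way to prepare such a table, sketched in plain Python under stated assumptions: drop the stratified rows (HUMAnN writes per-taxon contributions as `FEATURE|taxon`) and rescale each sample so its values sum to 1. The function name and example rows are hypothetical, and keeping `UNINTEGRATED`-style rows in the denominator is a judgment call here, not a documented MMUPHin requirement.

```python
def unstratified_relab(table):
    """Keep unstratified rows and rescale each sample to sum to 1.

    table: dict mapping feature name -> list of per-sample abundances,
    where stratified rows follow HUMAnN's "FEATURE|taxon" naming.
    """
    # Community-level (unstratified) rows have no "|" in their name.
    kept = {name: vals for name, vals in table.items() if "|" not in name}
    n_samples = len(next(iter(kept.values())))
    totals = [sum(vals[j] for vals in kept.values()) for j in range(n_samples)]
    # Divide each value by its sample (column) total; empty samples stay 0.
    return {
        name: [v / t if t > 0 else 0.0 for v, t in zip(vals, totals)]
        for name, vals in kept.items()
    }


# Hypothetical pathway table with two samples
table = {
    "PWY-A": [30.0, 10.0],
    "PWY-A|g__Bacteroides.s__Bacteroides_uniformis": [20.0, 10.0],
    "PWY-B": [10.0, 30.0],
    "UNINTEGRATED": [60.0, 60.0],
}
rel = unstratified_relab(table)
print(sorted(rel))  # ['PWY-A', 'PWY-B', 'UNINTEGRATED']
```

Here each sample's unstratified total is 100, so `PWY-A` becomes `[0.3, 0.1]`. Normalizing after removing the stratified rows matters: otherwise the per-taxon contributions would be double-counted in each sample's total.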

Thank you for this Andrew.

Just a final (silly) question: after computing residuals with lmer, I noticed that some of my residual values are negative (which is not surprising). I suppose HAllA can handle negative values…



Correct, negative values are no problem.
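A short illustration of why negative residuals are expected and harmless, under stated assumptions. When the model includes an intercept, least-squares residuals sum to zero, so some must be negative. Correlation-based similarity measures such as Spearman (computed on ranks) are also unchanged by adding a constant, so the sign of the values carries no special meaning. This sketch uses OLS rather than lmer and assumes HAllA is run with a correlation-type measure; all names and data are illustrative.

```python
import numpy as np


def ranks(x):
    """Ranks 0..n-1; assumes no ties, which is typical for continuous residuals."""
    r = np.empty(len(x))
    r[np.argsort(x)] = np.arange(len(x))
    return r


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return np.corrcoef(ranks(x), ranks(y))[0, 1]


def residuals(v, design):
    """OLS residuals of v on a design matrix that includes an intercept."""
    coef, *_ = np.linalg.lstsq(design, v, rcond=None)
    return v - design @ coef


rng = np.random.default_rng(2)
n = 30
covariate = rng.normal(size=n)
design = np.column_stack([np.ones(n), covariate])
x = 10 + 2 * covariate + rng.normal(size=n)
y = 5 + x + rng.normal(size=n)

rx, ry = residuals(x, design), residuals(y, design)
print(abs(rx.sum()) < 1e-9)   # True: the intercept centres residuals on zero
print((rx < 0).any())         # True: so some residuals are necessarily negative
print(np.isclose(spearman(rx, ry), spearman(rx + 100, ry + 100)))  # True
```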