Best way to choose the reference

Hello, I am wondering what the best way to choose the reference level is.
In my case, I am looking at the association between fecal microbiota and ultra-processed food consumption in a large clinical trial. I am adjusting for some categorical variables such as “recruiting_center”. Which level should I choose as the reference in this case? Perhaps the one that includes the most samples? And what about “smoking_habits”? Should I choose “never_smoked” because the majority of patients fall in that category?

Hi @Alex_A,

This is a great question and, as you point out, it is definitely not always straightforward. When you don’t have a level that you 100% know you want to compare the other levels to, I think it is fair to use the one with the largest sample size (as you suggest in the question). For something like smoking, I always try to use the level that represents no exposure (never_smoked), as I would potentially be interested in changes after exposure to something. But again, you could set it to the largest level if you are mainly using this variable to adjust the overall model.
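As a rough illustration of the rule of thumb above, here is a minimal Python sketch using only the standard library. The helper names `choose_reference` and `dummy_code` and the toy data are hypothetical, made up for this example; they are not part of any particular microbiome package, and your modelling tool will have its own way of setting the reference level.

```python
from collections import Counter


def choose_reference(values, preferred=None):
    """Pick a reference level for a categorical covariate.

    If `preferred` is given and present in the data (e.g. a no-exposure
    level), use it; otherwise fall back to the most frequent level.
    """
    counts = Counter(values)
    if preferred is not None and preferred in counts:
        return preferred
    # most_common(1) returns [(level, count)] for the largest level
    return counts.most_common(1)[0][0]


def dummy_code(values, reference):
    """Dummy-code `values` against `reference`: one 0/1 column per other level."""
    levels = sorted(set(values) - {reference})
    return {lvl: [int(v == lvl) for v in values] for lvl in levels}


# Hypothetical toy data, for illustration only
center = ["A", "B", "B", "C", "B", "A"]
smoking = ["never_smoked", "current", "former",
           "never_smoked", "current", "current"]

# Pure adjustment variable: use the level with the most samples
center_ref = choose_reference(center)

# Exposure-like variable: prefer the no-exposure level, even if it is not the largest
smoking_ref = choose_reference(smoking, preferred="never_smoked")

# Each remaining level is then compared against the reference
smoking_dummies = dummy_code(smoking, smoking_ref)
```

Here the center reference is simply the most common center, while the smoking reference stays at `never_smoked` even though `current` has more samples in the toy data, so the coefficients for `current` and `former` are both read as differences from never having smoked.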

I hope this helps!
