Interaction analysis question

Dear sir/madam,

When we use:

df_input_metadata$CD_dysbiosis = (df_input_metadata$diagnosis_modified == "CD") *
                                 df_input_metadata$dysbiosis

The values for all entries other than CD will be 0, which does not seem right to me. Shouldn't they be NA instead? Am I right?

Thanks.

Hi -

It should be zero. From a statistical point of view, the model is Y ~ beta_0 + beta_1 * CD + beta_2 * dysbiosis + beta_3 * (CD * dysbiosis), where CD is 1 for CD samples and 0 otherwise. If the variable CD * dysbiosis is set to zero for non-CD samples, beta_3 is exactly the estimate of the effect of dysbiosis interacting with CD. Setting it to NA instead would only cause R to treat those values as missing, so the non-CD samples would typically be dropped from the fit rather than serve as the reference.
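To make this concrete, here is a minimal sketch in base R using plain lm() on simulated data. The data, the effect sizes, the 0/1 CD indicator, and the helper column CD_dysbiosis_na are all made up for illustration and are not from your dataset. It compares the zero coding with R's built-in interaction syntax, and then shows what NA coding does to the number of samples used.

```r
# Simulated data: column names mirror the thread, values are invented.
set.seed(1)
n <- 200
df <- data.frame(
  diagnosis_modified = sample(c("CD", "UC", "nonIBD"), n, replace = TRUE),
  dysbiosis = rbinom(n, 1, 0.4)
)
df$CD <- as.numeric(df$diagnosis_modified == "CD")  # 1 for CD, 0 otherwise
df$y <- 1 + 0.5 * df$CD + 0.3 * df$dysbiosis +
  0.8 * df$CD * df$dysbiosis + rnorm(n)

# Zero coding: the product is 0 for every non-CD sample.
df$CD_dysbiosis <- df$CD * df$dysbiosis
fit_zero <- lm(y ~ CD + dysbiosis + CD_dysbiosis, data = df)

# R's own interaction syntax builds the same product column internally.
fit_builtin <- lm(y ~ CD * dysbiosis, data = df)

# NA coding (hypothetical alternative): non-CD samples become missing.
df$CD_dysbiosis_na <- ifelse(df$CD == 1, df$dysbiosis, NA)
fit_na <- lm(y ~ CD + dysbiosis + CD_dysbiosis_na, data = df)

# The two zero-coded fits should agree coefficient by coefficient.
print(all.equal(unname(coef(fit_zero)), unname(coef(fit_builtin))))

# Number of samples actually used by each fit.
print(c(zero = nobs(fit_zero), na = nobs(fit_na)))

# With only CD samples left, CD is constant and the interaction column
# duplicates dysbiosis, so lm() cannot estimate those coefficients.
print(coef(fit_na))
```

With the zero coding, every sample stays in the model and the non-CD samples act as the baseline against which beta_3 is measured. With the NA coding, lm() drops every non-CD row under its default na.action, so there is no baseline left and the interaction can no longer be separated from the dysbiosis effect.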

Best,
Siyuan