Interactions in R formulas

R
Author

Richard J. Telford

Published

February 22, 2024

I happened across a tweet yesterday criticizing a paper.

Now, while I don’t have a clue what a DiD model is, I do know how to code an interaction in R, and that last model most definitely has an interaction.

From the interactions with my response, it seems that many people don’t know all the ways an interaction can be specified in regression models in R.

There are three ways to include an interaction in an R formula.

library(palmerpenguins)

# interaction specified explicitly with :
lm(body_mass_g ~ sex + year + sex:year, data = penguins)

Call:
lm(formula = body_mass_g ~ sex + year + sex:year, data = penguins)

Coefficients:
 (Intercept)       sexmale          year  sexmale:year  
  -47659.868     15834.832        25.658        -7.545  

# shortcut with * that expands to give the same model
lm(body_mass_g ~ sex * year, data = penguins)

Call:
lm(formula = body_mass_g ~ sex * year, data = penguins)

Coefficients:
 (Intercept)       sexmale          year  sexmale:year  
  -47659.868     15834.832        25.658        -7.545  

# shortcut with ^2 that expands to give the same model
lm(body_mass_g ~ (sex + year)^2, data = penguins)

Call:
lm(formula = body_mass_g ~ (sex + year)^2, data = penguins)

Coefficients:
 (Intercept)       sexmale          year  sexmale:year  
  -47659.868     15834.832        25.658        -7.545  

If you compare the coefficients from the three models, you will see that they are identical.
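The coefficients match because R expands all three formulas into the same set of model terms before fitting. A minimal sketch of how to check this without any data uses terms() from base R on placeholder variables y, a and b (stand-ins, not penguin columns); the term_labels() helper is a hypothetical convenience wrapper, not part of any package.

```r
# Hypothetical helper: list the terms R will put in the model for a formula.
# terms() only parses the formula, so y, a and b do not need to exist.
term_labels <- function(f) attr(terms(f), "term.labels")

f_colon <- y ~ a + b + a:b   # interaction written out with :
f_star  <- y ~ a * b         # * expands to main effects plus interaction
f_power <- y ~ (a + b)^2     # ^2 expands to all terms up to two-way

term_labels(f_colon)  # returns c("a", "b", "a:b")
term_labels(f_star)   # same terms
term_labels(f_power)  # same terms
```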

Since all three ways of coding the interaction give exactly the same result, surely one or two of these forms are redundant.

Well, no. Consider a slightly more complex model where we add an extra predictor and its two-way interactions with the other variables.

# With :
lm(body_mass_g ~ sex + year + island + sex:year + sex:island + year:island, data = penguins)

Call:
lm(formula = body_mass_g ~ sex + year + island + sex:year + sex:island + 
    year:island, data = penguins)

Coefficients:
            (Intercept)                  sexmale                     year  
              -67269.07                 33226.17                    35.65  
            islandDream          islandTorgersen             sexmale:year  
               56299.88                378330.62                   -16.16  
    sexmale:islandDream  sexmale:islandTorgersen         year:islandDream  
                -245.98                  -140.30                   -28.47  
   year:islandTorgersen  
                -188.87  

Use the explicit style if you really like typing. With more than a few variables and their interactions, this quickly becomes unmanageable.
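To put a number on "unmanageable": with k variables, a model with all two-way interactions has k main effects plus choose(k, 2) pairwise interactions, and the explicit style means typing every one of them. The sketch below counts the terms by letting R expand (x1 + ... + xk)^2; the variable names x1, x2, ... and the n_terms() helper are hypothetical.

```r
# Count the terms in a model with all two-way interactions of k variables.
n_terms <- function(k) {
  vars <- paste0("x", seq_len(k))  # hypothetical variable names x1, x2, ...
  f <- as.formula(paste0("y ~ (", paste(vars, collapse = " + "), ")^2"))
  length(attr(terms(f), "term.labels"))
}

sapply(2:6, n_terms)
# returns 3 6 10 15 21, i.e. k + choose(k, 2)
```

With six variables that is already 21 terms to type by hand, 15 of them interactions.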

# With *
lm(body_mass_g ~ sex * year * island, data = penguins)

Call:
lm(formula = body_mass_g ~ sex * year * island, data = penguins)

Coefficients:
                 (Intercept)                       sexmale  
                   -70977.42                      40494.12  
                        year                   islandDream  
                       37.50                      73282.82  
             islandTorgersen                  sexmale:year  
                   356748.26                        -19.77  
         sexmale:islandDream       sexmale:islandTorgersen  
                   -33772.66                      44938.25  
            year:islandDream          year:islandTorgersen  
                      -36.93                       -178.12  
    sexmale:year:islandDream  sexmale:year:islandTorgersen  
                       16.70                        -22.45  

Whoops, that went wrong. The * operator also included the three-way interaction. We need to remove it by subtracting that term in the formula.

# With *, two-way only
lm(body_mass_g ~ sex * year * island - sex:year:island, data = penguins)

Call:
lm(formula = body_mass_g ~ sex * year * island - sex:year:island, 
    data = penguins)

Coefficients:
            (Intercept)                  sexmale                     year  
              -67269.07                 33226.17                    35.65  
            islandDream          islandTorgersen             sexmale:year  
               56299.88                378330.62                   -16.16  
    sexmale:islandDream  sexmale:islandTorgersen         year:islandDream  
                -245.98                  -140.30                   -28.47  
   year:islandTorgersen  
                -188.87  

That’s better. But if we had many variables with their interactions, subtracting every higher-order interaction (all the three-, four-, five- or six-way terms) would be really tedious.
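A sketch of how quickly the subtraction approach grows, assuming four placeholder variables a, b, c and d: a * b * c * d produces 15 terms, so five higher-order terms must be subtracted to get back to the two-way model that (a + b + c + d)^2 gives directly. The tl() helper is hypothetical; setequal() is used because only the set of terms matters, not their printed order.

```r
# Hypothetical helper: the terms R builds from a formula (no data needed).
tl <- function(f) attr(terms(f), "term.labels")

# Subtract all four three-way terms and the four-way term by hand.
f_subtract <- y ~ a * b * c * d - a:b:c - a:b:d - a:c:d - b:c:d - a:b:c:d
# Ask for main effects and two-way interactions directly.
f_power <- y ~ (a + b + c + d)^2

setequal(tl(f_subtract), tl(f_power))  # returns TRUE
length(tl(f_power))                    # returns 10: 4 main effects + 6 pairs
```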

Fortunately, the last style, using ^, just works.

lm(body_mass_g ~ (sex + year + island)^2, data = penguins)

Call:
lm(formula = body_mass_g ~ (sex + year + island)^2, data = penguins)

Coefficients:
            (Intercept)                  sexmale                     year  
              -67269.07                 33226.17                    35.65  
            islandDream          islandTorgersen             sexmale:year  
               56299.88                378330.62                   -16.16  
    sexmale:islandDream  sexmale:islandTorgersen         year:islandDream  
                -245.98                  -140.30                   -28.47  
   year:islandTorgersen  
                -188.87  

It is concise, and it gives us exactly what we want. If we did want the three-way interaction, we just replace ^2 with ^3.
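With three variables, ^3 asks for every interaction up to three-way, which is exactly what a * b * c produces; that is why the ^3 and * models have identical coefficients. A minimal check on placeholder variables a, b and c, again with a hypothetical tl() helper:

```r
# Hypothetical helper: the terms R builds from a formula (no data needed).
tl <- function(f) attr(terms(f), "term.labels")

setequal(tl(y ~ (a + b + c)^3), tl(y ~ a * b * c))  # returns TRUE
length(tl(y ~ (a + b + c)^3))  # returns 7: 3 main, 3 two-way, 1 three-way
"a:b:c" %in% tl(y ~ (a + b + c)^2)  # returns FALSE: ^2 stops at two-way
```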

lm(body_mass_g ~ (sex + year + island)^3, data = penguins)

Call:
lm(formula = body_mass_g ~ (sex + year + island)^3, data = penguins)

Coefficients:
                 (Intercept)                       sexmale  
                   -70977.42                      40494.12  
                        year                   islandDream  
                       37.50                      73282.82  
             islandTorgersen                  sexmale:year  
                   356748.26                        -19.77  
         sexmale:islandDream       sexmale:islandTorgersen  
                   -33772.66                      44938.25  
            year:islandDream          year:islandTorgersen  
                      -36.93                       -178.12  
    sexmale:year:islandDream  sexmale:year:islandTorgersen  
                       16.70                        -22.45  

This style is particularly useful if we want to include all available variables with the . shortcut, which stands for every column in the data except the response. This might be the first step of a backward selection.
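A small sketch of both ideas on a made-up data frame (toy, y, a, b and g are hypothetical, not penguin columns). terms() with a data argument shows what .^2 expands to, and drop1() from the stats package shows a plausible first backward-selection step: it only offers terms that can be removed without leaving an interaction whose main effects are missing, so here the candidates are just the two-way interactions.

```r
# Made-up data for illustration only.
set.seed(1)
toy <- data.frame(
  y = rnorm(30),
  a = rnorm(30),
  b = rnorm(30),
  g = factor(rep(c("p", "q", "r"), each = 10))
)

# '.' expands to every column except the response, and ^2 adds all pairs:
# three main effects (a, b, g) and three two-way interactions.
labels <- attr(terms(y ~ .^2, data = toy), "term.labels")
labels

# First step of a backward selection: which terms could be dropped?
fit <- lm(y ~ .^2, data = toy)
candidates <- drop1(fit)
candidates  # rows: "<none>" plus the two-way interactions only
```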

lm(body_mass_g ~ .^2, data = penguins)

Call:
lm(formula = body_mass_g ~ .^2, data = penguins)

Coefficients:
                       (Intercept)                    speciesChinstrap  
                        -8.482e+04                           2.520e+04  
                     speciesGentoo                         islandDream  
                        -2.695e+05                          -1.123e+04  
                   islandTorgersen                      bill_length_mm  
                         4.362e+05                           5.100e+02  
                     bill_depth_mm                   flipper_length_mm  
                        -4.756e+04                           4.782e+03  
                           sexmale                                year  
                         2.192e+05                           3.374e+01  
      speciesChinstrap:islandDream           speciesGentoo:islandDream  
                                NA                                  NA  
  speciesChinstrap:islandTorgersen       speciesGentoo:islandTorgersen  
                                NA                                  NA  
   speciesChinstrap:bill_length_mm        speciesGentoo:bill_length_mm  
                         1.814e+01                           2.715e+01  
    speciesChinstrap:bill_depth_mm         speciesGentoo:bill_depth_mm  
                         5.110e+01                           7.262e+00  
speciesChinstrap:flipper_length_mm     speciesGentoo:flipper_length_mm  
                         3.246e+01                           4.318e+00  
          speciesChinstrap:sexmale               speciesGentoo:sexmale  
                        -9.754e+02                          -8.683e+02  
             speciesChinstrap:year                  speciesGentoo:year  
                        -1.640e+01                           1.337e+02  
        islandDream:bill_length_mm      islandTorgersen:bill_length_mm  
                        -2.722e+01                          -3.077e+01  
         islandDream:bill_depth_mm       islandTorgersen:bill_depth_mm  
                        -5.704e+01                          -1.662e+02  
     islandDream:flipper_length_mm   islandTorgersen:flipper_length_mm  
                        -1.438e+01                          -8.442e+00  
               islandDream:sexmale             islandTorgersen:sexmale  
                         2.136e+02                           2.736e+02  
                  islandDream:year                islandTorgersen:year  
                         7.933e+00                          -2.144e+02  
      bill_length_mm:bill_depth_mm    bill_length_mm:flipper_length_mm  
                        -2.581e+00                          -1.642e+00  
            bill_length_mm:sexmale                 bill_length_mm:year  
                         2.904e+01                          -6.497e-02  
   bill_depth_mm:flipper_length_mm               bill_depth_mm:sexmale  
                        -2.579e-02                          -7.168e+01  
                bill_depth_mm:year           flipper_length_mm:sexmale  
                         2.381e+01                           1.684e+01  
            flipper_length_mm:year                        sexmale:year  
                        -2.341e+00                          -1.105e+02  
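The NA coefficients above are not a problem with the ^ syntax. In the penguin data, Chinstrap penguins occur only on Dream and Gentoo penguins only on Biscoe, so most species-by-island combinations are empty. The matching interaction columns are therefore either all zero or identical to a main-effect column, and lm() reports such aliased coefficients as NA. The sketch below reproduces this with a small made-up data set (toy, y, species and island are hypothetical stand-ins) that copies that sampling layout.

```r
# Made-up data copying the penguin layout: Adelie on all three islands,
# Chinstrap only on Dream, Gentoo only on Biscoe (two rows per combination).
cells <- data.frame(
  species = c("Adelie", "Adelie", "Adelie", "Chinstrap", "Gentoo"),
  island  = c("Biscoe", "Dream", "Torgersen", "Dream", "Biscoe")
)
toy <- cells[rep(seq_len(nrow(cells)), each = 2), ]
toy$species <- factor(toy$species)
toy$island <- factor(toy$island)
toy$y <- seq_len(nrow(toy))  # arbitrary response values

# The empty cells are visible in the cross-tabulation.
table(toy$species, toy$island)

# All four species:island coefficients come back NA (aliased).
fit <- lm(y ~ species * island, data = toy)
coef(fit)
```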

I think all the styles of specifying interactions are useful. If you don’t, blame Wilkinson and Rogers (1973).