
How to use interaction terms in Stata


Welcome to the Stata Guide on how to specify interaction terms! Interaction terms allow you to model the combined effect of two or more variables on a dependent variable. Learning how to use this concept accurately and specify it in a regression model will open up a wider range of possibilities for analysis.

Throughout this guide, blue phrases will be used to indicate commands, green phrases for variables, and purple for links and downloads. 

We’ll be using Stata’s built-in longitudinal dataset on women’s employment.
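
As a starting point, the dataset can be loaded along these lines. This is a minimal sketch that assumes the built-in dataset referred to above is nlswork (the guide does not name it, but the variables used later, such as ln_wage, south, race, ttl_exp and collgrad, match that file). The panel identifiers idcode and year are likewise assumed from that dataset.

```stata
* Assumption: the built-in dataset used in this guide is nlswork.
* webuse fetches it from Stata's example-data site, so it needs internet access.
webuse nlswork, clear

* Declare the panel structure so that xtreg knows the person and time identifiers
xtset idcode year

* Inspect the variables used in this guide
describe ln_wage south race ttl_exp collgrad
tabulate race
```

Running xtset before any xtreg command avoids a "panel variable not set" error if the data have not already been declared as a panel.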


Here is the .do file which you can follow along with.


Let’s begin by learning more about when you would want to specify an interaction term.
 

What is an interaction term and when should it be used?


Including an interaction term in a regression allows the effect of one variable on the outcome to depend on the value of another variable. Looking at our longitudinal dataset on women’s employment, we can pick out several pairs of variables that might interact with one another. ttl_exp and collgrad are one candidate, since the wage return to total work experience is likely to differ depending on whether or not someone graduated from college. south and race taken together can offer some insight into how region affects racial gaps in pay. These are only a few of the combinations that might make interesting interaction terms in our regression model.
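
To make the ttl_exp and collgrad example concrete, here is a hedged sketch of how such a model could be written. It is not one of the models this guide goes on to interpret, and it assumes the nlswork dataset with idcode and year as the panel identifiers. The c. prefix marks a variable as continuous and the i. prefix marks it as categorical.

```stata
* Assumption: nlswork is the dataset in use
webuse nlswork, clear
xtset idcode year

* c.ttl_exp is continuous, i.collgrad is a 0/1 category.
* The model lets the wage return to experience differ by college graduation.
xtreg ln_wage c.ttl_exp##i.collgrad

* Estimated slope of experience separately for non-graduates and graduates
margins collgrad, dydx(ttl_exp)
```

If the two slopes reported by margins are close to each other, the interaction is adding little to the model.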

How are interaction terms represented in Stata?


There are two ways to specify an interaction term in Stata, both of which use the pound sign (#).

The first uses a single pound sign, which tells Stata to estimate only the interaction term, without separate terms for each individual variable. In other words, we will see a coefficient for each combination of the two variables rather than the effect of each variable on its own. Let’s begin by running a model that shows the coefficients for south and race taken together as an interaction:

xtreg ln_wage south#race


Looking at the regression’s output table, we see that five interaction coefficients are reported for the two variables specified earlier. Each one is measured relative to the omitted reference group, White non-Southern workers.

  • White#1: compares White Southern workers to White non-Southern workers and finds that those in the South have wages around 5.6% lower.
  • Black#0: compares Black non-Southern workers to White non-Southern workers and finds a 1.4% wage gap, with Black workers earning less.
  • Black#1: compares Black Southern workers to White non-Southern workers and finds a 23.9% gap in their wages, with Black Southern workers earning less.


** The two other categories, Other#0 and Other#1, compare workers classified as other races with White non-Southern workers; the first are non-Southern and the second are Southern. Non-Southern workers of other races earn about 9% more than White non-Southerners, while for Southern workers of other races the gap is about 1.1%. Because the ‘Other’ category of the race variable is so unspecific, very little can be concluded from these final two coefficients.
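
The comparisons above can be checked directly after estimation. The sketch below assumes the nlswork dataset and assumes that race is coded 1 = White, 2 = Black, 3 = Other; the tabulate line is there so you can verify the coding before referring to levels by number.

```stata
* Assumption: nlswork is the dataset in use
webuse nlswork, clear
xtset idcode year

* Show the numeric codes behind the race labels (assumed 1, 2, 3 below)
tabulate race, nolabel

* Interaction-only model from the guide
xtreg ln_wage south#race

* Predicted mean log wage in each of the six south-by-race cells
margins south#race

* Black Southern workers compared with White Southern workers,
* a contrast the coefficient table does not report directly
lincom 1.south#2.race - 1.south#1.race
```

Because every coefficient in this model is measured against White non-Southern workers, any other pairwise comparison has to be built as a difference of two coefficients, which is what lincom does here.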

What if we also want to see the individual effect of each variable on its own, as we would in a regression without an interaction term? We can run the same model as above with the ‘#’ replaced by ‘##’, which includes both variables on their own as well as their interaction.

xtreg ln_wage south##race
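
For reference, the ‘##’ operator is shorthand for listing each variable on its own plus their ‘#’ interaction. The following sketch writes the same model out in full; it assumes the nlswork dataset and should produce the same coefficient table as the shorthand version.

```stata
* Assumption: nlswork is the dataset in use
webuse nlswork, clear
xtset idcode year

* Long form of south##race: two main effects plus their interaction
xtreg ln_wage i.south i.race i.south#i.race
```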





We can see that the format of the regression output table has changed: it now reports south and race on their own as well as their interaction, with White non-Southern workers as the reference group. Some key points to note from the table:

  • Black non-Southern workers earn 1.3% less than White non-Southern workers.
  • White Southern workers earn 5.6% less than White non-Southern workers.
  • There’s an additional wage penalty for Black Southern workers: the interaction term shows that the gap between Black and White workers is about 17 percentage points wider in the South than outside it.
  • ‘Other’ is too broad and non-specific a category to attribute meaning to within this regression.
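
The main effects and the interaction term can be combined to recover group comparisons that the table does not show on its own line. This is a sketch under the same assumptions as before (nlswork data, race coded 1 = White, 2 = Black, 3 = Other), not part of the original .do file.

```stata
* Assumption: nlswork is the dataset in use
webuse nlswork, clear
xtset idcode year

* Full factorial model from the guide
xtreg ln_wage south##race

* Black-White gap outside the South: the race main effect alone
lincom 2.race

* Black-White gap within the South: main effect plus the interaction term
lincom 2.race + 1.south#2.race

* Joint test of whether the interaction terms are needed at all
testparm i.south#i.race
```

The difference between the two lincom results is the interaction coefficient itself, which is why it is read as the additional penalty rather than as the full gap.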

    Congrats on successfully completing this guide on using interaction terms in Stata!