How to specify fixed effects in Stata
Welcome to the Stata Guide on how to specify fixed effects! The fixed effects method controls for the characteristics of each unit of observation that do not change over time, whether or not they are measured, which can reduce bias in your model.
Throughout this guide, blue phrases will be used to indicate commands, green phrases for variables, and purple for links and downloads.
We’ll be using Stata’s built-in longitudinal dataset on women’s employment. We’ll be exploring how various characteristics impact an individual’s wage over time.
Here is the .do file which you can follow along with.
Let’s begin by learning more about when you would want to use fixed effects.
What is the fixed effects method?
Fixed effects is a method for controlling for the time-invariant characteristics that differ among your units of observation and that, if not accounted for, might bias the estimated effects on your outcome variable. The unit of observation will take on different meanings depending on your research question and your data; this could be a person, a firm, a city, a school, or another individual unit. Fixed effects works well with panel data since this format tracks the same entities over multiple periods of time.
When should fixed effects be employed?
Fixed effects should be utilized when individual characteristics that are fixed over time, whether measurable or unmeasurable, are expected to be related to both your outcome variable and one or more of your independent variables. Running a simple linear regression without accounting for these underlying relationships can compromise the quality of our model by introducing omitted variable bias.
In our case, we are interested in determining how an individual’s wages are impacted by certain quantifiable characteristics (e.g. union status, total work experience, broad location) over time. While doing this, however, we do not want individual differences which would likely impact wage to be attributed to one of our independent variables. Some examples might include temperament, upbringing, social capital, and baseline ability to cope with stress. All of these are rather subjective and difficult to measure directly. While you sometimes might be able to substitute various proxies (e.g. family or parents’ income to account for upbringing), many elements will still remain unmeasured. With fixed effects, each individual unit of observation is compared to its own average over time, so anything about that unit that stays constant drops out of the comparison.
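The idea of comparing each unit to its own average can be written out as a short derivation. This is a sketch in generic notation that does not appear in the original guide: $y_{it}$ is the outcome for unit $i$ in period $t$, $x_{it}$ is the vector of independent variables, $\alpha_i$ is the unit’s time-invariant effect, and bars denote averages over time within a unit.

```latex
\begin{align}
y_{it} &= \alpha_i + x_{it}'\beta + \varepsilon_{it} \\
\bar{y}_i &= \alpha_i + \bar{x}_i'\beta + \bar{\varepsilon}_i \\
y_{it} - \bar{y}_i &= (x_{it} - \bar{x}_i)'\beta + (\varepsilon_{it} - \bar{\varepsilon}_i)
\end{align}
```

Subtracting the second line from the first removes $\alpha_i$, which is why characteristics that are fixed over time no longer need to be measured.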
How is this represented in Stata?
Let’s begin by loading in and taking a look at our dataset of interest.
webuse nlswork
We are using one of Stata’s built-in datasets on women’s employment. From examining the ‘Variables’ tab, we can see that there are a variety of variables which span from demographic identifiers to job-specific information.
From browsing, we can see that the dataset has a panel structure since we are able to track the same individuals across multiple time periods. Stata needs to know this in order to apply fixed effects to the correct entities and time periods. xtset declares that the data is in panel format, taking the panel identifier first and the time variable second, and can be run as follows:
xtset idcode year
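As an optional check on the panel declaration, the short sketch below uses two of Stata’s panel summary commands. It is not part of the original .do file; it assumes only the dataset and variables already introduced in this guide.

```stata
* Load the example dataset and declare the panel structure
webuse nlswork, clear
xtset idcode year

* Summarize the participation pattern: how many individuals there are
* and in which years they are observed
xtdescribe

* Split the variation in log wages into between-individual
* and within-individual components
xtsum ln_wage
```

The within component reported by xtsum is the variation that a fixed effects model draws on.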
The output that Stata displays after xtset notes that the time variable has gaps, so we should take a look at the missing years for good measure.
tab year
We can note that the years missing from the year variable are 1974, 1976, 1979, 1981, 1984, and 1986 (the dataset stores years in two-digit form, so the table runs from 68 to 88). Since the gaps are isolated single years rather than large blocks, they should not have a large impact on our analysis. If, for example, there were a five-year gap in the data, we would know much less about how wages changed over that stretch and our estimates could be less reliable. Now that we’ve correctly specified the dataset type and learned more about the year variable, let’s turn to formulating our fixed effects regression.
We’ll be running a fixed effects model to determine how job tenure, total work experience, union membership, and region impact the log of wages. The respective variables for these areas of interest are tenure, ttl_exp, union, and south.
Let’s first run a standard model excluding fixed effects so we can compare the differences later on. Note that xtreg without any options fits a random-effects model by default.
xtreg ln_wage tenure ttl_exp union south
We can gain a general sense of how each of the independent variables relates to wages, but this model does not fully separate their effects from stable, unobserved differences between individuals. Fixed effects addresses this by using only the variation within each individual over time to estimate the coefficients (in this case, the effects on wage).
Specifying fixed effects in Stata is simple! You only have to add the fe option, after a comma, to the end of your panel regression (specified with xtreg). Let’s run our model as follows:
xtreg ln_wage tenure ttl_exp union south, fe
We can examine the additional information added to our regression output by specifying fixed effects. The R-squared section of the output table highlights relevant differences between including and excluding fixed effects. The ‘Within R-Squared’ can be interpreted as saying that around 14.4% of the variation in log wages within individuals over time is explained by the independent variables. This is the R-squared that we will refer to when looking at our fixed effects model. The ‘Between R-Squared’ indicates that 29.3% of the variation between individuals is explained by the model, while the overall R-squared describes the fit across all observations when the panel structure is ignored.
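To see what “variation within individuals” means in practice, the sketch below demeans each variable by individual and runs an ordinary regression on the demeaned data. This is an illustration added here, not part of the original .do file; the variable prefixes mean_ and dm_ are made up for this example. The coefficients should match those from xtreg with fe, although the standard errors will differ because regress does not account for the estimated individual means.

```stata
webuse nlswork, clear

* Keep only observations with no missing values in the model variables,
* so that the averages are taken over the same sample xtreg would use
keep if !missing(ln_wage, tenure, ttl_exp, union, south)

* For each variable, subtract the individual's own average over time
foreach v of varlist ln_wage tenure ttl_exp union south {
    bysort idcode: egen double mean_`v' = mean(`v')
    generate double dm_`v' = `v' - mean_`v'
}

* Regression on the demeaned data (no constant is needed after demeaning)
regress dm_ln_wage dm_tenure dm_ttl_exp dm_union dm_south, noconstant

* The built-in fixed effects estimator, for comparison
xtset idcode year
xtreg ln_wage tenure ttl_exp union south, fe
```

In everyday work you would use xtreg with fe directly; the manual version is only meant to show what the option does behind the scenes.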
As compared to the model which didn’t include fixed effects, each of the independent variables has been attributed a smaller amount of the corresponding change in wages. This is because fixed effects has determined some of this variation to be a result of stable individual differences instead of the independent variables.
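One way to view the two sets of results side by side is to store each model and tabulate them together. The sketch below is an addition to the guide; the stored names no_fe and with_fe are arbitrary labels chosen for this example.

```stata
webuse nlswork, clear
xtset idcode year

* Model without fixed effects (xtreg's default is random effects)
quietly xtreg ln_wage tenure ttl_exp union south
estimates store no_fe

* Model with fixed effects
quietly xtreg ln_wage tenure ttl_exp union south, fe
estimates store with_fe

* Coefficients, standard errors, sample size, and the three R-squared values
estimates table no_fe with_fe, b(%9.4f) se(%9.4f) stats(N r2_w r2_b r2_o)
```

Each column of the resulting table corresponds to one stored model, which makes it easier to see how the coefficients change once fixed effects are included.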
Congrats on successfully completing this guide on specifying fixed effects in Stata!
By: Zoe Pyne