How to create summary tables in Stata
Intro
Welcome to the Stata Guide on creating professional summary statistics tables! "Table 1" is an essential component of almost any research paper: it provides a snapshot of your data (mean, median, standard deviation, count) to help readers understand the dataset’s characteristics before diving into analysis.
We’ll be using the 2012 NYC SAT Scores dataset (originally from NYC OpenData). You can download the dataset directly via this link.
Here is the .do file, which you can use to follow the guide.
Getting Started
Before creating and formatting any tables, let's load the dataset directly from the web and clean it up so Stata can read the scores properly. We rename each variable manually so the names are easier to understand, then drop the first rows, which contain the headers of the dataset. By default, the scores may load as text (strings), so we also need to convert them to numbers. For more information on data cleaning strategies, go to link.
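The steps above can be sketched roughly as follows. This is an illustration, not the guide's original code: the file name `sat_scores.csv` is a placeholder for the downloaded file or its URL, the column order is assumed, the names `dbn`, `school_name`, and `num_test_takers` are hypothetical, and the sketch assumes the headers occupy a single row.

```stata
* Sketch under assumptions: placeholder file name, assumed column order (v1-v6)

* Install estout once; it is used later in this guide
ssc install estout, replace

* Load the CSV without treating any row as variable names,
* so the columns arrive as v1, v2, ... and the header text sits in the data
import delimited "sat_scores.csv", varnames(nonames) clear

* Rename each column manually (the first three names are hypothetical)
rename v1 dbn
rename v2 school_name
rename v3 num_test_takers
rename v4 sat_reading_score
rename v5 sat_math_score
rename v6 sat_writing_score

* Drop the header row (assumes the headers occupy only the first row)
drop in 1

* Convert the score columns from text to numbers;
* force turns any non-numeric entry into a missing value
destring sat_reading_score sat_math_score sat_writing_score, replace force
```

Because `force` silently converts anything non-numeric to missing, it is worth browsing the raw text values before running `destring`.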

My working directory today is the Downloads folder. You should also set a working directory so Stata knows where your data and output files live. Check this tutorial out if you don't know what a directory is.
Once the data is loaded, we can verify the variables in the Variables window. We should see our renamed variables: sat_math_score, sat_reading_score, and sat_writing_score.

Basic Commands Exploration
Before making the final table, it is best practice to explore and play around with the data using built-in commands.

The `describe` command gives you a high-level overview of variable types, while `summarize` provides the observation count, mean, and standard deviation.
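Both commands can be run without arguments to cover every variable in memory. A minimal sketch, assuming the cleaned dataset from Getting Started is already loaded:

```stata
* Variable names, storage types, and labels for the whole dataset
describe

* Observation count, mean, standard deviation, min, and max for every variable
summarize

* The same summary restricted to the three score variables
summarize sat_math_score sat_reading_score sat_writing_score
```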

The table generated by `describe` shows the number of observations and variables in the dataset, along with each variable's name and storage type, such as string or integer.

The table generated by `summarize` describes the dataset statistically: the mean SAT reading score is 400, the mean SAT math score is 413, and the mean writing score is the lowest of the three subjects at 393. The standard deviation for SAT math (64) suggests that math performance varies considerably from school to school.
Variables such as dbn, school name, and number of tests show 0 observations. This usually does not mean the data is missing; it happens because these variables are stored as strings (text), which `summarize` cannot average.
While the standard `summarize` command is great for a quick overview, it can hide issues such as skewness or data entry errors. With the `detail` option, Stata provides a fuller breakdown that includes specific percentiles (such as 1%, 5%, 95%, and 99%) and skewness. This is particularly useful for identifying outliers. Thus, we could run:
summarize sat_math_score, detail

We can now see critical nuances in the distribution that the simple summary hid. Specifically, the SAT math scores show a strong positive skew of 1.81, which explains why the mean (413.4) is notably higher than the median (395): a small number of elite schools are pulling the average up, while the "typical" school actually scores lower.
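If you want to reuse these numbers rather than read them off the screen, `summarize, detail` leaves them behind as stored results. A small sketch, assuming `sat_math_score` has already been converted to numeric:

```stata
* Run the detailed summary quietly, then pull individual stored results
quietly summarize sat_math_score, detail

display "Mean:     " r(mean)
display "Median:   " r(p50)
display "Skewness: " r(skewness)

* List every result that summarize stored
return list
```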
How to use tabstat to format data into a table
The `tabstat` command is useful when you want to see specific statistics that `summarize` does not show by default, such as the median (p50). Let’s run:
tabstat sat_math_score sat_reading_score, statistics(mean sd p50 p25 p75 min max n)


The table generated by `tabstat` shows exactly the statistics we asked for: mean, SD, percentiles, min, max, and N.
Creating and Exporting Tables Using Estout Package
So far we have built tables with `summarize` and `tabstat`. To compare the statistics across all three subjects and produce a more professional, publication-quality table, it is time to use the estout package. It is a community standard for creating tables because it allows extensive customization and easy export to other formats. We will work through two examples: exporting to Word and exporting to Excel.
If you haven't used it before, you must install it first by running the command below. (We already installed it in Getting Started.)
ssc install estout
Then, we can use `estpost` to calculate and store the specific statistics we want to display. Here, for example, we are asking for the mean, standard deviation, median, min, max, and count.

The `columns()` option controls the orientation (layout) of your final table, essentially deciding what goes across the top versus what goes down the side. In this case, we tell Stata to place the statistics (Mean, SD, Min, Max) as the column headers across the top, and the variables (Math Score, Reading Score) as the rows down the side.
After generating this tabulation, we can use the `estout` command to export the table to other formats for a write-up.
Specifically, the `estout` command acts as a bridge. It takes the raw numbers stored in Stata's memory and "prints" them into a file on your computer. It also acts as a stylist: it rounds the decimals, adds parentheses, creates titles, and renames the rows so the final table looks professional. Now, we use `estout` to export the stored data into a Word-friendly file (.rtf).
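A minimal sketch of this step, assuming the three score variables are numeric and estout is installed. The original call may have differed in the statistics it listed.

```stata
* Compute the statistics and store them so they can be exported afterwards.
* columns(statistics) puts the statistics across the top and variables down the side.
* The /// continuation works in a do-file, not in the Command window.
estpost tabstat sat_math_score sat_reading_score sat_writing_score, ///
    statistics(mean sd p50 min max count) columns(statistics)
```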
- using "Table1_Summary.rtf": tells Stata the name and format of the file you are creating
- fmt(2): format to 2 decimal places (e.g., turns 499.12345 into 499.12)
- par: This stands for parentheses. It puts the standard deviation inside parentheses ( ), which is the standard academic style for Table 1.
We can now see that a file named Table1_Summary.rtf has been created in our working directory. If you have not set a working directory, the file is written to Stata's current default directory; run `pwd` to see where that is. Opening the file shows a clean, formatted table with our Math, Reading, and Writing scores, ready for editing or inserting into an essay.
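The options above can be combined roughly as follows. This sketch uses `esttab`, the wrapper that ships with the estout package, on the assumption that it is a convenient way to get true RTF output from the file extension; the guide's original call may have used `estout` directly. The title text and row labels are illustrative choices.

```stata
* Store the statistics first (repeated here so the sketch runs on its own)
estpost tabstat sat_math_score sat_reading_score sat_writing_score, ///
    statistics(mean sd p50 min max count) columns(statistics)

* Export to a Word-friendly RTF file:
*   fmt(2) rounds to two decimals, par wraps the SD in parentheses
esttab using "Table1_Summary.rtf", replace ///
    cells("mean(fmt(2)) sd(fmt(2) par) p50(fmt(2)) min(fmt(0)) max(fmt(0)) count(fmt(0))") ///
    noobs nonumbers nomtitles ///
    title("Table 1. Summary statistics") ///
    coeflabels(sat_math_score "Math Score" sat_reading_score "Reading Score" sat_writing_score "Writing Score")
```

The `replace` option overwrites an existing file of the same name, which is useful when you rerun the do-file after tweaking the layout.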

Here is another example exporting the data in CSV format, which is suitable for opening in Excel.
First, we will use estpost again to collect the data we want to display:

And then we use estout to export it:

- Delimiter: a character (like a comma, a tab, or a semicolon) that acts as a barrier between different pieces of data in a text file. In this case, in a CSV file, the delimiter is a comma.
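The two CSV steps can be sketched together as below. The output file name `Table1_Summary.csv` and the choice of statistics are assumptions made for illustration.

```stata
* Step 1: collect the statistics to display
estpost tabstat sat_math_score sat_reading_score sat_writing_score, ///
    statistics(mean sd min max count) columns(statistics)

* Step 2: write them to a comma-delimited file that Excel can open
estout using "Table1_Summary.csv", replace ///
    cells("mean(fmt(2)) sd(fmt(2)) min max count") ///
    delimiter(",")
```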
As mentioned before, the generated CSV file is saved in your working directory, or in Stata's default directory if you have not set one.

Congrats on making it to the end of this Stata guide!
Additional Resources:
Regression in Stata Tutorial
Reclassifying Variables in Stata Tutorial
Using Multiple Frames in Stata Tutorial
Applying Population Weights in Stata Tutorial
By: Thea Pann