Here you can find simple guides to different techniques for statistical analysis with the software Stata. The focus is on running and interpreting the analyses, not on the theory and assumptions that underpin them. Each guide shows the code and output from Stata, together with explanations in text. All code is meant to be reproducible: you can download the data linked in each guide and follow along with the instructions. The site is run by Anders Sundell.
Follow me on Twitter or YouTube for examples of data visualizations.
Different operations required to prepare the data for analysis.
Getting started with Stata
The different parts of the program, setting a project folder, loading data, do-files, etc.
Create datasets and import data
Import data or create a dataset from scratch.
Recode variables
Change or remove certain values from variables to prepare them for analysis, using the commands "recode", "generate" and "replace".
Center, standardize and normalize variables
Three common transformations of variables: centering, standardizing and normalizing.
Create an index variable
Create index variables that combine values of several variables, and check the reliability using Cronbach's alpha.
If qualifiers and conditions
Use conditions to run analyses and other commands on selected groups of observations.
Combining datasets
Add data from other sources with the command "merge".
Logarithms
Use the logarithmic transformation on variables to account for skewness, for instance arising from exponential growth.
Aggregate datasets
Use the command "collapse" to aggregate datasets to show statistics such as means and standard deviations for groups in the data.
Regression analysis
A common tool for statistical analysis, used to investigate relationships between two or more variables.
Introduction
Begin here. The basic principles, with two variables.
Interpret the results
What the different parts of Stata's output from regression analysis mean, with annotated output.
Control variables
Add control variables to account for alternative explanations.
Predict values
Use the regression equation to predict values (guesses) for observations in the data.
Dummy variables
Use dummy variables to include categorical variables in the analysis.
Logarithmic variables
Run and interpret analyses with logarithmic variables, for instance to account for diminishing effects.
Logistic regression
A type of regression analysis suited for dependent variables that take only two values, 0 and 1. How to run and interpret the analysis, and how it differs from OLS.
Interaction effects - two values
Effects that differ between two groups in the sample.
Interaction effects - continuous variables
Effects that vary with the value of a continuous variable.
Tables for presenting results from regression analyses
Create nice tables for presenting regression results with the command esttab.
Descriptive statistics and simpler analyses
Get an overview of the data before proceeding to more advanced analysis.
Simple descriptive statistics
Use the commands codebook, summarize and tab to quickly find the mean, median, minimum and maximum values (among other things) of a variable.
Mean values (averages) in different groups
Compare groups in a straightforward way by looking at their mean values, using the commands sum and table.
t-test
Test differences between groups for statistical significance.
Correlation
A simple and very common measure of the strength and direction of the association between two variables.
Crosstabs
Relationships between two categorical variables shown with percentages.
Graphs
Various techniques for visualizing data and relationships.
Histograms
Show the distribution of a variable with bars of different heights.
Bar charts
Averages for different groups shown with bars.
Scatterplots
Show relationships between two variables with points.
Line graphs
Show how a variable has changed over time with line graphs.
Maps of the world and regions with spmap
Maps that use colors to show countries' values on different variables.
Visualize regression coefficients with coefplot
Present regression coefficients and confidence intervals graphically, using the command coefplot.
Time series and panel data
Work with time in Stata, either for one unit (time series) or many (panel data).
Setting up data for time series
Set the time variable, use lags and leads, set the delta, and plot data over time.
Setting up panel data
Set the panel and time variables, the difference between wide and long data, and common error messages.
Transform panel data between long and wide with reshape
How to transform panel data between its two formats, wide and long, using the command reshape.
Panel regression with fixed effects
How to use and understand so-called "fixed effects" in regression analysis of panel data.