Introduction to data management and analysis using Stata

Most quantitative methods courses teach students how to analyze datasets that are ready for analysis. In the real world, creating analysis datasets is often more time-consuming and challenging than conducting the analyses themselves. The purpose of this workshop is to introduce participants to the core data management skills necessary for creating analysis datasets. These skills will help researchers save time while increasing data quality. Our data management workshop uses Stata, which offers an excellent combination of data manipulation capabilities, user friendliness, and cutting-edge analytic techniques.

Expected outcomes of our data management using Stata workshop

By the end of the workshop, participants will be able to:

  • Create Stata datasets from Excel spreadsheets or data stored in other formats
  • Investigate data structure, identify errors in data, fix data errors, and confirm that variables have been created correctly
  • Create analysis datasets that merge data from multiple sources
  • Create longitudinal datasets that append data from multiple time periods
  • Create variables that require calculations across observations (e.g., create a measure of student GPA from a dataset that has one observation for each course the student took)
  • Reshape the structure of analysis datasets (e.g., convert a dataset that has one row per person and one column for each year to a dataset that has one row for each person-year)
  • Increase efficiency and reproducibility of results by conducting all steps of data analysis from within Stata do-files (reading in data; investigating/cleaning data; creating analysis variables; running analyses; and presenting results)
  • Increase productivity by learning how to automate iterative tasks (e.g., reading in multiple years of data; running analyses on different subgroups) rather than writing separate commands for each task
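
As a rough illustration of three of the outcomes above (calculations across observations, reshaping, and automating repeated tasks), here is a minimal do-file sketch. It is not taken from the workshop materials: the datasets are typed in by hand, and all variable names (student_id, credits, grade_points, person_id, income) are hypothetical.

```stata
* Hypothetical course-level data: one observation per course a student took
clear
input student_id credits grade_points
1 3 4.0
1 3 3.0
2 3 2.0
2 4 3.0
end

* Calculation across observations: credit-weighted GPA for each student
bysort student_id: egen total_points  = total(credits * grade_points)
bysort student_id: egen total_credits = total(credits)
generate gpa = total_points / total_credits

* Keep one observation per student and display the result
bysort student_id: keep if _n == 1
keep student_id gpa
list

* Hypothetical wide data: one row per person, one column per year
clear
input person_id income2019 income2020 income2021
1 100 110 120
2 200 190 185
end

* Reshape to long format: one row per person-year
reshape long income, i(person_id) j(year)

* Automate a repeated task: summarize each year in a loop
* instead of writing one summarize command per year
forvalues y = 2019/2021 {
    display "Year `y'"
    summarize income if year == `y'
}
```

The by-group egen approach is only one way to compute a statistic across observations; collapse would also work if the course-level rows are no longer needed.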

Who should attend?

This workshop is intended for participants who have modest experience using Stata (e.g., a one-semester statistics class) and who have used Stata for basic data manipulation (e.g., creating variables, dropping specific observations), but do not have experience using Stata for complex data manipulation (e.g., performing calculations across observations, reshaping the structure of datasets). The two specific course prerequisites are (1) experience working with Stata do-files and (2) experience changing directories (i.e., the “cd” command) within Stata do-files.

Workshop topics

  1. Overview of data management skills you should master and where to find additional help on data management
  2. Best practices for writing Stata do-files
  3. Reading data into Stata
  4. Investigating data patterns and cleaning data
  5. Combining datasets (merging and appending)
  6. Performing calculations across observations and changing the structure of your data
  7. Introduction to macros, looping, and user-defined programs

About the instructor

Ozan Jaquette is an assistant professor of higher education at the University of Arizona. His research investigates change over time in the enrollment management behavior of colleges and universities. His work has appeared in top education journals, including the American Educational Research Journal, the Journal of Higher Education, and Research in Higher Education. He regularly teaches courses in higher education finance and applied statistics with a focus on program evaluation. He is a member of the Integrated Postsecondary Education Data System (IPEDS) Technical Review Panel and he has published a chapter on creating analysis datasets from IPEDS to conduct longitudinal analyses of organizational behavior.

