“Big Data” and High-Dimensional Econometric Models
Victor Chernozhukov (MIT) and Christian Hansen (University of Chicago)

June 29-July 3, 2015


Course Description

As in many other fields, economists are increasingly making use of high-dimensional models – models with many unknown parameters that need to be inferred from the data. Such models arise naturally in modern data sets that include rich information for each unit of observation (a type of “big data”) and in nonparametric applications where researchers wish to learn, rather than impose, functional forms. High-dimensional models provide a vehicle for modeling and analyzing complex phenomena and for incorporating rich sources of confounding information into economic models.

Our goal in this course is twofold. First, we wish to provide an overview of, and introduction to, several modern methods, largely from statistics and machine learning, that are useful for exploring high-dimensional data and for building prediction models in high-dimensional settings. Second, we will present recent proposals that adapt high-dimensional methods to the problem of valid inference about model parameters, and we will illustrate applications of these proposals to inference about economically interesting parameters.


1) Introduction to High-Dimensional Methods and Big Data

  • Regularization
  • Computation

2) Supervised Learning (Prediction Problems)

  • Cross-Validation
  • Regression
      - Penalized estimation: Ridge, LASSO, Elastic Net, LAVA, adaptive LASSO, SCAD
      - Regression Trees and Random Forests
      - Bagging and Boosting
  • Classification
      - Penalized estimation
      - Classification Trees
      - Support Vector Machines

3) Unsupervised Learning

  • Principal Components
  • K-means and Hierarchical Clustering

4) Asymptotic Approximations in High Dimensions

  • Asymptotics under slowly increasing dimension
  • Fixed effects panel data
  • Very high-dimensional asymptotics
      - Oracle inequalities
      - Rate results
      - Moderate deviations

5) Inference in High-Dimensional Models

  • Multiple testing: Family-wise error rate and false discovery rate
  • Inference about parameters in very high-dimensional models
      - General framework: Orthogonal Estimating Equations
      - Linear (and Partially Linear) Model Coefficients
      - Instrumental Variables
      - Treatment Effects

Summer School 2015

Updated on 2015-04-07T11:35:00+00:00, by Segreteria.