Considerations on Partially Identified Regression Models

ZEW Discussion Paper No. 12-024 // 2012

In a linear regression model that specifies the conditional mean of an outcome y as a linear function of a set of explanatory variables (regressors x), E(y|x) = xθ, absence of perfect collinearity among the regressors is typically a necessary and sufficient condition for point identification of the parameter vector θ, for a wide range of conditional distributions of y given x. This means that, with sufficiently many observations, a researcher can pinpoint the true value of the parameters.

However, if y or x is imperfectly measured, for instance if, for some variable, only interval measurement is available (which is often the case for variables such as age, income, or schooling), then typically point identification will be lost: whatever the number of observations available, there will be a set of values of the parameters that are compatible with the observations. This set is the identified set, and the parameter (and also the model) is said to be partially identified. The task of the researcher is then to learn from a set of observations what the identified set may be.
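The loss of point identification can be illustrated in the simplest possible case, a mean with no regressors. The following Python sketch is not taken from the paper: the simulated data, the unit-width brackets, and the helper name `mean_bounds` are assumptions made purely for illustration. It uses the fact that, if y is only known to lie in [y_lower, y_upper], then E(y) is only known to lie in [E(y_lower), E(y_upper)], and that the width of this interval does not shrink as the sample grows.

```python
import numpy as np


def mean_bounds(lower, upper):
    """Sample analogue of the bounds [E(y_lower), E(y_upper)] on E(y)."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    return lower.mean(), upper.mean()


# Hypothetical data: the latent outcome y is never seen by the researcher,
# who only observes the unit-width bracket that contains it.
rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.0, size=10_000)
y_lower = np.floor(y)      # lower end of the observed bracket
y_upper = y_lower + 1.0    # upper end of the observed bracket

lo, hi = mean_bounds(y_lower, y_upper)
print(f"estimated bounds on E(y): [{lo:.3f}, {hi:.3f}]")
```

In this sketch the estimated interval always has width one, whatever the sample size: more observations make the end points more precise but do not collapse the interval to a point.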

The literature on partially identified models has developed considerably over the last ten years and it has many applications in labor economics and more recently in the empirical analysis of game situations in industrial economics. Most of it is extremely technical, and the details needed to apply the estimation methods in practice are often missing. Indeed, this paper originated in an attempt to replicate the illustrative simulation results of the seminal contribution of Manski and Tamer (2002) who study identification regions for parameters in regressions with interval data on a regressor or the outcome.

While their focus was on illustrating the general approach, we concentrate instead on the derivation of exact results for their two Monte Carlo simulation designs. For one of these, the identified set is a simple three-dimensional polyhedron with six vertices and eight faces which is characterized by eight inequalities involving exact expectations. Estimation proceeds by replacing these expectations with sample means. We document significant gains in estimation speed and convergence to the true set, compared with the algorithm used in Manski and Tamer (2002). For the other design, the identified set is more complex, but we show that it can be closely approximated by a simple polyhedron.
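The idea of describing an identified set by inequalities in exact expectations, and estimating it by plugging in sample means, can be sketched in a much smaller setting than the designs discussed above. The Python example below is an illustrative two-parameter analogue and not one of the Manski and Tamer (2002) designs: it assumes a binary regressor x, an interval-measured outcome, and the model E(y|x) = θ0 + θ1·x, for which the identified set is given by the four inequalities E(y_lower|x=0) ≤ θ0 ≤ E(y_upper|x=0) and E(y_lower|x=1) ≤ θ0 + θ1 ≤ E(y_upper|x=1). The function names and the simulated data are hypothetical.

```python
import numpy as np


def estimate_constraints(x, y_lower, y_upper):
    """Return (A, b) so that the estimated set is {theta : A @ theta <= b}.

    Each population expectation in the four inequalities is replaced
    by the corresponding sample mean within the cell x == 0 or x == 1.
    """
    x = np.asarray(x)
    y_lower = np.asarray(y_lower, dtype=float)
    y_upper = np.asarray(y_upper, dtype=float)
    m_l0, m_u0 = y_lower[x == 0].mean(), y_upper[x == 0].mean()
    m_l1, m_u1 = y_lower[x == 1].mean(), y_upper[x == 1].mean()
    A = np.array([
        [1.0, 0.0],    #  theta0            <=  mean(y_upper | x=0)
        [-1.0, 0.0],   # -theta0            <= -mean(y_lower | x=0)
        [1.0, 1.0],    #  theta0 + theta1   <=  mean(y_upper | x=1)
        [-1.0, -1.0],  # -(theta0 + theta1) <= -mean(y_lower | x=1)
    ])
    b = np.array([m_u0, -m_l0, m_u1, -m_l1])
    return A, b


def in_estimated_set(theta, A, b):
    """Check whether a candidate theta satisfies all estimated inequalities."""
    return bool(np.all(A @ np.asarray(theta, dtype=float) <= b))


# Hypothetical design: true theta = (1, 2), outcome observed in unit brackets.
rng = np.random.default_rng(1)
n = 5_000
x = rng.integers(0, 2, size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
y_lower = np.floor(y)
y_upper = y_lower + 1.0

A, b = estimate_constraints(x, y_lower, y_upper)
print(in_estimated_set([1.0, 2.0], A, b))  # the true parameter value
print(in_estimated_set([3.0, 2.0], A, b))  # a value far outside the set
```

Because the estimated set is described directly by a handful of linear inequalities, checking a candidate value costs one small matrix product and no search over a parameter grid is needed to trace out the set.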

Cerquera, Daniel, François Laisney and Hannes Ullrich (2012), Considerations on Partially Identified Regression Models, ZEW Discussion Paper No. 12-024, Mannheim.

Authors Daniel Cerquera // François Laisney // Hannes Ullrich