published in: Journal of Labor Economics, 2005, 23 (2), 235-257
In empirical research it is common practice to use sensible rules of thumb for cleaning data.
Measurement error is often the justification for removing (trimming) or recoding (winsorizing)
observations whose values lie outside a specified range. We consider a general
measurement error process that nests many plausible models. Analytic results demonstrate
that winsorizing and trimming are solutions only for a narrow class of measurement error
processes. Indeed, for the measurement error processes found in most social-science data,
such procedures can induce or exacerbate bias, and even inflate the variance estimates. We
term this source of bias "iatrogenic" (or econometrician-induced) error. Monte Carlo
simulations and empirical results from the Census PUMS data and 2001 CPS data
demonstrate the fragility of trimming and winsorizing as solutions to measurement error in the
dependent variable. Even on asymptotic variance and RMSE criteria, we are unable to find
generalizable justifications for commonly used cleaning procedures.
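
For readers unfamiliar with the two procedures, the distinction between trimming (removing) and winsorizing (recoding) can be sketched as follows. This is a minimal illustration only, not the authors' code: the function names, the wage values, and the bounds of 2.0 and 100.0 are all hypothetical, and the paper itself considers a far more general measurement error process.

```python
def trim(values, lower, upper):
    """Trimming: drop every observation outside [lower, upper]."""
    return [v for v in values if lower <= v <= upper]


def winsorize(values, lower, upper):
    """Winsorizing: recode every observation outside [lower, upper]
    to the nearest bound, keeping the sample size unchanged."""
    return [min(max(v, lower), upper) for v in values]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Hypothetical hourly wages (illustrative values, not data from the paper).
wages = [1.0, 8.0, 12.0, 15.0, 20.0, 250.0]

# Hypothetical cleaning range (an assumed rule of thumb).
LOWER, UPPER = 2.0, 100.0

trimmed = trim(wages, LOWER, UPPER)
winsorized = winsorize(wages, LOWER, UPPER)

print("raw mean:       ", mean(wages))
print("trimmed mean:   ", mean(trimmed))
print("winsorized mean:", mean(winsorized))
```

Trimming shrinks the sample while winsorizing preserves it, so the two rules generally yield different estimates from the same data. Whether either one moves the estimate toward the truth depends on the underlying measurement error process, which is the question the paper addresses.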