«Innovative Imputation Techniques Designed for the Agricultural Resource Management Survey ∗ † ‡ Michael W. Robbins Sujit K. Ghosh Joshua D. ...»
Both the current NASS method and the ABB method lack the multivariate sophistication required for a high-dimensional dataset. These methods use only three covariates, leaving several highly informative covariates unused. Moreover, neither method allows the imputer to model variables with missing values on other variables with missing values, so relationships between these variables will likely be distorted by the imputation process.
The SR method should enable the imputer to capture the marginal characteristics of the data. It should also improve on the NASS and ABB methods in preserving variable relationships, since it allows variables with missing values to be modeled on any of the fully observed covariates as well as on other variables with missing values. However, because the method is non-iterative, its imputations will still induce bias in variable relationships whenever those relationships are not sufficiently explained by the fully observed covariates.
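As a rough illustration of what a single-pass sequential regression imputation can look like, the following Python sketch processes the variables with missing values in a fixed order and regresses each on the fully observed covariates plus the already-completed variables before it. It is only a minimal sketch under stated assumptions, not the SR procedure of this paper: the function name sequential_regression_impute, the normal linear model, and the left-to-right ordering are all hypothetical choices made for illustration, and the regression parameters are plugged in at their least squares estimates rather than drawn from a posterior.

```python
import numpy as np


def sequential_regression_impute(X, Y, rng):
    """Single-pass (non-iterative) sequential regression imputation.

    Hypothetical illustration only; assumes a normal linear model.
    X : (n, p) array of fully observed covariates.
    Y : (n, q) array of variables with missing values coded as NaN.
    Columns of Y are processed left to right; each is regressed on X
    plus the already-completed columns to its left.
    """
    Y = np.asarray(Y, dtype=float).copy()
    n = X.shape[0]
    # Design matrix starts as intercept + fully observed covariates.
    design = np.column_stack([np.ones(n), X])
    for j in range(Y.shape[1]):
        miss = np.isnan(Y[:, j])
        obs = ~miss
        if miss.any():
            # Ordinary least squares on the rows where column j is observed.
            beta, *_ = np.linalg.lstsq(design[obs], Y[obs, j], rcond=None)
            resid = Y[obs, j] - design[obs] @ beta
            dof = max(int(obs.sum()) - design.shape[1], 1)
            sigma = np.sqrt(resid @ resid / dof)
            # Impute as fitted value plus normal noise, so imputed values
            # keep roughly the residual spread of the observed values.
            noise = rng.normal(0.0, sigma, int(miss.sum()))
            Y[miss, j] = design[miss] @ beta + noise
        # The completed column becomes a predictor for later columns.
        design = np.column_stack([design, Y[:, j]])
    return Y
```

In this sketch the first column is imputed without reference to any later column, so a relationship between two variables that both have missing values is carried only through the fully observed covariates. That is the kind of limitation of a non-iterative scheme described above.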
The ISR technique allows for flexible selection of conditional distributions, an attribute it shares with other popular MCMC techniques such as MICE (Van Buuren and Oudshoorn, 1999), SRMI (Raghunathan et al., 2001), and mi (Su et al., 2010).
ISR utilizes joint modeling, since conditional models of the form in (6) are used rather than the respective full conditional models. Joint modeling (an attribute of the data augmentation class of imputation procedures; see Little and Rubin 2002 and Schafer 1997 for an outline of such methodology) ensures that, after a sufficient number of iterations, the imputes represent a draw from the posterior distribution of the complete data given the observed data.
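The distinction between the two kinds of specification can be written out as follows. Equation (6) is not reproduced in this section, so the notation here is an assumption: y_1, ..., y_q denote the variables with missing values in a fixed order, x the fully observed covariates, and theta_j the parameters of the j-th model. Under that assumed notation, a sequential specification multiplies out to a joint density, whereas a full conditional specification states each conditional separately.

```latex
% Assumed notation (not taken from the source): y_1,...,y_q are the
% variables with missing values, x the fully observed covariates.
\begin{align*}
\text{sequential (joint) specification:}\quad
  & f(y_1,\dots,y_q \mid x,\theta)
    = \prod_{j=1}^{q} f_j(y_j \mid y_1,\dots,y_{j-1},x,\theta_j), \\
\text{full conditional specification:}\quad
  & f_j(y_j \mid y_1,\dots,y_{j-1},y_{j+1},\dots,y_q,x,\theta_j),
    \qquad j=1,\dots,q.
\end{align*}
```

A product of sequential conditionals always defines a valid joint distribution, while separately chosen full conditionals need not be compatible with any joint distribution. This is one way to read the convergence property stated above.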
REFERENCES
Azzalini, A. (1985), “A Class of Distributions Which Includes the Normal Ones,” Scandinavian Journal of Statistics, 12, 171–178.
Banker, D. (2007), “ARMS Phase III: Data Processing and Analysis,” Tech. rep., Economic Research Service, prepared for presentation at the FAO Regional Consultation on Statistics for Farmer Income, Bangkok, Thailand, December 11-14, 2007.
Fichman, M. and Cummings, J. N. (2003), “Multiple Imputation for Missing Data: Making the Most of What You Know,” Organizational Research Methods, 6, 282–308.
Horton, N. J. and Lipsitz, S. R. (2001), “Multiple Imputation in Practice: Comparison of Software Packages for Regression Models with Missing Variables,” The American Statistician, 55, 244–254.
Section on Survey Research Methods – JSM 2010
Kim, J. K. (2002), “A Note on Approximate Bayesian Bootstrap Imputation,” Biometrika, 89, 470–477.
Little, R. J. A. and An, H. (2004), “Robust Likelihood-Based Analysis of Multivariate Data with Missing Values,” Statistica Sinica, 14, 949–968.
Little, R. J. A. and Rubin, D. B. (2002), Statistical Analysis with Missing Data, New Jersey: John Wiley & Sons, 2nd ed.
Miller, D., Robbins, M., and Habiger, J. (2010), “Examining the Challenges of Missing Data Analysis in Phase Three of the Agricultural Resource Management Survey,” in JSM Proceedings, Section on Survey Research Methods, Alexandria, VA: American Statistical Association.
National Research Council (2008), Understanding American Agriculture: Challenges for the Agricultural Resource Management Survey, Washington, D.C.: The National Academies Press.
Newman, D. (2003), “Longitudinal Modeling with Randomly and Systematically Missing Data: A Simulation of Ad Hoc, Maximum Likelihood and Multiple Imputation Techniques,” Organizational Research Methods, 6, 328–362.
Raghunathan, T., Lepkowski, J., Van Hoewyk, J., and Solenberger, P. (2001), “A Multivariate Technique for Multiply Imputing Missing Values Using a Sequence of Regression Models,” Survey Methodology, 27, 85–95.
Rubin, D. B. and Schenker, N. (1986), “Multiple Imputation for Interval Estimation from Simple Random Samples with Ignorable Nonresponse,” Journal of the American Statistical Association, 81, 366–374.
Schafer, J. L. (1997), Analysis of Incomplete Multivariate Data, New York, New York: Chapman and Hall/CRC.
Schafer, J. L. and Graham, J. W. (2002), “Multiple Imputation for Missing Data: Our View of the State of the Art,” Psychological Methods, 7, 147–177.
Spiess, M. and Keller, F. (1999), “A Mixed Approach and a Distribution Free Multiple Imputation Technique for the Estimation of Multivariate Probit Models with Missing Values,” British Journal of Mathematical and Statistical Psychology, 52, 1–17.
Su, Y.-S., Gelman, A., Hill, J., and Yajima, M. (2010), “Multiple Imputation with Diagnostics (mi) in R: Opening Windows into the Black Box,” Journal of Statistical Software, forthcoming.
Van Buuren, S. and Oudshoorn, C. G. M. (1999), Flexible Multivariate Imputation by MICE, TNO Preventie en Gezondheid, Leiden, for associated software see http://www.multipleimputation.com.
Von Hippel, P. T. (2007), “Regression with Missing Y’s: An Improved Strategy for Analyzing Multiply Imputed Data,” Sociological Methodology, 37, 1–54.