Evaluation of estimators for ill-posed statistical problems subject to multicollinearity

Multicollinearity is a significant problem in economic analysis and occurs whenever at least two of the explanatory variables in a model are related to one another. The presence of multicollinearity is problematic, as changes in the dependent variable cannot be accurately attributed to individual explanatory variables. It can cause estimated coefficients to be unstable and have high variances, and thus be potentially inaccurate and inappropriate to guide management or policy. Because of this, many alternative estimators have been developed for the analysis of multicollinear data. The primary objective of this thesis is to compare and contrast the performance of some of these common estimators, as well as a number of new estimators, and test their prediction accuracy and precision under various circumstances. Through the use of non-trivial Monte Carlo experiments, the estimators are tested under 10 different levels of multicollinearity, with regressors and errors drawn from different distributions (normal, Student's t, chi-squared, and in the case of errors, mixed Gaussian). Insights are gained through response surface analysis, which is conducted to help summarise the output of these simulations. A number of key findings are identified. The highest levels of mean square error (MSE) are generally given by a Generalised Maximum Entropy estimator with narrow support bounds defined for its coefficients (GMEN) and the One-Step Data Driven Entropy (DDE1) model. Yet none of the estimators evaluated produced sufficiently high levels of MSE to suggest that they were inappropriate for prediction. The most accurate predictions, regardless of the distributions tested or the level of multicollinearity, were given by Ordinary Least Squares (OLS). The Leuven-2 estimator appeared relatively robust in terms of MSE, being reasonably invariant to changes in condition number and error distribution. However, it was unstable due to variability in error estimation arising from the arbitrary way that probabilities are converted to coefficient values in this framework. In comparison, MSE values for Leuven-1 were low and far more stable than those reported for Leuven-2. The estimators that produced the least precision risk, as measured through mean square error loss (MSEL), were the GMEN and Leuven-1 estimators. However, the GMEN model requires exogenous information and, as such, is much more problematic to accurately apply in different contexts. In contrast, two models, the Two-Step Data Driven Entropy (DDE2) model and OLS, had very poor precision in the presence of multicollinear data, rendering them inappropriate for estimation in such circumstances. Overall, these results highlight that the Leuven-1 estimator is the most appropriate if a practitioner wishes to achieve high prediction accuracy and precision in the presence of multicollinearity. Nevertheless, it is critical that more attention is paid to the theoretical basis of the Leuven-1 estimator, as relating estimated probabilities to coefficients using concepts drawn from the theory of light appears highly subjective. This is illustrated through the differences in empirical results obtained for the Leuven-1 and Leuven-2 estimators.
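
The contrast reported above for OLS (accurate predictions but poor coefficient precision under multicollinearity) can be illustrated with a small Monte Carlo sketch. This is not the thesis's experimental design: it assumes only two normal regressors with correlation `rho`, standard normal errors, and OLS as the sole estimator. The function name `simulate_ols`, the true coefficients, the sample size, and the two loss definitions (squared coefficient error as a stand-in for MSEL, mean squared error of the fitted values as a stand-in for prediction MSE) are all assumptions made for illustration.

```python
import numpy as np


def simulate_ols(rho, n=50, reps=500, seed=0):
    """Toy Monte Carlo for OLS with two regressors of correlation rho.

    Returns the average squared coefficient error, the average squared
    error of the fitted values, and the average condition number of X.
    All settings here are illustrative assumptions, not the thesis's.
    """
    rng = np.random.default_rng(seed)
    beta = np.array([1.0, 1.0])              # assumed true coefficients
    cov = np.array([[1.0, rho], [rho, 1.0]])  # regressor covariance
    coef_loss = pred_loss = cond = 0.0
    for _ in range(reps):
        X = rng.multivariate_normal(np.zeros(2), cov, size=n)
        y = X @ beta + rng.normal(size=n)     # standard normal errors
        b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
        coef_loss += np.sum((b_hat - beta) ** 2)            # precision
        pred_loss += np.mean((X @ b_hat - X @ beta) ** 2)   # prediction
        cond += np.linalg.cond(X)             # severity of collinearity
    return coef_loss / reps, pred_loss / reps, cond / reps


if __name__ == "__main__":
    for rho in (0.0, 0.9, 0.99):
        coef, pred, cond = simulate_ols(rho)
        print(f"rho={rho:4.2f}  cond={cond:6.2f}  "
              f"coef loss={coef:.4f}  pred loss={pred:.4f}")
```

In this toy setting the coefficient loss should rise sharply as `rho` approaches 1, while the prediction loss should stay roughly constant, which is consistent in spirit with the pattern the abstract describes for OLS.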
Holland, L. M. (2014). Evaluation of estimators for ill-posed statistical problems subject to multicollinearity (Thesis, Master of Management Studies (MMS)). University of Waikato, Hamilton, New Zealand. Retrieved from https://hdl.handle.net/10289/8822
University of Waikato
All items in Research Commons are provided for private study and research purposes and are protected by copyright with all rights reserved unless otherwise indicated.