Integer constraints for enhancing interpretability in linear regression

  1. Emilio Carrizosa 1
  2. Alba V. Olivares-Nadal 2
  3. Josefa Ramírez Cobo 3
  1. Institute of Mathematics of the University of Seville
  2. University of Chicago Booth School of Business
  3. Universidad de Cádiz

Journal:
Sort: Statistics and Operations Research Transactions

ISSN: 1696-2281

Year of publication: 2020

Volume: 44

Issue: 1

Pages: 69-78

Type: Article

DOI: 10.2436/20.8080.02.95 (open access)

Abstract

One of the main challenges researchers face is to identify the most relevant features in a prediction model. As a consequence, many regularized methods seeking sparsity have flourished. Although sparse, their solutions may not be interpretable in the presence of spurious coefficients and correlated features. In this paper we aim to enhance interpretability in linear regression in the presence of multicollinearity by: (i) forcing the sign of the estimated coefficients to be consistent with the sign of the correlations between predictors, and (ii) avoiding spurious coefficients so that only significant features are represented in the model. This is addressed by modelling both requirements as constraints and adding them to the optimization problem underlying an estimation procedure such as ordinary least squares or the lasso. The resulting constrained regression models are Mixed Integer Quadratic Problems. Numerical experiments on real and simulated datasets show that tightening the search space of standard linear regression models with the constraints modelling (i) and/or (ii) helps to improve the sparsity and interpretability of the solutions while retaining competitive predictive quality.
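
As an illustration of how requirements (i) and (ii) can be encoded as integer constraints, the sketch below shows a least-squares formulation with big-M constraints. The binary variables z_j and δ_j, the significance threshold ε, the correlation cutoff ρ and the bound M are illustrative assumptions based on this abstract's description, not the paper's exact formulation:

\[
\begin{aligned}
\min_{\beta \in \mathbb{R}^p,\; z,\,\delta \in \{0,1\}^p} \quad & \| y - X\beta \|_2^2 \\
\text{s.t.} \quad & -M z_j \le \beta_j \le M z_j, && j = 1, \dots, p, \\
& \beta_j \ge \varepsilon z_j - M(1 - \delta_j), \qquad \beta_j \le -\varepsilon z_j + M \delta_j, && j = 1, \dots, p, \\
& \delta_j = \delta_k \ \text{ if } \operatorname{corr}(x_j, x_k) \ge \rho, \qquad \delta_j + \delta_k = 1 \ \text{ if } \operatorname{corr}(x_j, x_k) \le -\rho.
\end{aligned}
\]

Here z_j switches predictor j on or off (β_j = 0 when z_j = 0, and |β_j| ≥ ε otherwise, which rules out spurious near-zero coefficients), δ_j records the sign of β_j, and the last pair of constraints forces strongly correlated predictors to receive sign-consistent coefficients. A lasso variant would add λ∑_j t_j to the objective together with t_j ≥ β_j and t_j ≥ −β_j; either way the model remains a Mixed Integer Quadratic Problem that off-the-shelf MIQP solvers can handle.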

References

  • Atamtürk, A., Nemhauser, G. and Savelsbergh, M. (2000). Conflict graphs in solving integer programming problems. European Journal of Operational Research, 121, 40–55.
  • Bartholomew, D. J., Steele, F., Moustaki, I. and Galbraith, J. (2008). Analysis of Multivariate Social Science Data. Chapman & Hall.
  • Bertsimas, D. and King, A. (2015). OR forum – An algorithmic approach to linear regression. Operations Research, 64, 2–16.
  • Bertsimas, D., King, A. and Mazumder, R. (2016). Best subset selection via a modern optimization lens. The Annals of Statistics, 44, 813–852.
  • Breiman, L. (1995). Better subset regression using the nonnegative garrote. Technometrics, 37, 373–384.
  • Bühlmann, P. and van de Geer, S. (2011). Statistics for High-Dimensional Data. Springer.
  • Cai, A., Tsay, R. and Chen, R. (2009). Variable selection in linear regression with many predictors. Journal of Computational and Graphical Statistics, 18, 573–591.
  • Camm, J. D., Raturi, A. S. and Tsubakitani, S. (1990). Cutting big M down to size. Interfaces, 20, 61–66.
  • Cao, G., Guo, Y. and Bouman, C. A. (2010). High dimensional regression using the sparse matrix transform (SMT). In Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on, pp. 1870–1873. IEEE.
  • Carrizosa, E. and Guerrero, V. (2014). Biobjective sparse principal component analysis. Journal of Multivariate Analysis, 132, 151–159.
  • Carrizosa, E., Nogales-Gómez, A. and Morales, D. R. (2016). Strongly agree or strongly disagree?: Rating features in support vector machines. Information Sciences, 329, 256–273.
  • Carrizosa, E., Nogales-Gómez, A. and Morales, D. R. (2017). Clustering categories in support vector machines. Omega, 66, 28–37.
  • Carrizosa, E., Olivares-Nadal, A. V. and Ramírez-Cobo, P. (2016). A sparsity-controlled vector autoregressive model. Biostatistics, 18, 244–259.
  • Chatterjee, S. and Hadi, A. S. (2015). Regression Analysis by Example. John Wiley & Sons.
  • Danna, E., Rothberg, E. and Le Pape, C. (2005). Exploring relaxation induced neighborhoods to improve MIP solutions. Mathematical Programming, 102, 71–91.
  • Efron, B. and Hastie, T. (2003). LARS software for R and Splus. https://web.stanford.edu/~hastie/Papers/LARS/.
  • Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression. The Annals of Statistics, 32, 407–499.
  • Farrar, D. E. and Glauber, R. R. (1967). Multicollinearity in regression analysis: the problem revisited. The Review of Economics and Statistics, 92–107.
  • Fischetti, M. and Lodi, A. (2005). Local branching. Mathematical Programming, 98, 23–47.
  • Fourer, R., Gay, D. and Kernighan, B. W. (2002). The AMPL book. Duxbury Press, Pacific Grove.
  • Friedman, J., Hastie, T. and Tibshirani, R. (2001). The Elements of Statistical Learning, Volume 1. Springer series in statistics.
  • Hastie, T. and Efron, B. (2013). Least Angle Regression, Lasso and Forward Stagewise. http://cran.r-project.org/web/packages/lars/lars.pdf.
  • Hastie, T., Tibshirani, R. and Wainwright, M. (2015). Statistical Learning with Sparsity: the Lasso and Generalizations. CRC Press.
  • Hesterberg, T., Choi, N. H., Meier, L. and Fraley, C. (2008). Least angle and ℓ1 penalized regression: A review. Statistics Surveys, 2, 61–93.
  • Jou, Y.-J., Huang, C.-C. L. and Cho, H.-J. (2014). A VIF-based optimization model to alleviate collinearity problems in multiple linear regression. Computational Statistics, 29, 1515–1541.
  • Kim, S. and Xing, E. P. (2009). Statistical estimation of correlated genome associations to a quantitative trait network. PLoS genetics, 5, e1000587.
  • Lichman, M. (2016). UCI machine learning repository. http://archive.ics.uci.edu/ml. University of California, Irvine, School of Information and Computer Sciences.
  • Massy, W. F. (1965). Principal components regression in exploratory statistical research. Journal of the American Statistical Association, 60, 234–256.
  • Meinshausen, N. (2013). Sign-constrained least squares estimation for high-dimensional regression. Electronic Journal of Statistics, 7, 1607–1631.
  • Miller, A. (2002). Subset Selection in Regression (2 ed.). Chapman & Hall/CRC.
  • Montgomery, D. C., Peck, E. A. and Vining, G. G. (2012). Introduction to Linear Regression Analysis, Volume 821. John Wiley & Sons.
  • Rothberg, E. (2007). An evolutionary algorithm for polishing mixed integer programming solutions. INFORMS Journal on Computing, 19, 534–541.
  • Savelsbergh, M. (1994). Preprocessing and probing techniques for mixed integer programming problems. ORSA Journal on Computing, 6, 445–454.
  • Sengupta, D. and Bhimasankaram, P. (1997). On the roles of observations in collinearity in the linear model. Journal of the American Statistical Association, 92, 1024–1032.
  • Silvey, S. (1969). Multicollinearity and imprecise estimation. Journal of the Royal Statistical Society. Series B (Methodological), 539–552.
  • Tamura, R., Kobayashi, K., Takano, Y., Miyashiro, R., Nakata, K. and Matsui, T. (2017). Best subset selection for eliminating multicollinearity. Journal of the Operations Research Society of Japan, 60, 321–336.
  • Tamura, R., Kobayashi, K., Takano, Y., Miyashiro, R., Nakata, K. and Matsui, T. (2019). Mixed integer quadratic optimization formulations for eliminating multicollinearity based on variance inflation factor. Journal of Global Optimization, 73, 431–446.
  • Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society. Series B (Methodological), 267–288.
  • Tibshirani, R., Saunders, M., Rosset, S., Zhu, J. and Knight, K. (2005). Sparsity and smoothness via the fused lasso. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67, 91–108.
  • Torgo, L. (2016). Regression data sets. http://www.dcc.fc.up.pt/~ltorgo/Regression/DataSets.html. University of Porto, Faculty of Sciences.
  • Watson, P. K. and Teelucksingh, S. S. (2002). A Practical Introduction to Econometric Methods: Classical and Modern. University of West Indies Press.
  • Winner, L. (2016). Miscellaneous data sets. http://www.stat.ufl.edu/~winner/datasets.html. University of Florida.
  • Yu, G. and Liu, Y. (2016). Sparse regression incorporating graphical structure among predictors. Journal of the American Statistical Association, 111, 707–720.
  • Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67, 301–320.