Categorical Multivariate Dependency for Feature Selection applied in Data mining Classification Task

Author:
  1. Sosa Cabrera, Gustavo Daniel
Supervised by:
  1. Miguel García Torres, Supervisor
  2. Christian Schaerer, Supervisor

Defending university: Universidad Nacional de Asunción

Defense date: 26 October 2023

Type: Thesis

Abstract

Nowadays, the technological progress of a constantly evolving world is tied to an unprecedented increase in collected data, often with hundreds or thousands of features and instances. As a result, feature selection has become an integral part of preprocessing for dimensionality reduction in machine learning. However, in this Big Data era, characterized by complex and heterogeneous datasets, most proposed feature selection methods apply a single feature evaluation measure over a single search space. In this study, a novel filter framework based on partitioning and intercooperation (PART_FS) is proposed. In this approach, the search space is partitioned into subspaces according to the type of information a feature carries (i.e., individually informative, synergistic, or complementary). Redundancy is analyzed with a strategy based on class separability combined with the Markov blanket property. Synergy is evaluated with an information-theoretic measure, while complementarity is evaluated with a consistency-based measure. To assess its performance, PART_FS was compared with five efficient cooperativeness-based feature selection methods (FS_RRC, IIFS, FJMI, SAFE and RELAX_MRMR) on three artificial datasets and twenty public datasets, in combination with seven classifiers. Experimental results on both artificial and real-world data demonstrate the superiority of PART_FS across a variety of problems with distinct characteristics. Hence, partitioning the search space and treating each subspace differently can help assess the importance of features more accurately when preprocessing complex, high-dimensional data.
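The abstract does not say which information-theoretic or consistency-based measures PART_FS actually uses, so the following is only a minimal sketch of two classic measures of those kinds, not the thesis implementation. It assumes discrete (categorical) features and uses (1) the interaction gain I(F1,F2;C) − I(F1;C) − I(F2;C) as a synergy score, where a positive value means the pair tells more about the class jointly than separately, and (2) the inconsistency rate of a feature subset as a consistency measure, where a feature "complements" a subset if adding it lowers that rate. All function names (`entropy`, `mutual_info`, `synergy_gain`, `inconsistency_rate`) are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Joint Shannon entropy (in bits) of one or more discrete columns."""
    joint = list(zip(*columns))
    n = len(joint)
    return sum(-(c / n) * log2(c / n) for c in Counter(joint).values())


def mutual_info(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def synergy_gain(f1, f2, c):
    """Interaction gain I(F1,F2;C) - I(F1;C) - I(F2;C).

    Positive values indicate synergy: the pair is more informative
    about the class C together than the sum of its parts.
    """
    joint_info = entropy(f1, f2) + entropy(c) - entropy(f1, f2, c)
    return joint_info - mutual_info(f1, c) - mutual_info(f2, c)


def inconsistency_rate(features, labels):
    """Fraction of instances not explained by the majority class of their pattern.

    `features` is a list of columns; instances sharing the same value
    pattern but carrying different labels count as inconsistent.
    """
    groups = {}
    for pattern, label in zip(zip(*features), labels):
        groups.setdefault(pattern, Counter())[label] += 1
    inconsistent = sum(sum(cnt.values()) - max(cnt.values()) for cnt in groups.values())
    return inconsistent / len(labels)


if __name__ == "__main__":
    # XOR toy data: each feature alone is useless, together they determine C.
    f1 = [0, 0, 1, 1]
    f2 = [0, 1, 0, 1]
    c = [0, 1, 1, 0]
    print(mutual_info(f1, c))              # 0.0
    print(synergy_gain(f1, f2, c))         # 1.0
    print(inconsistency_rate([f1], c))     # 0.5
    print(inconsistency_rate([f1, f2], c)) # 0.0
```

On the XOR example, a single-measure filter that ranks features by I(F;C) would discard both features, while the synergy score and the drop in inconsistency rate both flag the pair as useful. This illustrates why evaluating different kinds of feature information with different measures, as the abstract describes, can matter.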