Evolutionary biclustering of gene expression data: shifting and scaling pattern-based evaluation

  1. Beatriz Pontes Balanza
Supervised by:
  1. Jesús Salvador Aguilar-Ruiz Director
  2. Raúl Giráldez Director

Defence university: Universidad de Sevilla

Year of defence: 2013

  1. José Cristobal Riquelme Santos Chair
  2. Jorge Sergio Igor Zwir Nawrocki Secretary
  3. Marco Masseroli Committee member
  4. Alicia Troncoso Committee member
  5. Francisco Javier Azuaje González Committee member

Type: Thesis

Teseo: 338565


Biclustering has become a very popular data mining technique because it explores two dimensions at the same time, whereas clustering techniques make use of only one. Gene expression data, for which technological advances are producing enormous amounts of information, offer a suitable framework for applying biclustering algorithms. Microarray technology makes it possible to quantify the expression levels of thousands of genes simultaneously. Furthermore, several experimental conditions may be taken into account, thus producing numerical expression matrices in which one dimension refers to genes (rows) and the other to samples or experimental conditions (columns). This is an ideal scenario for biclustering, since exploring both dimensions simultaneously provides the analyst with useful knowledge: subsets of genes showing a common tendency under a subset of conditions. Different heuristics have been proposed to discover interesting biclusters in data. Many of them are guided by a measure that determines the quality of biclusters, so defining a quality measure is a key factor in the search for biclusters. The most widespread measure for biclustering of gene expression data has been the mean squared residue (MSR), which correctly identifies some types of patterns but fails to discover others. In this PhD thesis we first review the most widely used biclustering techniques for gene expression data, and then propose a bicluster quality measure that is both effective and efficient, together with a fully customizable evolutionary biclustering technique. The evaluation measures we propose (named VE and VEt) rely on a standardization procedure to compare the tendencies of genes and conditions independently of their specific expression values.
Our search heuristic (called Evo-Bexpa), in turn, makes it possible to guide the search towards certain desired bicluster characteristics by adjusting several weights that give preference to some bicluster features over others. New objectives can also be easily incorporated into the search. Our work also includes a wide set of experimental tests that help us validate our proposal, both statistically and biologically.
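
The limitation of MSR mentioned in the abstract can be illustrated with a minimal sketch. It uses the standard definition of the mean squared residue on two toy biclusters whose values are invented for illustration: one whose rows differ by an additive constant (shifting pattern) and one whose rows differ by a multiplicative factor (scaling pattern). The helper `standardize_rows` is a hypothetical illustration of row-wise standardization in general; it is not the VE or VEt measure defined in the thesis, whose exact formulation is not given in this record.

```python
from statistics import mean, pstdev


def mean_squared_residue(bicluster):
    """Mean squared residue (MSR) of a bicluster given as a list of equal-length rows."""
    n_rows, n_cols = len(bicluster), len(bicluster[0])
    row_means = [mean(row) for row in bicluster]
    col_means = [mean([row[j] for row in bicluster]) for j in range(n_cols)]
    overall = mean(row_means)  # equals the mean of all cells because rows have equal length
    total = 0.0
    for i, row in enumerate(bicluster):
        for j, value in enumerate(row):
            residue = value - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)


def standardize_rows(bicluster):
    """Hypothetical helper: rescale each row to zero mean and unit standard deviation.

    Assumes no row is constant (a constant row would cause a division by zero).
    """
    result = []
    for row in bicluster:
        mu, sigma = mean(row), pstdev(row)
        result.append([(v - mu) / sigma for v in row])
    return result


# Toy data (invented for illustration): one base expression profile over four conditions.
base = [1.0, 3.0, 2.0, 5.0]
shifted = [[v + b for v in base] for b in (0.0, 2.0, 7.0)]  # rows differ by an offset
scaled = [[a * v for v in base] for a in (1.0, 2.0, 4.0)]   # rows differ by a factor

msr_shifted = mean_squared_residue(shifted)
msr_scaled = mean_squared_residue(scaled)

# After row-wise standardization both biclusters collapse onto the same profile.
std_shifted = standardize_rows(shifted)
std_scaled = standardize_rows(scaled)

print("MSR, shifting pattern:", msr_shifted)
print("MSR, scaling pattern: ", msr_scaled)
```

In this sketch the shifting bicluster has an MSR of zero, while the scaling bicluster has a strictly positive MSR even though all its rows follow the same tendency. After standardization every row of both biclusters becomes the same profile, which is the intuition behind comparing tendencies independently of the specific expression values.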