Reconocimiento de redes de genes mediante regresión (Gene network recognition by means of regression)

Nepomuceno Chamorro, Isabel de los Ángeles

Supervised by:

Jesús Salvador Aguilar-Ruiz (thesis supervisor)

University of defense: Universidad Pablo de Olavide

Date of defense: 27 May 2011

Examination committee:

José Cristobal Riquelme Santos (chair)
Cristina Rubio Escudero (secretary)
Joaquín Dopazo Blázquez (member)
Sara Alexandra Cordeiro Madeira (member)
Francisco Javier Azuaje González (member)

Departments:

Deporte e Informática

Type: Doctoral thesis

Teseo: 314876 DIALNET

Abstract

Novel strategies are required to handle the huge amount of data produced by microarray technologies. To infer gene regulatory networks, the first step is to find direct regulatory relationships between genes by building so-called gene co-expression networks, which amounts to filling gaps in the metabolic networks. These networks are typically built using correlation statistics as pairwise similarity measures. Correlation methods are useful for determining whether two genes have a strong global similarity, but they do not detect local similarities. We propose model trees as a method to identify gene interaction networks. Model trees estimate the numerical values of the target genes by means of linear regression functions. They are often more precise than linear regression models because they can fit different linear regressions to separate areas of the search space, which favors the inference of localized similarities over a more global similarity. While correlation-based methods analyze each pair of genes, our approach generates a single regression tree for each gene in a set of genes, taking the remaining genes as input. Finally, a graph is built from all the relationships between output and input genes, keeping only the gene pairs that are statistically significant; to this end, we apply a statistical procedure that controls the false discovery rate. The performance of our approach, named RegNet, is experimentally tested on the well-known Saccharomyces cerevisiae data set. Furthermore, we analyze a data set from normal and Alzheimer's disease-affected subjects. In this case, our results provide new evidence for the hypothesis that cardiovascular disease, type 2 diabetes and Alzheimer's disease are linked. The application of information encoded in molecular networks for prognostic purposes is a crucial objective of systems biomedicine.
Standard approaches to biomarker discovery are based on the identification of differentially expressed genes. However, it is known that biomarkers may be encoded by genes that are not highly differentially expressed between control and disease patients. Network-based prognostic approaches have not been widely investigated in cardiovascular research, where the prediction of clinical outcome would represent a significant contribution to translational research. We developed a new supervised prediction method for this prognostic problem based on the discovery of clinically relevant transcriptional association networks (obtained by RegNet). The new method, named SATuRNo, integrates clinical class-specific networks and can be applied to other clinical domains. Before analyzing our cardiovascular disease dataset, we tested the usefulness of our approach on a benchmark dataset with control and disease patients. We also compared it with several algorithms for inferring transcriptional association networks and with several classification models. Comparative results provided evidence of the predictive power of our approach. Next, we discovered new models for predicting good and bad outcomes after myocardial infarction. Using blood-derived gene expression data, our models reported areas under the receiver operating characteristic curve above 0.70. Our model was also able to outperform different techniques based on co-expressed gene modules. We also predicted processes that may represent novel therapeutic targets for heart disease, such as the synthesis of leucine and isoleucine. To conclude, gene association networks provide new insights into the underlying molecular mechanisms leading to bad prognosis after myocardial infarction and set the basis for potential novel prognostic models in heart failure. Finally, the integration of huge databases with search techniques in order to automatically validate results from different sources is a key factor in bioinformatics.
Several tools have been developed for analysing gene enrichment in annotation terms. Most of them are Gene Ontology-based tools, i.e., they analyse gene enrichment in GO annotations. In this work, we introduce a software tool named CarGene (Characterisation of Genes) that helps scientists validate sets of genes by using the biological knowledge on metabolic pathways stored in the Kyoto Encyclopedia of Genes and Genomes (KEGG), and that provides a friendly graphical environment for analysing and comparing results generated by different clustering and/or biclustering techniques. CarGene is based on the degree of coherence of genes in (bi)clusters with respect to the metabolic pathways of organisms stored in KEGG, and it provides an estimate of the probability of obtaining results by chance, including two statistical corrections (Bonferroni, and Westfall and Young). One of the most important features of CarGene is the possibility of simultaneously comparing and statistically analysing the information about many groups of genes in both visual and textual form. Furthermore, it includes its own web browser to explore in detail the information extracted from KEGG.
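
The abstract mentions two forms of multiple-testing control: a false discovery rate procedure for filtering RegNet's gene pairs, and a Bonferroni correction in CarGene. The sketch below illustrates both in plain Python. It is only an illustration under stated assumptions: the abstract does not name the FDR procedure, so the Benjamini-Hochberg step-up rule is assumed here, and the function names, gene labels and p-values are hypothetical rather than taken from RegNet or CarGene. The Westfall and Young correction is not covered.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True if rejected at FDR level alpha.

    Assumption: the thesis only says "a statistical procedure to control
    the false discovery rate"; Benjamini-Hochberg is used as a stand-in.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


def bonferroni(p_values, alpha=0.05):
    """Return one boolean per p-value: True if p * m <= alpha."""
    m = len(p_values)
    return [p * m <= alpha for p in p_values]


def significant_pairs(pair_p_values, alpha=0.05, method=benjamini_hochberg):
    """Keep the (output_gene, input_gene) pairs that survive the correction.

    pair_p_values: dict mapping a gene pair to its (hypothetical) p-value.
    """
    pairs = list(pair_p_values)
    keep = method([pair_p_values[p] for p in pairs], alpha)
    return [p for p, k in zip(pairs, keep) if k]


# Made-up p-values for four candidate gene pairs (illustration only).
example = {
    ("G1", "G2"): 0.004,
    ("G1", "G3"): 0.02,
    ("G2", "G4"): 0.03,
    ("G3", "G4"): 0.6,
}
fdr_pairs = significant_pairs(example)
bonf_pairs = significant_pairs(example, method=bonferroni)
print("FDR-controlled pairs:", fdr_pairs)
print("Bonferroni pairs:", bonf_pairs)
```

With these made-up values, the FDR rule keeps three pairs while Bonferroni keeps only one, which reflects that Bonferroni is the more conservative of the two corrections.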