Statistical methods for the analysis of copy number alterations in the genome
- Rueda Palacio, Oscar Manuel
- Cristina Rueda Sabater Supervisor
- Ramón Díaz Uriarte Supervisor
Defending university: Universidad de Valladolid
Defense date: 19 December 2008
- Bonifacio Salvador González President
- Eustasio del Barrio Tellado Secretary
- Virgilio Gomez Ruiz Committee member
- Juan Francisco Poyatos Committee member
- Ana María Rojas Mendoza Committee member
Type: Thesis
Abstract
Genomic DNA copy number alterations (CNAs) are associated with complex diseases, including cancer: CNAs are related to tumoral grade, metastasis, and patient survival. CNAs discovered from array-based Comparative Genomic Hybridization (aCGH) data have been instrumental in identifying disease-related genes and potential therapeutic targets. To be immediately useful in both clinical and basic research scenarios, aCGH data analysis requires accurate methods that do not impose unrealistic biological assumptions and that provide direct answers to the key question "What is the probability that this gene/region has CNAs?". Recent studies have shown that copy number changes are also common in the population, leading to the term "copy number variation". Thus a second problem is to distinguish between individual copy number variation and copy number changes related to disease. We have developed a statistical model and algorithms based on biological principles to approach these problems. The model is a non-homogeneous Hidden Markov Model with an unknown number of hidden states, fitted via Reversible Jump Markov Chain Monte Carlo. With this formulation we can explicitly incorporate the distance between genes/probes and employ Bayesian Model Averaging, thus accounting for model uncertainty rather than conditioning our inferences on the selection of a particular model. The model can be extended to include random effects to account for heterogeneity among different individuals. We also present two algorithms to find common regions of alteration. The first detects regions common to a set of samples with an overall probability of copy number alteration at least as high as a given threshold; the second identifies subsets of individuals that share regions with a probability of alteration at least as high as a given threshold.
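Two ideas in the paragraph above, distance-dependent transitions and Bayesian Model Averaging, can be illustrated with a minimal Python sketch. It is not the thesis's parametrization: the exponential decay form, the `length_scale` parameter, and both function names are assumptions chosen only for illustration. The first function makes the probability of staying in the same hidden state shrink as the distance between adjacent probes grows; the second averages per-probe alteration probabilities over models with different numbers of states, weighted by each model's posterior probability.

```python
import numpy as np


def transition_matrix(distance, n_states, length_scale):
    """Transition matrix between two adjacent probes (illustrative form).

    Assumption: the probability of staying in the same state decays
    exponentially with the distance between probes, from 1 at distance 0
    towards 1 / n_states at very large distances.
    """
    if n_states < 2:
        raise ValueError("n_states must be at least 2")
    decay = np.exp(-distance / length_scale)
    stay = 1.0 / n_states + (1.0 - 1.0 / n_states) * decay
    move = (1.0 - stay) / (n_states - 1)
    matrix = np.full((n_states, n_states), move)
    np.fill_diagonal(matrix, stay)
    return matrix


def model_averaged_probability(model_posteriors, alteration_probs):
    """Average per-probe alteration probabilities over candidate models.

    model_posteriors: posterior probability of each model (one per model).
    alteration_probs: array of shape (n_models, n_probes), the probability
        of alteration at each probe conditional on each model.
    """
    weights = np.asarray(model_posteriors, dtype=float)
    weights = weights / weights.sum()  # normalize in case of rounding
    probs = np.asarray(alteration_probs, dtype=float)
    return weights @ probs
```

With this form, probes that are close together tend to share a state, while distant probes are nearly independent. In a Reversible Jump sampler the model weights would typically be estimated from how often each number of states is visited; here they are simply passed in.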
We show, using simulated and real data sets, that our method outperforms alternative ones, and compare the results of our algorithms to others found in the literature on well-known data sets with very satisfactory results.
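The first common-region algorithm mentioned in the abstract can be sketched in simplified form. The following Python function is an assumption-laden illustration, not the thesis's algorithm: it takes the "overall probability" of a probe to be the mean of the per-sample alteration probabilities, and reports maximal runs of consecutive probes whose mean reaches the threshold. The function name and input layout are hypothetical.

```python
import numpy as np


def common_regions(prob_matrix, threshold):
    """Find runs of consecutive probes altered across a set of samples.

    prob_matrix: array of shape (n_samples, n_probes) with per-sample
        probabilities of alteration at each probe.
    threshold: minimum mean probability across samples (an assumed
        summary; other definitions of "overall probability" are possible).
    Returns a list of (first_probe, last_probe) index pairs, inclusive.
    """
    probs = np.asarray(prob_matrix, dtype=float)
    above = probs.mean(axis=0) >= threshold
    regions = []
    start = None
    for i, flag in enumerate(above):
        if flag and start is None:
            start = i  # a new region opens here
        elif not flag and start is not None:
            regions.append((start, i - 1))  # region closed at previous probe
            start = None
    if start is not None:
        regions.append((start, len(above) - 1))  # region runs to the last probe
    return regions
```

For example, with two samples and four probes, `common_regions([[0.9, 0.95, 0.2, 0.8], [0.85, 0.9, 0.1, 0.9]], 0.8)` returns two regions: probes 0 to 1 and probe 3 alone.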