Alignment uncertainty, regressive alignment and large scale deployment

  1. Floden, Evan
Supervised by:
  1. Cedric Notredame (thesis supervisor)

University of defense: Universitat Pompeu Fabra

Date of defense: 30 November 2018

Examination committee:
  1. María del Mar Alba Soler (chair)
  2. Ana María Rojas Mendoza (secretary)
  3. Desmond Higgins (member)

Type: Doctoral thesis

Teseo: 574855

Abstract

A multiple sequence alignment (MSA) describes the relationship between biological sequences, with each column representing shared ancestry through an implied set of evolutionary events. Most research in the field has focused on improving alignment accuracy within the progressive alignment framework, enabling powerful inferences including phylogenetic reconstruction, homology modelling and disease prediction. However, modern genomics datasets, which often comprise tens of thousands of sequences, pose new challenges for the construction of accurate MSAs. These challenges can be grouped into three basic problems. First, as the number of sequences increases, progressive alignment methods show a dramatic decrease in alignment accuracy. Second, many possible MSA solutions exist for any given dataset, and this alignment uncertainty is exacerbated as the number of sequences grows. Third, technical difficulties hamper the deployment of such genomic analysis workflows, especially in a reproducible manner, and often present a high barrier even for skilled practitioners. This work addresses these three problems through four contributions: a web server for fast homology-extension-based MSA; two new methods for improved phylogenetic bootstrap supports that incorporate alignment uncertainty; a novel alignment procedure, termed regressive MSA, that improves large-scale alignments; and Nextflow, a workflow framework that enables the deployment of large-scale reproducible analyses across clusters and clouds. Together, these contributions provide both conceptual and technical advances that deliver substantial improvements to existing MSA methods and the resulting inferences.