Pattern sequence analysis to forecast time series

Martínez Álvarez, Francisco

Supervised by:

Alicia Troncoso Lara (Supervisor)

Defended at: Universidad Pablo de Olavide

Defense date: 12 March 2010

Examination committee:

José Cristobal Riquelme Santos (Chair)
Juan Antonio Ortega Ramírez (Secretary)
João Manuel Portela da Gama (Member)
Óscar Cordón García (Member)
Héctor Pomares Cintas (Member)

Department:

Deporte e Informática (Sports and Computer Science)

Type: Thesis

Teseo: 303777 DIALNET

Abstract

The main goal of this work is to develop a general-purpose algorithm for forecasting time series. During its development, several complementary tasks were completed to make the algorithm more robust. First, clustering techniques applied to time series are shown to be useful for discovering patterns; time series related to electricity, seismicity and pollution have been studied by means of such techniques. Since each technique tends to discover clusters with predefined shapes, K-means, Expectation-Maximization and Fuzzy C-means are compared in order to evaluate their adequacy for each time series. A new methodology to systematically select the number of clusters is then proposed; it is based on majority voting and combines the three aforementioned techniques. To take advantage of the information provided by the clustering, a new time series forecasting algorithm has been developed. Once the time series under analysis has been labeled by applying such techniques, the sequence of cluster labels is searched for similarities between the days preceding the day to be predicted and the historical data. The algorithm then averages the samples that follow the patterns found in the historical data to generate a prediction. Finally, this thesis addresses the occurrence of data with particularly unexpected values. To deal with these data, or outliers, a novel hybrid methodology is proposed: the occurrence of outliers is predicted by inserting an existing approach, based on the discovery of frequent episodes in sequences, into the general prediction scheme.
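
The label-matching and averaging step described in the abstract can be illustrated with a minimal Python sketch. It is an illustration only, not the thesis's implementation: the function name `forecast_next_day`, the `window` parameter, and the choice to return `None` when no match is found are all assumptions introduced here, and the cluster labels are taken as given (in the thesis they would come from K-means, Expectation-Maximization or Fuzzy C-means).

```python
from typing import List, Optional, Sequence


def forecast_next_day(
    days: Sequence[Sequence[float]],
    labels: Sequence[int],
    window: int,
) -> Optional[List[float]]:
    """Predict the next day's samples from a labeled history.

    days   -- one vector of samples per historical day
    labels -- cluster label assigned to each day (assumed precomputed)
    window -- hypothetical parameter: how many preceding labels to match
    """
    if len(days) != len(labels):
        raise ValueError("days and labels must have the same length")
    if not 0 < window < len(labels):
        raise ValueError("window must be between 1 and len(labels) - 1")

    # Label sequence of the days just before the day to be predicted.
    pattern = list(labels[-window:])

    # Collect the day that followed each earlier occurrence of the pattern.
    followers = []
    for start in range(len(labels) - window):
        if list(labels[start:start + window]) == pattern:
            followers.append(days[start + window])

    if not followers:
        # Assumption: the abstract does not say what happens without a match.
        return None

    # Average the following days sample by sample.
    count = len(followers)
    return [sum(values) / count for values in zip(*followers)]


if __name__ == "__main__":
    history = [[1.0, 1.0], [2.0, 2.0], [2.0, 4.0],
               [9.0, 9.0], [4.0, 8.0], [7.0, 7.0]]
    day_labels = [1, 0, 1, 0, 1, 0]
    print(forecast_next_day(history, day_labels, window=2))  # [3.0, 6.0]
```

In the example, the last two labels `[1, 0]` also occur at days 0-1 and 2-3, so the prediction is the average of the days that followed them (days 2 and 4).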