Pattern sequence analysis to forecast time series

  1. Martínez Álvarez, Francisco
Supervised by:
  1. Alicia Troncoso Lara Director

Defence university: Universidad Pablo de Olavide

Defence date: 12 March 2010

Committee:
  1. José Cristobal Riquelme Santos Chair
  2. Juan Antonio Ortega Ramírez Secretary
  3. Joao Manuel Portela Gama Committee member
  4. Óscar Cordón García Committee member
  5. Héctor Pomares Cintas Committee member
Department:
  1. Deporte e Informática

Type: Thesis

Teseo: 303777 DIALNET

Abstract

The main goal of this work is to develop a general-purpose algorithm for forecasting time series data. In the course of this work, several complementary tasks have also been completed in order to make the algorithm more robust. First, the application of clustering techniques to time series has been shown to be useful for discovering patterns. Several time series related to electricity, seismicity and pollution have been studied by means of such techniques. Since each technique tends to discover clusters of predetermined shapes, the use of K-means, Expectation-Maximization and Fuzzy C-means has been discussed in order to evaluate the suitability of each one for each time series. A new methodology for systematically selecting the number of clusters has also been proposed. This strategy is based on majority voting and combines all three of the aforementioned techniques. In order to take advantage of the information provided by clustering techniques, a new time series forecasting algorithm has been developed. Once the time series under analysis has been labeled by applying such techniques, the resulting cluster labels are processed as a sequence in order to find similarities between the days preceding the day to be predicted and the historical data. Finally, the algorithm averages the samples that follow the patterns found in the historical data and generates a prediction. Lastly, this thesis also addresses the occurrence of data with especially unexpected values. To deal with such data, or outliers, a novel hybrid methodology has been proposed. The occurrence of outliers is predicted by integrating an existing approach, based on the discovery of frequent episodes in sequences, into the general prediction scheme.
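
The label-matching and averaging step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function name `forecast_next_day`, the `window` parameter, the shrinking of the window when no match is found, and the fallback to the last observed day are all assumptions made for illustration. The cluster labels are taken as given, for example from K-means applied to the daily profiles.

```python
from typing import List, Sequence


def forecast_next_day(days: Sequence[Sequence[float]],
                      labels: Sequence[int],
                      window: int) -> List[float]:
    """Predict the next day from days that followed the same label pattern.

    days   -- historical daily profiles, one sequence of samples per day
    labels -- cluster label assigned to each day (same length as days)
    window -- number of most recent labels that form the pattern (assumed parameter)
    """
    if len(days) != len(labels):
        raise ValueError("days and labels must have the same length")
    n = len(labels)
    if not 1 <= window < n:
        raise ValueError("window must be between 1 and len(labels) - 1")

    # Assumption: if the full pattern never occurs, retry with a shorter one.
    for w in range(window, 0, -1):
        pattern = list(labels[n - w:])
        # Collect the day that follows every earlier occurrence of the pattern.
        followers = [days[i + w] for i in range(n - w)
                     if list(labels[i:i + w]) == pattern]
        if followers:
            horizon = len(followers[0])
            # Average the following days sample by sample.
            return [sum(day[h] for day in followers) / len(followers)
                    for h in range(horizon)]

    # Assumption: with no match at all, fall back to repeating the last day.
    return list(days[-1])


if __name__ == "__main__":
    history = [[1.0, 2.0], [10.0, 20.0], [3.0, 4.0],
               [1.0, 2.0], [10.0, 20.0], [5.0, 6.0],
               [1.0, 2.0], [10.0, 20.0]]
    day_labels = [0, 1, 2, 0, 1, 2, 0, 1]
    print(forecast_next_day(history, day_labels, window=2))
```

In this toy history the pattern of the last two labels, (0, 1), occurs twice earlier, so the forecast is the sample-wise mean of the two days that followed those occurrences.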