Novel efficient deep learning architectures for time series forecasting
- Manuel Jesús Jiménez Navarro
- María del Mar Martínez Ballesteros (Director)
- Gualberto Asencio Cortés (Director)
Defence university: Universidad de Sevilla
Year of defence: 2023
Type: Thesis
Abstract
This thesis studies time series forecasting with the technique known as deep learning, or neural networks. At the same time, it makes a series of new methodological proposals that improve the efficiency of existing architectures, applied to real-world data sets that remain challenging today.

Deep learning has gained great popularity in recent years thanks to its remarkable results in areas such as computer vision, natural language processing and time series forecasting, among others. The technique is inspired by the functioning of the brain's basic cell, the neuron. Neurons are organised in layers forming a neural network, each layer processing its input and propagating its output to subsequent layers until the final output is obtained. Deep learning has been adapted to time series forecasting on many occasions, producing architectures whose results are competitive with the current state of the art. However, although their effectiveness has been a great advantage, these architectures have sometimes degraded in efficiency, preventing their application in real scenarios.

Efficiency can be improved by reducing any of the resources that an architecture consumes in large amounts: the memory needed to store it, its inference time or its training time, among others. This thesis focuses on improving training time, since it is the bottleneck when experimenting with new architectures, optimising existing ones or retraining in certain real scenarios. To address this efficiency problem, four proposals are made, whose main objective is to obtain greater efficiency while achieving equal or superior effectiveness with respect to the architectures used in the comparative analysis.

The first proposal introduces the idea of incremental learning into the architecture design: each layer of the neural network is assigned its own objective, starting with a fairly simple one and increasing the difficulty layer by layer. The learning process is thus accelerated, since the concepts needed for the simplest objective are learned quickly and this knowledge is propagated to the subsequent layers.

The second proposal builds on the first and makes an additional assumption. Instead of the different objectives being optimised without the more complex ones being able to influence the simpler ones, this influence is allowed, so the knowledge gained from the simpler objectives can be partially refined by the subsequent, more complex objectives.

The third proposal arises from the first two. The fundamental idea is again to divide the responsibility of the prediction process, here by decomposing the time series with a smoothing process. The first layer receives the smoothed input and is responsible for a partial prediction. The next layer receives the residual obtained by subtracting the smoothed series from the original, repeats the smoothing process and obtains a new partial prediction of its own. After all layers have been processed, the partial predictions are summed to obtain the final output. The intuition is that each layer plays a different role, focusing on a different aspect of the time series through the decomposition, while the layers collaborate to produce the final prediction.
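A minimal sketch of how the first two proposals could be realised, assuming a PyTorch-style implementation in which "increasing difficulty" means increasingly long forecast horizons (the abstract does not fix the per-layer objectives). A stop_gradient flag switches between the first proposal (later, harder objectives cannot alter the earlier layers) and the second (that influence is allowed). All names and hyperparameters here are illustrative, not the thesis's code.

    import torch
    import torch.nn as nn

    class IncrementalNet(nn.Module):
        """Blocks trained against objectives of increasing difficulty
        (here: increasingly long forecast horizons, an assumed choice)."""

        def __init__(self, n_features, hidden, horizons, stop_gradient=True):
            super().__init__()
            self.stop_gradient = stop_gradient  # True ~ proposal 1, False ~ proposal 2
            self.blocks, self.heads = nn.ModuleList(), nn.ModuleList()
            in_dim = n_features
            for h in horizons:  # e.g. [1, 6, 24]: simplest objective first
                self.blocks.append(nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU()))
                self.heads.append(nn.Linear(hidden, h))  # auxiliary forecast head
                in_dim = hidden

        def forward(self, x):  # x: (batch, n_features)
            outs = []
            for block, head in zip(self.blocks, self.heads):
                x = block(x)
                outs.append(head(x))
                if self.stop_gradient:
                    # proposal 1: gradients from later, harder objectives never
                    # reach the layers optimised for the simpler ones
                    x = x.detach()
            return outs  # the last entry is the final forecast

The training loss would sum one error term per head, each against a target of the matching horizon, so that every layer is optimised for its own objective.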
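The third proposal can be sketched in the same style. A moving average stands in for the smoothing step (one possible choice; the abstract does not name the method): each layer forecasts from the smoothed view of its input and hands the residual to the next layer, and the partial forecasts are summed.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def moving_average(x, k):
        """Smooth a batch of series (batch, length) with a centred moving average."""
        pad = (k - 1) // 2
        xp = F.pad(x.unsqueeze(1), (pad, k - 1 - pad), mode="replicate")
        return F.avg_pool1d(xp, kernel_size=k, stride=1).squeeze(1)

    class DecompositionNet(nn.Module):
        """Each layer predicts from the smoothed part of its input and passes
        the residual (input minus smoothed input) on; the partial predictions
        are summed into the final output."""

        def __init__(self, input_len, horizon, n_layers=3, kernel=5):
            super().__init__()
            self.kernel = kernel
            self.heads = nn.ModuleList(
                nn.Linear(input_len, horizon) for _ in range(n_layers))

        def forward(self, x):  # x: (batch, input_len)
            total = 0.0
            for head in self.heads:
                smooth = moving_average(x, self.kernel)
                total = total + head(smooth)  # partial prediction
                x = x - smooth  # residual goes to the next layer
            return total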
The fourth and final proposal integrates a feature selection method into the neural network architecture, with the aim of reducing the dimensionality of the problem and improving the efficiency of feature selection techniques applied to deep learning. Other such proposals suffer from problems of effectiveness, efficiency and/or interpretability. This proposal describes a new layer connected to the input that acts as a gate on the input features, removing the influence of those that are irrelevant to the problem. Thanks to this layer, the relevant features can be determined efficiently, without reducing the efficiency of the architecture to any considerable extent. In addition, the layer serves as a window onto the features the architecture has deemed irrelevant, giving insight into the learned behaviour.
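A sketch of the kind of input gate the fourth proposal describes, under the assumption that it is a trainable element-wise gate whose values are pushed towards zero by a sparsity penalty; the thesis's exact gating function and training objective are not given in the abstract.

    import torch
    import torch.nn as nn

    class FeatureGate(nn.Module):
        """Trainable gate over the input features: gates driven towards zero
        mark features the network treats as irrelevant, which also makes the
        learned behaviour inspectable."""

        def __init__(self, n_features):
            super().__init__()
            self.logits = nn.Parameter(torch.zeros(n_features))

        def forward(self, x):  # x: (batch, n_features)
            return x * torch.sigmoid(self.logits)

        def sparsity_penalty(self):
            # added to the training loss so irrelevant features are pruned
            return torch.sigmoid(self.logits).sum()

    gate = FeatureGate(8)
    model = nn.Sequential(gate, nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
    # train with: loss = task_loss + lam * gate.sparsity_penalty()
    # afterwards, torch.sigmoid(gate.logits) shows which features were kept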