Time series analysis is a collection of mathematical models, methods, and techniques used to analyze time series data. More specifically, time series analysis aims to understand the characteristics of time series data, including trends and seasonality, and to build models for classification, forecasting, or anomaly detection.
As the volume of time series data grows with the explosion of cloud computing (e.g., server metrics, network data, etc.), the Internet of Things (IoT), and user-generated data, it is critical to analyze it efficiently and accurately. The temporal nature of time series data presents an additional layer of challenge in analysis.
First, one must account for temporal components like trends, seasonality, and cyclicity before applying various techniques. Second, given the large volume and rapid flow of data, there is an increasing need to analyze the data in real-time.
To better understand the analysis of time series data, we can first unpack the key temporal components made significant by the presence of time.
There are four key components of time series data analysis:
- Trend
- Seasonality
- Cyclicity
- Irregularity
Thorough analysis requires perspective into each component, yet they can be analyzed and considered separately. Understanding each will help build a better understanding of data analysis overall.
Trend refers to the general direction of data over a time period. More specifically, data can have an upward/increasing, downward/decreasing, or no trend. Some data sets show deterministic trends where the behavior can be modeled by a mathematical function such as linear or exponential functions. Other data sets may be stochastic in nature where the pattern may change more randomly.
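To make the idea of a deterministic linear trend concrete, here is a minimal sketch that fits a straight line to evenly spaced observations with ordinary least squares. The function names `fit_linear_trend` and `describe_trend`, the tolerance, and the sample readings are all illustrative assumptions, not part of any particular library.

```python
def fit_linear_trend(values):
    """Least-squares fit of values[t] ~ intercept + slope * t for t = 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    # Covariance of time index and value, and variance of the time index
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    intercept = y_mean - slope * t_mean
    return slope, intercept


def describe_trend(slope, tolerance=1e-9):
    """Label the direction of a fitted slope (tolerance is an arbitrary choice)."""
    if slope > tolerance:
        return "upward"
    if slope < -tolerance:
        return "downward"
    return "none"


# Hypothetical, evenly spaced readings
readings = [10.0, 12.0, 14.0, 16.0, 18.0]
slope, intercept = fit_linear_trend(readings)
direction = describe_trend(slope)
```

A straight line is only one possible model. An exponential or stochastic trend would need a different approach, and a fitted slope says nothing about whether the trend is statistically significant.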
Trends can be investigated with statistical tests such as the Augmented Dickey-Fuller (ADF) test or the Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test. Both tests assess stationarity: the ADF test takes non-stationarity (a unit root) as its null hypothesis, while the KPSS test takes stationarity as its null hypothesis. Stationarity is an important property of time series data. It means that the statistical properties of the series, such as its mean and variance, do not change over time. A series whose mean shifts over time is non-stationary, which can be a sign of a trend.
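The formal tests are best left to a statistics library (for instance, statsmodels provides `adfuller` and `kpss` in `statsmodels.tsa.stattools`). As a rough intuition for what stationarity means, the sketch below compares the mean and variance of the first and second halves of a series. This is a swapped-in heuristic, not the ADF or KPSS test, and the function name `compare_halves` and the sample data are assumptions made for illustration.

```python
from statistics import mean, pvariance


def compare_halves(values):
    """Return the mean and variance of the first and second half of a series."""
    if len(values) < 4:
        raise ValueError("need at least four observations")
    mid = len(values) // 2
    first, second = values[:mid], values[mid:]
    return {
        "first_mean": mean(first),
        "second_mean": mean(second),
        "first_variance": pvariance(first),
        "second_variance": pvariance(second),
    }


# Hypothetical data: one series drifts upward, the other oscillates around a level
trending = [1, 2, 3, 4, 5, 6, 7, 8]
flat = [5, 6, 5, 6, 5, 6, 5, 6]

trending_stats = compare_halves(trending)  # means differ between halves
flat_stats = compare_halves(flat)          # means and variances match
```

A large gap between the two halves hints at non-stationarity, but only a proper test gives a result with a stated significance level.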
Seasonality refers to periodic changes in data that occur at regular time intervals, such as daily, weekly, or yearly. A common example is the fluctuation in retail sales, which rise during the holidays towards the end of the year. Consider the shopping extravaganza around Thanksgiving, Black Friday, and Christmas, followed by the downturn in the new year when everyone puts their wallets to rest.
Seasonality can be detected by identifying seasonal indicators and plotting them over a seasonal period. For example, one can identify peak winter months when addressing fluctuations in energy usage. Other examples include vacation spending during the summer months, sports leagues with on and off seasons, and so on.
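One simple way to compute such seasonal indicators is to average all observations that fall at the same position within the seasonal period and compare each average to the overall mean. The sketch below assumes the period length is known in advance and that the series has no strong trend. The function name `seasonal_profile` and the quarterly energy figures are made up for illustration.

```python
from statistics import mean


def seasonal_profile(values, period):
    """Average deviation from the overall mean at each position in the period."""
    if period < 1 or len(values) < period:
        raise ValueError("series must cover at least one full period")
    overall = mean(values)
    # values[pos::period] picks every observation at the same seasonal position
    return [mean(values[pos::period]) - overall for pos in range(period)]


# Hypothetical quarterly energy usage over three years (Q1 = winter)
usage = [120, 80, 70, 110,
         124, 82, 74, 112,
         122, 84, 72, 114]

profile = seasonal_profile(usage, period=4)
peak_quarter = profile.index(max(profile)) + 1  # 1-based quarter number
```

Positive entries in the profile mark positions that run above the overall average, and negative entries mark those that run below it. Plotting the profile over one period gives the kind of seasonal plot described above.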
Cyclicity refers to variations in data that occur over a longer time period than the typical time frame for seasonality. Instead of months or seasons, the period is years or decades. Many of the concepts used to describe seasonality apply to cyclicity as well, except that the time period is usually longer. For example, the performance of the stock market over decades may show cyclicity as it goes through cycles of economic boom and bust. Many recent events may seem unique to their time, only to be proven part of a much older, slower set of cycles.
Irregularity refers to unexpected behaviors and deviations in the data. Irregularity may be due to pure randomness of the data or from external factors.
For example, in Internet-of-Things (IoT) systems, sensors may show irregular behavior if a sensor is malfunctioning or if external changes affect its readings. Irregularity is an important input for anomaly detection. Other irregularities may be due to scheduled changes, such as market holidays in financial markets or planned downtime for system upgrades.
Now that we know the components of time series data, we will look at methods. When exploring time series data, there are different types of analysis available depending on the goal.
These analysis types include:
- Descriptive analysis
- Exploratory analysis
- Explanatory analysis
- Intervention analysis
The primary goal of descriptive analysis is to summarize patterns in time series data. This may include determining whether the data is stationary, whether it has a trend or seasonal component, and whether there are causal relationships or anomalies. The output of a descriptive analysis is usually a report that summarizes the data and includes statistical measures such as the mean, median, standard deviation, and confidence intervals.
If you have ever read a study that summarized the findings of data, the narrative wrapper is the descriptive analysis. It can help make the ideas more accessible. You can show someone pages and pages of downward charts and formulae, or you can tell them: "the market crashed".
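The numerical side of a descriptive analysis can be sketched with Python's standard `statistics` module. The function name `summarize` and the latency readings below are illustrative assumptions. A real report would typically add confidence intervals and the trend or seasonality findings on top of these basics.

```python
from statistics import mean, median, stdev


def summarize(values):
    """Basic descriptive statistics for a series of observations."""
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


# Hypothetical response-time readings in milliseconds
latencies_ms = [102, 98, 110, 95, 105]
summary = summarize(latencies_ms)
```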
Exploratory analysis is similar to descriptive analysis, except that it puts more emphasis on identifying patterns and exploring them rather than simply summarizing them. The goal is to identify not only trends and seasonality, but also to explore and detect anomalies.
For software companies, exploratory analysis may involve looking at time series readings from their software infrastructure. This may reveal fluctuations in website traffic or server load, along with usage patterns such as spikes during business hours, seasonality from consumer events, and long-term trends.
Exploratory analysis can be both broad and narrow. Broad anomalies in server behavior may indicate the need for greater server capacity during a seasonal time window, while narrow anomalies in server behavior may indicate an intruder. The data is seen, then explored to find the root of the anomaly.
The goal of explanatory analysis is to go one step further and explain why the patterns summarized in descriptive analysis or detailed in exploratory analysis are occurring. The emphasis is on understanding the causal relationships between the different variables or temporal effects.
For example, an Internet-of-Things (IoT) company may look at the relationship between energy usage and other factors like weather or energy prices based on sensor readings. Sometimes the relationship between these factors is hard to pin down, especially if it is not obvious.
Consider a cold-weather sensor that suddenly reports a spike in warmth during winter. The temperature should be colder, but after exploring the anomaly, it turns out that a family of raccoons had nested beside it.
Finally, intervention analysis is used to determine the impact of an event on the dataset. This often involves looking at the data before and after a point of change.
For example, a company may run A/B experiments to gather user metrics. Intervention analysis will use either statistical or machine learning models to estimate the impact of the A/B campaign. If one variant out-performed the other, then the product team has a strong basis to make a change. This information acts as an intervention on their development choices.
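The before-and-after comparison can be sketched as a simple difference in means around the point of change. This is a deliberately naive stand-in for the statistical or machine learning models mentioned above: it ignores trend, seasonality, and noise, and performs no significance test. The function name `intervention_effect`, the `change_index` parameter, and the sign-up figures are assumptions for illustration.

```python
from statistics import mean


def intervention_effect(values, change_index):
    """Compare the average level before and after a known point of change."""
    before, after = values[:change_index], values[change_index:]
    if not before or not after:
        raise ValueError("change_index must split the series into two parts")
    before_mean, after_mean = mean(before), mean(after)
    change = after_mean - before_mean
    return {
        "before_mean": before_mean,
        "after_mean": after_mean,
        "absolute_change": change,
        # Relative change is undefined when the baseline is zero
        "relative_change": change / before_mean if before_mean != 0 else None,
    }


# Hypothetical daily sign-ups; a new variant ships on day index 4
signups = [40, 42, 38, 40, 50, 52, 48, 50]
effect = intervention_effect(signups, change_index=4)
```

In practice, one would want to confirm that the observed change is larger than what ordinary day-to-day variation would produce before acting on it.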
There are various mathematical techniques used to analyze time series data ranging from simple statistical methods to complex machine learning models. There are trade-offs between complexity, speed, and accuracy.
Here are some popular techniques and models used in time series analysis:
- Classification: used to assign different categories to the data by dividing the data into meaningful groups. Popular algorithms include nearest-neighbor classification, shapelet transforms, and random forests
- Curve fitting: used to construct a curve that best fits a series of time series data for regression analysis or extrapolation. For simple datasets, either a linear or polynomial regression is used. If neither of those models works, non-linear models like sigmoid, Gaussian, Lorentzian, or Voigt functions may be used
- Segmentation: used to divide the data into sequences of discrete time chunks to extract characteristics. Segmentation can either be top-down or bottom-up where the data is recursively divided or merged. Other algorithms use a sliding window
- Forecasting: used to predict future values by applying characteristics of historical data points. Statistical models include moving averages, exponential smoothing, or Autoregressive Integrated Moving Average (ARIMA). Machine learning methods include support vector machines (SVMs), random forests, or long short-term memory networks (LSTMs)
- Anomaly detection: used to identify data points that deviate significantly from the rest of the data. Algorithms range from a simple fixed threshold to more complex statistical methods or machine learning models.
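Two of the simpler techniques from the list above can be sketched in a few lines: simple exponential smoothing for forecasting and a z-score threshold for anomaly detection. The function names, the `alpha` and `threshold` values, and the sensor readings are illustrative assumptions. Production systems would more likely rely on an established library.

```python
from statistics import mean, pstdev


def exponential_smoothing(values, alpha):
    """Simple exponential smoothing; the last element serves as a one-step forecast."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [values[0]]
    for value in values[1:]:
        # Blend the new observation with the previous smoothed level
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


def zscore_anomalies(values, threshold=2.0):
    """Indices of points more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical temperature readings with one obvious spike
readings = [20, 21, 19, 20, 22, 20, 95, 21, 20, 19]
anomalies = zscore_anomalies(readings, threshold=2.0)

forecast = exponential_smoothing([10, 20, 30], alpha=0.5)[-1]
```

A higher `alpha` makes the forecast react faster to recent values at the cost of more noise. The z-score threshold is a global rule, so it can miss anomalies in series with strong trend or seasonality unless those components are removed first.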
Time series analysis is widely used across various industries where time series data plays a critical role. As long as a sufficiently large amount of data is stored and ready for processing, time series analysis methods and techniques can be used to drive business value.
Some examples of time series analysis in different industries include:
- Financial: predicting stock prices, analyzing economic growth or contraction
- Commerce: inventory analysis, predicting future sales volume, customer engagement
- Manufacturing: detecting anomalies for predictive maintenance, optimizing efficiency by ramping up or down production
- Smart city: forecasting weather patterns to optimize energy usage, real-time traffic analysis
- Healthcare: detecting anomalies in healthcare data for medical assistance, such as fall detection or recognizing problems in bodily function like irregular heartbeats