Segmentation is a time series analysis strategy in which the data is divided into a sequence of discrete chunks of time called segments. The goal of time series segmentation is to extract temporal patterns by observing the characteristics of the data within each segment, typically by analyzing changes in statistical properties such as the mean or variance.
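As a rough illustration of what "changes in statistical properties" means, the sketch below computes the mean and variance over fixed-size chunks of a series. The function name `chunk_stats`, the chunk size, and the sample values are all made up for this example; real segmentation algorithms choose the boundaries from the data rather than fixing them in advance.

```python
from statistics import mean, pvariance


def chunk_stats(series, size):
    """Return (start index, mean, variance) for consecutive chunks of `size` points."""
    stats = []
    for start in range(0, len(series), size):
        chunk = series[start:start + size]
        stats.append((start, mean(chunk), pvariance(chunk)))
    return stats


# Hypothetical series: a calm stretch followed by a higher, more volatile one.
series = [2, 3, 2, 3, 2, 3, 10, 14, 9, 15, 10, 14]
for start, m, var in chunk_stats(series, 6):
    print(f"chunk at {start}: mean={m:.2f} variance={var:.2f}")
```

A jump in the mean or variance from one chunk to the next hints that a segment boundary lies somewhere between them.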
Some of the most common approaches to segment time series data include:
- Top-down: Top-down segmentation starts with the entire dataset and then recursively breaks it down into smaller segments. For this reason, the top-down approach is also referred to as “divide and conquer” or “binary split”. The original dataset is split into two segments at the point that maximizes the difference between them. The process is then repeated on each resulting segment until a stopping criterion is met, such as an error threshold or a target number of segments
- Bottom-up: The bottom-up approach starts from the finest possible segmentation, breaking the dataset into many small segments. An error function measures the difference between the actual data and the data as represented by the segments. Adjacent segments are then iteratively merged, each time choosing the pair whose merge causes the smallest increase in the error function. This is repeated until the desired number of segments is reached or the error function reaches its threshold
- Sliding window: The sliding window algorithm starts by defining a window, with a start (left) boundary and an end (right) boundary, in the time series data. The right boundary is then extended one point at a time for as long as the error of the data inside the window stays under a user-defined threshold. Once the threshold is exceeded, the window is closed as a segment and a new window starts where the previous one ended
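The three approaches can be sketched in a few lines each. The code below is a minimal illustration under several assumptions that are not part of the descriptions above: each segment is approximated by a constant (its mean), the error function is the sum of squared errors (`sse`), segments are half-open index ranges `(start, end)`, and all function names are hypothetical. In `top_down`, "maximizing the difference between segments" is implemented as minimizing the combined error within the two halves; in `bottom_up`, the merge cost is taken to be the error of the merged segment, which is one common choice.

```python
def sse(values):
    """Sum of squared deviations from the mean (error of a constant fit)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def top_down(series, max_error, start=0, end=None):
    """Recursively split until every segment's error is at most max_error."""
    if end is None:
        end = len(series)
    if end - start < 2 or sse(series[start:end]) <= max_error:
        return [(start, end)]
    # Pick the split point that leaves the least error inside the two halves.
    split = min(range(start + 1, end),
                key=lambda k: sse(series[start:k]) + sse(series[k:end]))
    return (top_down(series, max_error, start, split)
            + top_down(series, max_error, split, end))


def bottom_up(series, max_error):
    """Start from single-point segments and merge the cheapest adjacent pair."""
    segments = [(i, i + 1) for i in range(len(series))]
    while len(segments) > 1:
        # Cost of each candidate merge = error of the merged segment.
        costs = [sse(series[a[0]:b[1]])
                 for a, b in zip(segments, segments[1:])]
        best = min(range(len(costs)), key=costs.__getitem__)
        if costs[best] > max_error:
            break
        segments[best:best + 2] = [(segments[best][0], segments[best + 1][1])]
    return segments


def sliding_window(series, max_error):
    """Grow a window to the right; close it when the error passes max_error."""
    segments = []
    start = 0
    for end in range(1, len(series) + 1):
        # The candidate window is series[start:end].
        if end - start > 1 and sse(series[start:end]) > max_error:
            segments.append((start, end - 1))  # exclude the offending point
            start = end - 1
    if start < len(series):
        segments.append((start, len(series)))
    return segments


# Hypothetical series with three flat levels.
series = [1, 1, 1, 1, 5, 5, 5, 5, 9, 9, 9, 9]
print(top_down(series, max_error=1.0))        # [(0, 4), (4, 8), (8, 12)]
print(bottom_up(series, max_error=1.0))       # [(0, 4), (4, 8), (8, 12)]
print(sliding_window(series, max_error=1.0))  # [(0, 4), (4, 8), (8, 12)]
```

On this clean toy series all three agree. On noisy data they generally do not: the sliding window only looks at the data seen so far, while the other two consider the whole series at the cost of more computation.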
Note that the different approaches can be combined. For example, a popular segmentation approach called SWAB combines the Sliding Window And Bottom-up algorithms.
Segmenting time series data allows for various applications that give insights into the underlying data:
- Trend analysis: Because segmentation algorithms use statistical methods to divide the data into internally similar segments, each segment can be characterized by its own trend, which can expose seasonality and other temporal elements
- Forecasting: Insights gained from smaller segments can be used to forecast future time series data points
- Noise reduction: Noisy data refers to data that is corrupted or distorted by sensor errors, transmission errors, or similar fluctuations. Smoothed data is data from which this variation and noise have been removed. Segmentation helps by representing noisy data as smoothed segments
- Anomaly detection: By determining the characteristics within each time segment, anomalies in the data can be identified
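Two of these applications, noise reduction and anomaly detection, are easy to illustrate once segment boundaries are known. The sketch below assumes the segments are already given as half-open `(start, end)` index ranges; the function names, the use of the segment median as the reference value, and the `max_dev` threshold are illustrative choices rather than a standard method.

```python
from statistics import mean, median


def smooth(series, segments):
    """Noise reduction: replace every point with the mean of its segment."""
    smoothed = []
    for start, end in segments:
        level = mean(series[start:end])
        smoothed.extend([level] * (end - start))
    return smoothed


def anomalies(series, segments, max_dev):
    """Anomaly detection: indices of points far from their segment's median."""
    flagged = []
    for start, end in segments:
        chunk = series[start:end]
        center = median(chunk)
        flagged.extend(start + i for i, v in enumerate(chunk)
                       if abs(v - center) > max_dev)
    return flagged


# Hypothetical readings: two levels, with one spike at index 3.
series = [10, 11, 10, 30, 10, 11, 50, 51, 50, 49, 50, 51]
segments = [(0, 6), (6, 12)]

print(smooth(series, segments))
print(anomalies(series, segments, max_dev=5))  # [3]
```

The median is used for the anomaly check because a single spike pulls the segment mean toward itself, which can hide the very point being looked for.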