What is ARIMA?
Auto-Regressive Integrated Moving Average (ARIMA) is a popular statistical method used in time series analysis. ARIMA models build on Auto-Regressive Moving Average (ARMA) models, which combine auto-regressive and moving-average terms but assume the data is stationary. ARIMA adds a differencing step so that non-stationary data can be handled as well.
Components of ARIMA
ARIMA models are composed of three parameters:
- p: The auto-regressive component of the model is characterized by parameter p. It determines the number of lagged data points to use in the model. p=0 means no past values are used; with no other terms in the model, the time series is treated as essentially noise, which is useful as a null hypothesis. p>0 means that previous values in time are accounted for, which lets the model show some reversion to the mean.
- q: The moving average component is modeled by parameter q. It specifies the number of lagged error terms to use in the prediction model.
- d: Parameter d describes the order of differencing, that is, the number of differencing steps applied to make the data stationary, for example by removing trends.
While ARIMA is a powerful model used heavily in time series analysis, one major drawback is that its auto-regressive and moving-average parts assume stationary data. Differencing can be used to transform non-stationary data, but it can lead to some loss of information.
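To make the differencing step concrete, here is a minimal pure-Python sketch. The helper names `difference` and `undifference` are hypothetical, introduced only for this example. It shows one kind of information that differencing discards: each pass drops an observation, and the original level can only be rebuilt if the starting value is kept.

```python
def difference(series, d=1):
    """Apply first-order differencing d times."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def undifference(diffs, first_value):
    """Invert one round of differencing, given the original first value."""
    out = [first_value]
    for step in diffs:
        out.append(out[-1] + step)
    return out


trend = [3, 5, 7, 9, 11]        # linear trend, so not stationary
once = difference(trend, d=1)   # [2, 2, 2, 2]
twice = difference(trend, d=2)  # [0, 0, 0]
restored = undifference(once, first_value=trend[0])

print(once, twice, restored)
```

After one difference the trend is gone, but so is the level of the series: `once` alone cannot tell a series starting at 3 from one starting at 300.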
SARIMA, ARIMAX, SARIMAX
Like many acronyms, this one comes in a few different flavours.
- SARIMA (Seasonal AutoRegressive Integrated Moving Average): This is an extension of ARIMA that explicitly supports univariate (one random variable) time series data with a seasonal component. In addition to the standard parameters of the ARIMA model, it adds three new parameters to specify the auto-regression (AR), differencing (I), and moving average (MA) for the seasonal component of the series, along with the length of the seasonal period.
- ARIMAX (AutoRegressive Integrated Moving Average with eXogenous variables): An extension that adds externally derived variables, that is, variables other than the time series itself that might have an impact on the response variable. For example, if you are trying to forecast sales, you might include factors like marketing spend or holidays in your model.
- SARIMAX (Seasonal AutoRegressive Integrated Moving Average with eXogenous variables): This model combines the seasonality aspect of SARIMA with ARIMAX's ability to handle exogenous variables. It is useful when you are dealing with seasonal data and also have additional variables that might affect your response variable, for example sales forecasting based on price predictions.