The ARIMA (AutoRegressive Integrated Moving Average) model is a widely used statistical method for time series forecasting. It combines three components:
- AutoRegressive (AR): Uses the relationship between an observation and a number of lagged observations (previous data points).
- Integrated (I): Differences the data (replaces each observation with its change from the previous one) to make it stationary. Ordinary differencing removes trends; removing seasonality requires seasonal differencing.
- Moving Average (MA): Models an observation as a linear combination of the residual (forecast) errors from previous time steps.
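To make the Integrated step concrete, here is a minimal sketch using only NumPy. The series is synthetic (a made-up linear trend plus noise), so the numbers are illustrative rather than taken from any real dataset.

```python
import numpy as np

# Synthetic series (assumption for illustration): linear trend + Gaussian noise
rng = np.random.default_rng(0)
t = np.arange(200)
series = 0.5 * t + rng.normal(scale=1.0, size=t.size)

# First-order differencing: y'_t = y_t - y_{t-1}
diff1 = np.diff(series, n=1)

# The raw series drifts upward, so its two halves have very different means
first_half_mean = series[:100].mean()
second_half_mean = series[100:].mean()

# After differencing, both halves hover around the slope of the trend
diff_first_half_mean = diff1[:100].mean()
diff_second_half_mean = diff1[100:].mean()

print(f"Raw means:         {first_half_mean:.2f} vs {second_half_mean:.2f}")
print(f"Differenced means: {diff_first_half_mean:.2f} vs {diff_second_half_mean:.2f}")
```

Differencing once shortens the series by one observation. A series with a curved (for example quadratic) trend may need a second pass, which in NumPy would be np.diff(series, n=2).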
The ARIMA model is defined by three parameters: p, d, and q:
- p: Number of lag observations included in the model (AR term).
- d: Degree of differencing (how many times the data is differenced to make it stationary).
- q: Order of the moving average, i.e. the number of lagged forecast errors included in the model (MA term).
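One common way to write the model these three parameters describe is shown below. The notation is an assumption, since the text above does not fix one: B is the backshift operator (B y_t = y_{t-1}), the phi_i are AR coefficients, the theta_j are MA coefficients, c is a constant, and epsilon_t is white noise.

```latex
\left(1 - \sum_{i=1}^{p} \phi_i B^i\right) (1 - B)^d \, y_t
  = c + \left(1 + \sum_{j=1}^{q} \theta_j B^j\right) \varepsilon_t
```

Here p sets how many AR terms appear on the left, d is the exponent on the differencing factor (1 - B), and q sets how many lagged error terms appear on the right. Some texts and libraries use the opposite sign convention for the theta_j.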
Variants
- SARIMA (Seasonal ARIMA): Extends ARIMA by including seasonal components.
- ARIMAX: ARIMA with exogenous variables for forecasting.
Use Cases
ARIMA is widely used in economics, finance, weather forecasting, and any domain requiring time series analysis.
Python Code:
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error
import numpy as np
# Load dataset (replace 'your_time_series_data.csv' with your file)
# Assume the CSV file has a 'date' column and a 'value' column
data = pd.read_csv('your_time_series_data.csv', parse_dates=['date'], index_col='date')
# Visualize the data
plt.figure(figsize=(10, 6))
plt.plot(data, label="Original Data")
plt.title("Time Series Data")
plt.xlabel("Date")
plt.ylabel("Value")
plt.legend()
plt.show()
# Split into train and test sets
train_size = int(len(data) * 0.8)
train, test = data[0:train_size], data[train_size:]
# Fit the ARIMA model
p, d, q = 2, 1, 2  # ARIMA order: AR lags, degree of differencing, MA lags
model = ARIMA(train, order=(p, d, q))
model_fit = model.fit()
# Summary of the model
print(model_fit.summary())
# Forecast
forecast_steps = len(test)
forecast = model_fit.forecast(steps=forecast_steps)
# Evaluate the model
mse = mean_squared_error(test, forecast)
print(f"Mean Squared Error: {mse:.4f}")
rmse = np.sqrt(mse)
print(f"Root Mean Squared Error: {rmse:.4f}")
# Plot the results
plt.figure(figsize=(10, 6))
plt.plot(train, label="Train Data")
plt.plot(test, label="Test Data")
plt.plot(test.index, forecast, label="Forecast", color='orange')
plt.title("ARIMA Forecast")
plt.xlabel("Date")
plt.ylabel("Value")
plt.legend()
plt.show()
How to Use:
- Replace 'your_time_series_data.csv' with your dataset file.
- Ensure your dataset contains a date column and a value column.
- Adjust the ARIMA parameters p, d, and q based on your data characteristics. Use tools like autocorrelation plots (statsmodels.graphics.tsaplots.plot_acf) to fine-tune these values.