ARIMA (AutoRegressive Integrated Moving Average) Model

The ARIMA (AutoRegressive Integrated Moving Average) model is a widely used statistical method for time series forecasting. It combines three components:

  1. AutoRegressive (AR): Models the current observation as a linear function of a number of lagged observations (previous data points).
  2. Integrated (I): Differences the data (replaces each observation with its change from the previous one) to make it stationary, which removes trends. Seasonal patterns need seasonal differencing, which SARIMA adds.
  3. Moving Average (MA): Models the current observation as a linear function of past forecast errors (residuals) rather than of the past observations themselves.
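
The three components can be illustrated without any forecasting library. The following is a minimal sketch on synthetic data, not a fitted model: the coefficient values phi and theta and the series length are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
phi, theta = 0.6, 0.4      # illustrative AR and MA coefficients (assumed values)
eps = rng.normal(size=n)   # white-noise shocks (the forecast errors)

# AR + MA: each value depends on the previous value (AR term)
# and on the previous shock (MA term).
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t] + theta * eps[t - 1]

# Integrated: cumulative summation turns the stationary series x
# into a non-stationary series y.
y = np.cumsum(x)

# First-order differencing (d = 1) undoes the summation.
recovered = np.diff(y)
print(np.allclose(recovered, x[1:]))  # True
```

Here y behaves like an ARIMA(1, 1, 1) series: differencing it once gives back the stationary ARMA(1, 1) series x.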

The ARIMA model is defined by three parameters: p, d, and q:

  • p: Number of lagged observations included in the model (AR order).
  • d: Degree of differencing (how many times the data is differenced to make it stationary).
  • q: Number of lagged forecast errors included in the model (MA order).
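
For reference, the three parameters fit together in a single equation. The notation below is a common textbook convention and is introduced here as an assumption, not taken from the text above: B is the backshift operator (B y_t = y_{t-1}), the phi_i and theta_j are the AR and MA coefficients, c is a constant, and epsilon_t is white noise.

```latex
\left(1 - \sum_{i=1}^{p} \phi_i B^{i}\right) (1 - B)^{d}\, y_t
  = c + \left(1 + \sum_{j=1}^{q} \theta_j B^{j}\right) \varepsilon_t
```

With d = 0 this reduces to an ARMA(p, q) model. With p = q = 0 and d = 1 it is a random walk, with drift when c is nonzero.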

Variants

  • SARIMA (Seasonal ARIMA): Extends ARIMA by including seasonal components.
  • ARIMAX: ARIMA with exogenous variables for forecasting.

Use Cases

ARIMA is widely used in economics, finance, and weather forecasting, and more generally in any domain that forecasts from regularly spaced historical observations.

Python Code:

import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error
import numpy as np

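# Load dataset (replace 'your_time_series_data.csv' with your file)
# Assume the CSV file has a 'date' column and a 'value' column
data = pd.read_csv('your_time_series_data.csv', parse_dates=['date'], index_col='date')
# Keep only the 'value' column, sorted by date, so the model sees one series
data = data['value'].sort_index()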

# Visualize the data
plt.figure(figsize=(10, 6))
plt.plot(data, label="Original Data")
plt.title("Time Series Data")
plt.xlabel("Date")
plt.ylabel("Value")
plt.legend()
plt.show()

# Split into train and test sets chronologically (no shuffling for time series)
train_size = int(len(data) * 0.8)
train, test = data.iloc[:train_size], data.iloc[train_size:]

# Fit the ARIMA model
p, d, q = 2, 1, 2  # ARIMA parameters
model = ARIMA(train, order=(p, d, q))
model_fit = model.fit()

# Summary of the model
print(model_fit.summary())

# Forecast
forecast_steps = len(test)
forecast = model_fit.forecast(steps=forecast_steps)

# Evaluate the model
mse = mean_squared_error(test, forecast)
print(f"Mean Squared Error: {mse:.4f}")
rmse = np.sqrt(mse)
print(f"Root Mean Squared Error: {rmse:.4f}")

# Plot the results
plt.figure(figsize=(10, 6))
plt.plot(train, label="Train Data")
plt.plot(test, label="Test Data")
plt.plot(test.index, forecast, label="Forecast", color='orange')
plt.title("ARIMA Forecast")
plt.xlabel("Date")
plt.ylabel("Value")
plt.legend()
plt.show()

How to Use:

  1. Replace 'your_time_series_data.csv' with your dataset file.
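  2. Ensure your dataset contains a date column named 'date' and a value column named 'value', or change those names in the code.
  3. Adjust the ARIMA parameters p, d, and q to fit your data. Autocorrelation plots (statsmodels.graphics.tsaplots.plot_acf) help choose q, partial autocorrelation plots (plot_pacf, in the same module) help choose p, and a stationarity test such as the augmented Dickey-Fuller test (statsmodels.tsa.stattools.adfuller) helps choose d.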