Time Series Forecasting

This examples uses the modlee package for time series forecasting. We’ll use the Air Passengers dataset to show you how to:

Prepare the Data: Load and preprocess the dataset, including scaling and splitting into training and test sets.
Use Modlee for Model Training: Train a model using Modlee’s framework.
Evaluate Model: Assess the performance of the trained model on the test data.

Note: currently TimeseriesForecastingModleeModel does not support PyTorch Recurrent Layers.

First, we will import the the necessary libraries and set up the environment.

import torch
import os
import modlee
import lightning.pytorch as pl
from torch.utils.data import DataLoader, TensorDataset
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
import numpy as np
import seaborn as sns

Now, we will set up the modlee API key and initialize the modlee package. You can access your modlee API key from the dashboard.

Replace replace-with-your-api-key with your API key.

os.environ['MODLEE_API_KEY'] = "replace-with-your-api-key"
modlee.init(api_key=os.environ['MODLEE_API_KEY'])

Next, we will prepare and load our data. We will use the Air Passengers dataset, which contains the monthly number of airline passengers over several years. We can load this dataset using the seaborn library.

data = sns.load_dataset('flights')

Now, we will prepare our data for training. We create a function called prepare_air_passenger_data to handle this process.

def prepare_air_passenger_data(data, seq_length):
    # Convert the 'passengers' column to a float array
    passenger_data = data['passengers'].values.astype(float)

    # Scale the data to the range [0, 1]
    scaler = MinMaxScaler(feature_range=(0, 1))
    passenger_data = scaler.fit_transform(passenger_data.reshape(-1, 1))

    X, y = [], []
    # Create sequences of length seq_length for forecasting
    for i in range(len(passenger_data) - seq_length):
        X.append(passenger_data[i:i + seq_length])  # Append the sequence
        y.append(passenger_data[i + seq_length])     # Append the target value

    X = np.array(X)
    y = np.array(y)

    return X, y, scaler  # Return the sequences, target values, and scaler

We will call the prepare_air_passenger_data function and also split the dataset into training and testing sets.

# Set the sequence length to 12, indicating we will use the past 12 months of data
seq_length = 12

# Prepare the data by calling the function to get the sequences and target values
X, y, scaler = prepare_air_passenger_data(data, seq_length)

# Reshape X to match the input shape required for the MLP: (batch_size, seq_length, input_dim)
X = X.reshape(-1, seq_length, 1)
y = y.reshape(-1, 1)

# Split the data into training and test sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Convert the training and test sets to PyTorch tensors
X_train, y_train = torch.Tensor(X_train), torch.Tensor(y_train)
X_test, y_test = torch.Tensor(X_test), torch.Tensor(y_test)

Now, we can create our dataloaders for both the training and testing datasets. We will use the TensorDataset class from PyTorch to create datasets from the training and testing tensors.

# Create TensorDataset for training and test data
train_dataset = TensorDataset(X_train, y_train)
test_dataset = TensorDataset(X_test, y_test)

# Create DataLoader instances for training and validation
train_dataloader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_dataloader = DataLoader(test_dataset, batch_size=32, shuffle=False)

After preparing the data, we need to define our model. We will create a simple Multi-Layer Perceptron (MLP) model for time series forecasting using Modlee’s framework.

# Define the MLP model for time series forecasting, inheriting from Modlee's model
class TimeSeriesForecasterMLP(modlee.model.TimeseriesForecastingModleeModel):
    def __init__(self, input_dim, seq_length, hidden_dim=64):
        super().__init__()
        self.seq_length = seq_length  # Store the sequence length
        self.hidden_dim = hidden_dim    # Set the number of hidden units

        # Define the layers of the MLP
        self.fc1 = torch.nn.Linear(input_dim * seq_length, hidden_dim)  # First hidden layer
        self.fc2 = torch.nn.Linear(hidden_dim, hidden_dim)  # Second hidden layer
        self.fc3 = torch.nn.Linear(hidden_dim, 1)  # Output layer

        # Define the loss function (Mean Squared Error)
        self.loss_fn = torch.nn.MSELoss()

    def forward(self, x):
        x = x.view(x.size(0), -1)  # Flatten the input for MLP
        x = torch.relu(self.fc1(x))  # Apply ReLU activation after first layer
        x = torch.relu(self.fc2(x))  # Apply ReLU activation after second layer
        predictions = self.fc3(x)  # Generate predictions from the output layer
        return predictions

    def training_step(self, batch):
        x, y = batch  # Unpack the input features and targets
        preds = self.forward(x)  # Forward pass to get predictions
        loss = self.loss_fn(preds, y)  # Calculate loss
        return loss

    def validation_step(self, batch):
        x, y = batch  # Unpack the input features and targets
        preds = self.forward(x)  # Forward pass to get predictions
        loss = self.loss_fn(preds, y)  # Calculate validation loss
        return loss

    def configure_optimizers(self):
        # Use Adam optimizer for model parameters
        return torch.optim.Adam(self.parameters(), lr=1e-3)

Now, we can proceed to train our model using the Modlee package. We create an instance of our TimeSeriesForecasterMLP model and then set up the training loop using the Trainer class from PyTorch Lightning.

input_dim = 1  # We have one feature (number of passengers)

# Initialize the Modlee model
model = TimeSeriesForecasterMLP(input_dim, seq_length)

# Train the model using PyTorch Lightning
with modlee.start_run() as run:
    trainer = pl.Trainer(max_epochs=1)
    trainer.fit(
        model=model,
        train_dataloaders=train_dataloader,
        val_dataloaders=test_dataloader
    )

Finally, we inspect the artifacts saved by Modlee, including the model graph and various statistics. With Modlee, your training assets are automatically saved, preserving valuable insights for future reference and collaboration.

last_run_path = modlee.last_run_path()
print(f"Run path: {last_run_path}")
artifacts_path = os.path.join(last_run_path, 'artifacts')
artifacts = sorted(os.listdir(artifacts_path))
print(f"Saved artifacts: {artifacts}")