Time Series Forecasting
This example uses the modlee package for time series forecasting.
We’ll use the Air Passengers dataset to show you how to:
Prepare the Data: Load and preprocess the dataset, including scaling and splitting into training and test sets.
Use Modlee for Model Training: Train a model using Modlee’s framework.
Evaluate Model: Assess the performance of the trained model on the test data.
Note: currently TimeseriesForecastingModleeModel does not support PyTorch Recurrent Layers.
First, we will import the necessary libraries and set up the environment.
import torch
import os
import modlee
import lightning.pytorch as pl
from torch.utils.data import DataLoader, TensorDataset
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
import numpy as np
import seaborn as sns
Now, we will set up the modlee API key and initialize the modlee package. You can access your modlee API key from the dashboard.
Replace replace-with-your-api-key with your API key.
os.environ['MODLEE_API_KEY'] = "replace-with-your-api-key"
modlee.init(api_key=os.environ['MODLEE_API_KEY'])
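Hard-coding the key in a script makes it easy to commit by accident. A minimal sketch of an alternative, assuming the key has already been exported in your shell as MODLEE_API_KEY. The helper name read_api_key is hypothetical and is not part of modlee:

```python
import os

PLACEHOLDER = "replace-with-your-api-key"


def read_api_key(env=os.environ, name="MODLEE_API_KEY"):
    """Return the API key from the environment, or raise if it is unset.

    Hypothetical helper for illustration; it is not part of modlee.
    """
    key = env.get(name, "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError(f"Set the {name} environment variable to your API key")
    return key


# Usage (uncomment once the variable is set):
# modlee.init(api_key=read_api_key())
```

Failing early with a clear message avoids sending the placeholder string to the API.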
Next, we will prepare and load our data. We will use the Air Passengers dataset, which contains the monthly number of airline passengers over several years. We can load this dataset using the seaborn library, where it is available under the name flights.
data = sns.load_dataset('flights')
Now, we will prepare our data for training. We create a function called prepare_air_passenger_data to handle this process.
def prepare_air_passenger_data(data, seq_length):
    # Convert the 'passengers' column to a float array
    passenger_data = data['passengers'].values.astype(float)
    # Scale the data to the range [0, 1]
    scaler = MinMaxScaler(feature_range=(0, 1))
    passenger_data = scaler.fit_transform(passenger_data.reshape(-1, 1))
    X, y = [], []
    # Create sequences of length seq_length for forecasting
    for i in range(len(passenger_data) - seq_length):
        X.append(passenger_data[i:i + seq_length])  # Append the sequence
        y.append(passenger_data[i + seq_length])  # Append the target value
    X = np.array(X)
    y = np.array(y)
    return X, y, scaler  # Return the sequences, target values, and scaler
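To see what the windowing loop produces, here is a dependency-free sketch of the same idea on a toy series. The helper make_windows is hypothetical and leaves out the scaling step; it only illustrates how each window is paired with the value that follows it:

```python
def make_windows(series, seq_length):
    """Split a 1-D sequence into (window, next value) pairs."""
    X, y = [], []
    for i in range(len(series) - seq_length):
        X.append(series[i:i + seq_length])  # seq_length consecutive values
        y.append(series[i + seq_length])    # the value right after the window
    return X, y


toy = [10, 20, 30, 40, 50]
X_toy, y_toy = make_windows(toy, seq_length=3)
print(X_toy)  # [[10, 20, 30], [20, 30, 40]]
print(y_toy)  # [40, 50]
```

A series of N values yields N - seq_length pairs, so consecutive windows overlap in all but one value.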
We will call the prepare_air_passenger_data function and also split the dataset into training and testing sets.
# Set the sequence length to 12, indicating we will use the past 12 months of data
seq_length = 12
# Prepare the data by calling the function to get the sequences and target values
X, y, scaler = prepare_air_passenger_data(data, seq_length)
# Reshape X to match the input shape required for the MLP: (batch_size, seq_length, input_dim)
X = X.reshape(-1, seq_length, 1)
y = y.reshape(-1, 1)
# Split the data into training and test sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Convert the training and test sets to PyTorch tensors
X_train, y_train = torch.Tensor(X_train), torch.Tensor(y_train)
X_test, y_test = torch.Tensor(X_test), torch.Tensor(y_test)
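Note that train_test_split shuffles by default, and because consecutive windows overlap, a randomly chosen test window shares most of its months with some training windows. A common alternative for time series is to hold out the most recent windows instead. The following is a minimal sketch of that idea, not what the code above does; chronological_split is a hypothetical helper that works on any sliceable sequence, including NumPy arrays:

```python
def chronological_split(X, y, test_fraction=0.2):
    """Keep the earliest samples for training and the latest for testing."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")
    n_test = max(1, round(len(X) * test_fraction))
    split = len(X) - n_test
    return X[:split], X[split:], y[:split], y[split:]


# Ten dummy samples in time order: the last two become the test set.
X_demo = list(range(10))
y_demo = [v + 1 for v in X_demo]
X_tr, X_te, y_tr, y_te = chronological_split(X_demo, y_demo)
print(X_tr)  # [0, 1, 2, 3, 4, 5, 6, 7]
print(X_te)  # [8, 9]
```

Passing shuffle=False to train_test_split gives a similar ordered split without a custom helper.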
Now, we can create our dataloaders for both the training and testing datasets. We will use the TensorDataset class from PyTorch to create datasets from the training and testing tensors.
# Create TensorDataset for training and test data
train_dataset = TensorDataset(X_train, y_train)
test_dataset = TensorDataset(X_test, y_test)
# Create DataLoader instances for the training and test data (the test loader is also used for validation during training)
train_dataloader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_dataloader = DataLoader(test_dataset, batch_size=32, shuffle=False)
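To make the batch counts concrete, the arithmetic can be sketched without PyTorch. This assumes the flights dataset has 144 monthly rows (1949 to 1960) and that a float test_size is rounded up to a whole number of samples; batch_sizes is a hypothetical helper that mirrors how a DataLoader chunks a dataset when drop_last is False:

```python
import math


def batch_sizes(n_samples, batch_size, drop_last=False):
    """Sizes of the batches a DataLoader would yield for n_samples items."""
    full, remainder = divmod(n_samples, batch_size)
    sizes = [batch_size] * full
    if remainder and not drop_last:
        sizes.append(remainder)
    return sizes


n_windows = 144 - 12                 # assumed 144 monthly rows, seq_length = 12
n_test = math.ceil(0.2 * n_windows)  # the test share is rounded up
n_train = n_windows - n_test

print(n_train, n_test)               # 105 27
print(batch_sizes(n_train, 32))      # [32, 32, 32, 9]
print(batch_sizes(n_test, 32))       # [27]
```

With a dataset this small, each epoch is only a handful of optimizer steps, which is worth keeping in mind when choosing the number of epochs.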
After preparing the data, we need to define our model. We will create a simple Multi-Layer Perceptron (MLP) model for time series forecasting using Modlee’s framework.
# Define the MLP model for time series forecasting, inheriting from Modlee's model
class TimeSeriesForecasterMLP(modlee.model.TimeseriesForecastingModleeModel):
    def __init__(self, input_dim, seq_length, hidden_dim=64):
        super().__init__()
        self.seq_length = seq_length  # Store the sequence length
        self.hidden_dim = hidden_dim  # Set the number of hidden units
        # Define the layers of the MLP
        self.fc1 = torch.nn.Linear(input_dim * seq_length, hidden_dim)  # First hidden layer
        self.fc2 = torch.nn.Linear(hidden_dim, hidden_dim)  # Second hidden layer
        self.fc3 = torch.nn.Linear(hidden_dim, 1)  # Output layer
        # Define the loss function (Mean Squared Error)
        self.loss_fn = torch.nn.MSELoss()

    def forward(self, x):
        x = x.view(x.size(0), -1)  # Flatten the input for MLP
        x = torch.relu(self.fc1(x))  # Apply ReLU activation after first layer
        x = torch.relu(self.fc2(x))  # Apply ReLU activation after second layer
        predictions = self.fc3(x)  # Generate predictions from the output layer
        return predictions

    def training_step(self, batch):
        x, y = batch  # Unpack the input features and targets
        preds = self.forward(x)  # Forward pass to get predictions
        loss = self.loss_fn(preds, y)  # Calculate loss
        return loss

    def validation_step(self, batch):
        x, y = batch  # Unpack the input features and targets
        preds = self.forward(x)  # Forward pass to get predictions
        loss = self.loss_fn(preds, y)  # Calculate validation loss
        return loss

    def configure_optimizers(self):
        # Use Adam optimizer for model parameters
        return torch.optim.Adam(self.parameters(), lr=1e-3)
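The forward pass flattens each input of shape (seq_length, input_dim) into a single vector of length input_dim * seq_length before the first linear layer. As a rough check on model size, the parameter count of the three layers can be worked out by hand. A small sketch, using the values from this tutorial (input_dim = 1, seq_length = 12, hidden_dim = 64); linear_params is a hypothetical helper:

```python
def linear_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


input_dim, seq_length, hidden_dim = 1, 12, 64
layers = {
    "fc1": linear_params(input_dim * seq_length, hidden_dim),
    "fc2": linear_params(hidden_dim, hidden_dim),
    "fc3": linear_params(hidden_dim, 1),
}
total = sum(layers.values())
print(layers)  # {'fc1': 832, 'fc2': 4160, 'fc3': 65}
print(total)   # 5057
```

On the real model, sum(p.numel() for p in model.parameters()) should report the same total, provided the base class adds no trainable parameters of its own.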
Now, we can proceed to train our model using the Modlee package. We create an instance of our TimeSeriesForecasterMLP model and then set up the training loop using the Trainer class from PyTorch Lightning.
input_dim = 1 # We have one feature (number of passengers)
# Initialize the Modlee model
model = TimeSeriesForecasterMLP(input_dim, seq_length)
# Train the model using PyTorch Lightning
with modlee.start_run() as run:
    trainer = pl.Trainer(max_epochs=1)
    trainer.fit(
        model=model,
        train_dataloaders=train_dataloader,
        val_dataloaders=test_dataloader
    )
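The validation loss computed during training is in scaled units, because the targets were mapped to [0, 1]. To judge forecasts in passenger counts, predictions and targets need to be mapped back first. The sketch below shows that step in plain Python with made-up numbers; inverse_minmax and rmse are hypothetical helpers, and with the scaler returned earlier, scaler.inverse_transform performs the same mapping:

```python
import math


def inverse_minmax(scaled, data_min, data_max):
    """Undo scaling to [0, 1]: x = scaled * (max - min) + min."""
    return [s * (data_max - data_min) + data_min for s in scaled]


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up numbers for illustration only, not real model output.
data_min, data_max = 100.0, 600.0
scaled_preds = [0.50, 0.80]
scaled_targets = [0.60, 0.80]

preds = inverse_minmax(scaled_preds, data_min, data_max)
targets = inverse_minmax(scaled_targets, data_min, data_max)
print(preds, targets)
print(rmse(preds, targets))
```

Reporting the error in original units makes it easier to compare against a simple baseline, such as repeating the value from 12 months earlier.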
Finally, we inspect the artifacts saved by Modlee, including the model graph and various statistics. With Modlee, your training assets are automatically saved, preserving valuable insights for future reference and collaboration.
last_run_path = modlee.last_run_path()
print(f"Run path: {last_run_path}")
artifacts_path = os.path.join(last_run_path, 'artifacts')
artifacts = sorted(os.listdir(artifacts_path))
print(f"Saved artifacts: {artifacts}")
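os.listdir only shows the top level of the artifacts directory. If the run contains nested folders, a recursive listing with file sizes can be more informative. A minimal sketch using only the standard library; list_files_with_sizes is a hypothetical helper and makes no assumption about which files Modlee writes:

```python
import os


def list_files_with_sizes(root):
    """Return (relative path, size in bytes) for every file under root."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            rel_path = os.path.relpath(full_path, root)
            entries.append((rel_path, os.path.getsize(full_path)))
    return sorted(entries)


# Usage, assuming artifacts_path from the snippet above:
# for rel_path, size in list_files_with_sizes(artifacts_path):
#     print(f"{size:>10} B  {rel_path}")
```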