Time Series Regression
In this tutorial, we will guide you through the process of implementing a time series regression model using the Modlee framework along with PyTorch.
The goal is to predict power consumption based on various environmental factors, such as temperature, humidity, wind speed, and solar radiation.
Note: Currently, Modlee does not support recurrent LSTM operations. Instead, we will focus on non-recurrent models suited for time series data, such as convolutional neural networks (CNNs) and transformers, which can effectively capture sequential patterns without requiring recurrent layers.
First, we will import the necessary libraries and set up the environment.
import torch
import os
import modlee
import lightning.pytorch as pl
from torch.utils.data import DataLoader, TensorDataset
import pandas as pd
from sklearn.model_selection import train_test_split
Now, we will set up the modlee API key and initialize the modlee package. You can access your modlee API key from the dashboard.
Replace replace-with-your-api-key with your API key.
modlee.init(api_key="replace-with-your-api-key")
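Hard-coding the key in a script makes it easy to leak through version control. A minimal sketch of one alternative, assuming you export the key in an environment variable. The variable name MODLEE_API_KEY and the helper read_api_key are hypothetical choices for this example, not something Modlee requires.

```python
import os

def read_api_key(env_var="MODLEE_API_KEY"):
    """Return the API key stored in the given environment variable.

    The variable name is a hypothetical convention for this example.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable to your Modlee API key.")
    return api_key

# Usage, with modlee imported as above:
# modlee.init(api_key=read_api_key())
```

Failing early with a clear message is usually easier to debug than an authentication error raised later during the run.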
The dataset used in this tutorial includes hourly time series data that links environmental conditions to power consumption across three zones. Each record contains a timestamp, temperature, humidity, wind speed, and measures of solar radiation, alongside the power consumption (in watts) for each zone.
This data allows for the exploration of relationships between weather patterns and energy usage, aiding in the development of predictive models.
For this example, we will manually download the dataset from Kaggle and upload it to the environment. Visit the Time Series Regression dataset page on Kaggle and click the Download button to save the dataset to your local machine.
Copy the path to the downloaded file; it will be used later.
Next, we need to load the power consumption dataset. This dataset contains various features related to environmental conditions and their corresponding power consumption values. The load_power_consumption_data function is designed to read the CSV file, process the data, and create time series sequences.
We then select the relevant features from the dataset for our input variables, X, which include temperature, humidity, wind speed, and solar radiation values. The output variable, y, is calculated as the mean power consumption across three different zones.
# Function to load the power consumption dataset and prepare it for training
def load_power_consumption_data(file_path, seq_length):
    # Load the dataset from the specified CSV file
    data = pd.read_csv(file_path)
    # Convert the 'Datetime' column to datetime objects
    data['Datetime'] = pd.to_datetime(data['Datetime'])
    # Set the 'Datetime' column as the index for the DataFrame
    data.set_index('Datetime', inplace=True)
    # Extract relevant features for prediction and target variable
    X = data[['Temperature', 'Humidity', 'WindSpeed', 'GeneralDiffuseFlows', 'DiffuseFlows']].values
    # Calculate the average power consumption across the three zones as the target variable
    y = data[['PowerConsumption_Zone1', 'PowerConsumption_Zone2', 'PowerConsumption_Zone3']].mean(axis=1).values
    # Convert features and target to PyTorch tensors
    X = torch.tensor(X, dtype=torch.float32)
    y = torch.tensor(y, dtype=torch.float32)
    # Create sequences of the specified length for input features
    num_samples = X.shape[0] - seq_length + 1
    X_seq = torch.stack([X[i:i + seq_length] for i in range(num_samples)])
    y_seq = y[seq_length - 1:]  # Align target variable with sequences
    return X_seq, y_seq
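To see how the windows and targets line up, the sketch below applies the same slicing logic to a tiny synthetic tensor. The helper name make_windows and the toy data are illustrative only and are not part of the tutorial's pipeline.

```python
import torch

def make_windows(X, y, seq_length):
    """Mirror the windowing step of load_power_consumption_data on in-memory tensors."""
    num_samples = X.shape[0] - seq_length + 1
    X_seq = torch.stack([X[i:i + seq_length] for i in range(num_samples)])
    y_seq = y[seq_length - 1:]
    return X_seq, y_seq

# Toy data: 6 time steps, 2 features; the target equals the time step index
X = torch.arange(12, dtype=torch.float32).reshape(6, 2)
y = torch.arange(6, dtype=torch.float32)

X_seq, y_seq = make_windows(X, y, seq_length=3)
print(X_seq.shape)  # torch.Size([4, 3, 2])
print(y_seq)        # tensor([2., 3., 4., 5.])
```

Each window is paired with the target at its final time step, so the model estimates consumption at the same moment as the last row of inputs rather than forecasting a later step.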
Once we have the preprocessed data, we proceed to create PyTorch datasets and DataLoaders.
Here, we load the power consumption data from the specified CSV file. We create a TensorDataset to hold the features and labels. To split the dataset into training and validation sets, we use the train_test_split function from sklearn.
# Define the path to the dataset
file_path = 'path-to-powerconsumption.csv'
# Load the power consumption data with a specified sequence length
X, y = load_power_consumption_data(file_path, 20)
# Create a TensorDataset holding all sequences and their targets
dataset = TensorDataset(X, y)
# Split dataset indices into training and validation sets
train_indices, val_indices = train_test_split(range(len(dataset)), test_size=0.2, random_state=42)
# Create training and validation datasets
train_dataset = TensorDataset(X[train_indices], y[train_indices])
val_dataset = TensorDataset(X[val_indices], y[val_indices])
# Create DataLoader for batch processing during training
train_dataloader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_dataloader = DataLoader(val_dataset, batch_size=32, shuffle=False)
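Because consecutive windows overlap in all but one time step, a random split places near-duplicate sequences in both sets, which can make the validation loss look better than performance on genuinely unseen periods. One common alternative is a chronological split that holds out the final portion of the series. A minimal sketch, where chronological_split is a hypothetical helper and not part of Modlee or scikit-learn:

```python
def chronological_split(num_samples, val_fraction=0.2, gap=0):
    """Return (train_indices, val_indices) with validation drawn from the end.

    `gap` drops that many windows between the two sets; setting it to
    seq_length - 1 removes windows that share time steps across the boundary.
    """
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be between 0 and 1")
    num_val = max(1, int(num_samples * val_fraction))
    train_end = num_samples - num_val - gap
    if train_end <= 0:
        raise ValueError("not enough samples for the requested split")
    train_indices = list(range(train_end))
    val_indices = list(range(num_samples - num_val, num_samples))
    return train_indices, val_indices

train_idx, val_idx = chronological_split(100, val_fraction=0.2, gap=19)
print(len(train_idx), train_idx[-1])  # 61 60
print(len(val_idx), val_idx[0])       # 20 80
```

The resulting index lists can be used in place of train_indices and val_indices when building the two TensorDataset objects.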
We will now define a multivariate time series regression model by creating a class that inherits from modlee.model.TimeseriesRegressionModleeModel. This class uses a Transformer-based architecture to predict a continuous value.
We initialize a TransformerEncoder with multi-head attention to process sequential dependencies.
class TransformerTimeSeriesRegressor(modlee.model.TimeseriesRegressionModleeModel):
    def __init__(self, input_dim, seq_length, num_heads=1, hidden_dim=64):
        super().__init__()
        # Initialize a Transformer encoder layer with specified input dimensions and heads
        # (input_dim must be divisible by num_heads)
        self.encoder_layer = torch.nn.TransformerEncoderLayer(d_model=input_dim, nhead=num_heads)
        # Stack encoder layers to form the Transformer encoder
        self.transformer_encoder = torch.nn.TransformerEncoder(self.encoder_layer, num_layers=2)
        # Define a fully connected layer to map encoded features to a single output value
        self.fc = torch.nn.Linear(input_dim * seq_length, 1)
        # Set the loss function to mean squared error for regression tasks
        self.loss_fn = torch.nn.MSELoss()

    def forward(self, x):
        # x arrives as (batch, seq_length, input_dim). The encoder layer keeps PyTorch's
        # default batch_first=False, so move the sequence dimension to the front
        x = x.transpose(0, 1)
        # Pass input through the Transformer encoder
        x = self.transformer_encoder(x)
        # Restore (batch, seq_length, input_dim), then flatten each sample
        x = x.transpose(0, 1)
        x = x.reshape(x.size(0), -1)
        # Map the flattened features to one output per sample, shape (batch, 1)
        x = self.fc(x)
        return x

    def training_step(self, batch):
        # Get input and target from batch
        x, y = batch
        # Generate predictions and compute loss
        preds = self.forward(x)
        # Drop the trailing dimension so predictions match targets of shape (batch,)
        loss = self.loss_fn(preds.squeeze(-1), y)
        return loss

    def validation_step(self, batch):
        # Get input and target from batch
        x, y = batch
        # Generate predictions and compute loss
        preds = self.forward(x)
        # Drop the trailing dimension so predictions match targets of shape (batch,)
        loss = self.loss_fn(preds.squeeze(-1), y)
        return loss

    def configure_optimizers(self):
        # Use the Adam optimizer with a learning rate of 1e-3
        return torch.optim.Adam(self.parameters(), lr=1e-3)

model = TransformerTimeSeriesRegressor(input_dim=5, seq_length=20)
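The introduction also mentioned convolutional networks as a non-recurrent option. The sketch below shows what such a model could look like in plain PyTorch. Conv1dRegressor and its layer sizes are illustrative choices, and it subclasses torch.nn.Module only so that it runs on its own. Using it with Modlee would presumably mean subclassing modlee.model.TimeseriesRegressionModleeModel and adding the same training and validation steps as above, which is an assumption rather than something verified here.

```python
import torch

class Conv1dRegressor(torch.nn.Module):
    def __init__(self, input_dim, hidden_channels=16, kernel_size=3):
        super().__init__()
        # Conv1d expects (batch, channels, time), so the features act as channels
        self.conv = torch.nn.Conv1d(input_dim, hidden_channels, kernel_size,
                                    padding=kernel_size // 2)
        self.relu = torch.nn.ReLU()
        # Average over the time axis so the head does not depend on seq_length
        self.pool = torch.nn.AdaptiveAvgPool1d(1)
        self.fc = torch.nn.Linear(hidden_channels, 1)

    def forward(self, x):
        # x: (batch, seq_length, input_dim) -> (batch, input_dim, seq_length)
        x = x.permute(0, 2, 1)
        x = self.relu(self.conv(x))
        x = self.pool(x).squeeze(-1)  # (batch, hidden_channels)
        return self.fc(x)             # (batch, 1)

cnn = Conv1dRegressor(input_dim=5)
dummy = torch.randn(8, 20, 5)  # same layout as one batch from the DataLoader
print(cnn(dummy).shape)  # torch.Size([8, 1])
```

The convolution mixes information from neighbouring time steps, and the pooling step summarizes the whole window, so no recurrent layer is involved.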
With our model defined, we can now train it using the PyTorch Lightning Trainer. This trainer simplifies the training process by managing the training loops and logging.
# Start a training run with Modlee
with modlee.start_run() as run:
    trainer = pl.Trainer(max_epochs=1)  # Set up the trainer
    trainer.fit(
        model=model,
        train_dataloaders=train_dataloader,  # Load training data
        val_dataloaders=val_dataloader  # Load validation data
    )
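A single epoch is enough to exercise the pipeline, but to judge the model it helps to compute an error in the target's own units. The helper below is a sketch in plain PyTorch that accumulates the root mean squared error over a DataLoader. The name evaluate_rmse is hypothetical and is not a Modlee or Lightning function. It is demonstrated on a stand-in linear model and random data so that it runs without the dataset.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def evaluate_rmse(model, dataloader):
    """Root mean squared error of `model` over every sample in `dataloader`."""
    model.eval()
    squared_error, count = 0.0, 0
    with torch.no_grad():
        for x, y in dataloader:
            # Flatten (batch, 1) predictions so they match targets of shape (batch,)
            preds = model(x).reshape(-1)
            squared_error += torch.sum((preds - y.reshape(-1)) ** 2).item()
            count += y.numel()
    return (squared_error / count) ** 0.5

# Stand-in model and data, for illustration only
torch.manual_seed(0)
toy_model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(20 * 5, 1))
toy_loader = DataLoader(TensorDataset(torch.randn(64, 20, 5), torch.randn(64)), batch_size=32)
print(evaluate_rmse(toy_model, toy_loader))
```

With the tutorial's objects, the call would be evaluate_rmse(model, val_dataloader), provided the model and the batches are on the same device.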
After training, we inspect the artifacts saved by Modlee, including the model graph and various statistics. With Modlee, your training assets are automatically saved, preserving valuable insights for future reference and collaboration.
last_run_path = modlee.last_run_path()
print(f"Run path: {last_run_path}")
artifacts_path = os.path.join(last_run_path, 'artifacts')
artifacts = sorted(os.listdir(artifacts_path))
print(f"Saved artifacts: {artifacts}")
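os.listdir only shows the top level of the artifacts directory. To see nested files and their sizes, a small helper built on os.walk can be used. The name list_files is hypothetical, and the sketch makes no assumption about which files Modlee stores there.

```python
import os

def list_files(root):
    """Return sorted (relative_path, size_in_bytes) pairs for every file under root."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            rel_path = os.path.relpath(full_path, root)
            entries.append((rel_path, os.path.getsize(full_path)))
    return sorted(entries)

# Usage with the path from the tutorial:
# for rel_path, size in list_files(artifacts_path):
#     print(f"{size:>10} bytes  {rel_path}")
```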