Time Series Classification
In this tutorial, we will walk through the process of building a multivariate time series classification model using Modlee and PyTorch.
Time series classification is a task where models predict categorical labels based on sequential input data. We will use a dataset that contains time series data representing different car outlines extracted from video footage.
Note: Currently, Modlee does not support recurrent LSTM operations. Instead, we will focus on non-recurrent models suited for time series data, such as convolutional neural networks (CNNs) and transformers, which can effectively capture sequential patterns without requiring recurrent layers.
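To make the non-recurrent alternative concrete, here is a minimal, illustrative sketch of a 1-D CNN classifier for this kind of data. The Simple1DCNN name and the layer sizes are assumptions for illustration only; they are not part of the model we build later in this tutorial.

import torch

class Simple1DCNN(torch.nn.Module):
    """Illustrative 1-D CNN for time series classification (not the model used below)."""
    def __init__(self, in_channels=1, num_classes=4):
        super().__init__()
        self.conv = torch.nn.Sequential(
            torch.nn.Conv1d(in_channels, 16, kernel_size=5, padding=2),
            torch.nn.ReLU(),
            torch.nn.AdaptiveAvgPool1d(1),  # pool over the time dimension
        )
        self.fc = torch.nn.Linear(16, num_classes)

    def forward(self, x):
        # x: (batch, seq_length, channels) -> Conv1d expects (batch, channels, seq_length)
        x = self.conv(x.permute(0, 2, 1)).squeeze(-1)
        return self.fc(x)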
First, we will import the necessary libraries and set up the environment.
import torch
import os
import modlee
import lightning.pytorch as pl
from torch.utils.data import DataLoader, TensorDataset
import pandas as pd
Now, we will set up the modlee API key and initialize the modlee package. You can access your modlee API key from the dashboard.
Replace replace-with-your-api-key with your API key.
modlee.init(api_key="replace-with-your-api-key")
The dataset we will use consists of time series data that represent outlines of four different types of cars (sedan, pickup, minivan, SUV) extracted from traffic videos using motion information. Each vehicle is mapped onto a 1-D series, where each series captures the vehicle’s outline. The objective is to classify these series into one of the four classes.
For this example, we will manually download the dataset from Kaggle and upload it to the environment. Visit the Time Series Classification dataset page on Kaggle and click the Download button to save the dataset to your local machine.
Copy the path to the downloaded files, which will be used later.
To load the data, we create a function that reads the files and processes them into PyTorch tensors. Each time series entry has features representing the outline of a vehicle, with the first column in the dataset being the target label.
def load_car_from_txt(file_path):
    # Load the dataset with whitespace as the delimiter and no header
    data = pd.read_csv(file_path, sep=r'\s+', header=None)
    y = data.iloc[:, 0].values  # The first column represents the target (car type)
    X = data.iloc[:, 1:].values  # The rest of the columns represent the time series features
    # Convert the features and labels to PyTorch tensors
    X = torch.tensor(X, dtype=torch.float32).unsqueeze(-1)  # Add a dimension for input size
    # Labels in this dataset are 1-indexed (1-4); shift to 0-indexed for CrossEntropyLoss
    y = torch.tensor(y, dtype=torch.long) - 1
    return X, y
# Load the training data
train_file_path = 'path-to-Car_TRAIN.txt'
X_train, y_train = load_car_from_txt(train_file_path)
# Load the test data
test_file_path = 'path-to-Car_TEST.txt'
X_test, y_test = load_car_from_txt(test_file_path)
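As a quick sanity check, you can print the tensor shapes. For the UCR Car dataset, each split is expected to contain 60 series of length 577 with a single feature channel; these sizes are properties of the dataset, not requirements of the code.

# Sanity-check the tensor shapes (expected: X_train -> (60, 577, 1), y_train -> (60,))
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)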
After loading the data, we create PyTorch TensorDataset and DataLoader objects to facilitate data handling during training and validation.
# Create PyTorch TensorDatasets
train_dataset = TensorDataset(X_train, y_train)
test_dataset = TensorDataset(X_test, y_test)
# Create DataLoaders for training and testing
train_dataloader = DataLoader(train_dataset, batch_size=8, shuffle=True)
test_dataloader = DataLoader(test_dataset, batch_size=8, shuffle=False)
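If you want to confirm the batch layout the model will receive, you can pull a single batch from the loader; this is a minimal check and nothing Modlee-specific.

# Inspect one batch: features are (batch, seq_length, input_dim), labels are (batch,)
x_batch, y_batch = next(iter(train_dataloader))
print(x_batch.shape, y_batch.shape)  # e.g. torch.Size([8, 577, 1]) torch.Size([8])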
We define a Transformer-based neural network for multivariate time series classification. The model includes:
- A TransformerEncoder layer to capture sequence dependencies.
- A fully connected (fc) layer that maps the encoder output to class labels.
- Cross-entropy loss for training, optimized with the Adam optimizer.
class TransformerTimeSeriesClassifier(modlee.model.TimeseriesClassificationModleeModel):
    def __init__(self, input_dim, seq_length, num_classes, num_heads=1, hidden_dim=64):
        super().__init__()
        # Define a Transformer encoder layer; batch_first=True so inputs are (batch, seq, features)
        self.encoder_layer = torch.nn.TransformerEncoderLayer(
            d_model=input_dim, nhead=num_heads, batch_first=True
        )
        # Stack Transformer encoder layers to create a Transformer encoder
        self.transformer_encoder = torch.nn.TransformerEncoder(self.encoder_layer, num_layers=2)
        # Fully connected layer to map encoded features to class scores
        self.fc = torch.nn.Linear(input_dim * seq_length, num_classes)
        # Set the loss function to CrossEntropyLoss for multi-class classification
        self.loss_fn = torch.nn.CrossEntropyLoss()

    def forward(self, x):
        # Pass input through the Transformer encoder to capture dependencies
        x = self.transformer_encoder(x)
        # Flatten the output and pass it through the fully connected layer for class prediction
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        return x

    def training_step(self, batch):
        # Get input data and target labels from batch
        x, y = batch
        # Forward pass to generate predictions
        preds = self.forward(x)
        # Calculate loss using the specified loss function
        loss = self.loss_fn(preds, y)
        return loss

    def validation_step(self, batch):
        # Get input data and target labels from batch
        x, y = batch
        # Forward pass to generate predictions
        preds = self.forward(x)
        # Calculate validation loss
        loss = self.loss_fn(preds, y)
        return loss

    def configure_optimizers(self):
        # Use the Adam optimizer with a learning rate of 1e-3 for optimization
        return torch.optim.Adam(self.parameters(), lr=1e-3)
# Instantiate the model with specified parameters
modlee_model = TransformerTimeSeriesClassifier(input_dim=1, seq_length=577, num_classes=4)
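Before training, a quick sanity check: run a dummy batch through the model and confirm the output has one score per class. The dummy tensor below is illustrative; 577 is the series length of this dataset.

# Verify the forward pass: a dummy batch of 2 series should yield a (2, 4) score tensor
dummy_input = torch.randn(2, 577, 1)
print(modlee_model(dummy_input).shape)  # torch.Size([2, 4])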
To train the model, we use PyTorch Lightning's Trainer class, which simplifies the training loop.
# Start a Modlee run for tracking
with modlee.start_run() as run:
    trainer = pl.Trainer(max_epochs=1)
    trainer.fit(
        model=modlee_model,
        train_dataloaders=train_dataloader,
        val_dataloaders=test_dataloader
    )
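If you also want a held-out accuracy number after the run completes, the loop below is a minimal, optional sketch (separate from Modlee's tracking) that scores the trained model on the test loader.

# Optional: compute test-set accuracy with a plain evaluation loop
modlee_model.eval()
correct, total = 0, 0
with torch.no_grad():
    for x, y in test_dataloader:
        preds = modlee_model(x).argmax(dim=1)
        correct += (preds == y).sum().item()
        total += y.size(0)
print(f"Test accuracy: {correct / total:.2%}")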
After training, we inspect the artifacts saved by Modlee, including the model graph and various statistics. With Modlee, your training assets are automatically saved, preserving valuable insights for future reference and collaboration.
last_run_path = modlee.last_run_path()
print(f"Run path: {last_run_path}")
artifacts_path = os.path.join(last_run_path, 'artifacts')
artifacts = sorted(os.listdir(artifacts_path))
print(f"Saved artifacts: {artifacts}")