|image1|

.. |image1| image:: https://github.com/mansiagr4/gifs/raw/main/new_small_logo.svg

Time Series Classification
==========================
In this tutorial, we will walk through the process of building a
multivariate time series classification model using Modlee and
``PyTorch``.
Time series classification is a task where models predict categorical
labels based on sequential input data. We will use a dataset that
contains time series data representing different car outlines extracted
from video footage.
**Note**: Currently, Modlee does not support recurrent LSTM operations.
Instead, we will focus on non-recurrent models suited for time series
data, such as convolutional neural networks (CNNs) and transformers,
which can effectively capture sequential patterns without requiring
recurrent layers.
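As an illustration of the non-recurrent approach, here is a minimal sketch of a 1-D CNN classifier for series shaped like this tutorial's data (the layer sizes and kernel width are illustrative choices, not part of the tutorial):

```python
import torch
import torch.nn as nn

class CNNTimeSeriesClassifier(nn.Module):
    def __init__(self, input_dim=1, num_classes=4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(input_dim, 16, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # pool over the time axis
        )
        self.fc = nn.Linear(16, num_classes)

    def forward(self, x):
        # x: (batch, seq_len, features) -> Conv1d expects (batch, channels, seq_len)
        x = x.permute(0, 2, 1)
        x = self.conv(x).squeeze(-1)
        return self.fc(x)

model = CNNTimeSeriesClassifier()
out = model(torch.randn(2, 577, 1))
print(out.shape)  # torch.Size([2, 4])
```

The convolution slides over the time axis, so sequential patterns are captured without any recurrent state.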
|Open in Kaggle|
First, we will import the necessary libraries and set up the
environment.

.. code:: python

   import torch
   import os
   import modlee
   import lightning.pytorch as pl
   from torch.utils.data import DataLoader, TensorDataset
   import pandas as pd

Now, we will set up the ``modlee`` API key and initialize the ``modlee``
package. You can access your ``modlee`` API key `from the
dashboard `__.
Replace ``replace-with-your-api-key`` with your API key.

.. code:: python

   modlee.init(api_key="replace-with-your-api-key")

The dataset we will use consists of time series data that represent
outlines of four different types of cars (sedan, pickup, minivan, SUV)
extracted from traffic videos using motion information. Each vehicle is
mapped onto a 1-D series, where each series captures the vehicle’s
outline. The objective is to classify these series into one of the four
classes.
For this example, we will manually download the dataset from Kaggle and
upload it to the environment. Visit the `Time Series Classification
dataset
page `__
on Kaggle and click the **Download** button to save the dataset to your
local machine.
Copy the path to the downloaded files, which will be used later.
To load the data, we create a function that reads the files and
processes them into ``PyTorch`` tensors. Each time series entry has
features representing the outline of a vehicle, with the first column in
the dataset being the target label.

.. code:: python

   def load_car_from_txt(file_path):
       # Load the dataset with whitespace as the delimiter and no header
       data = pd.read_csv(file_path, sep=r'\s+', header=None)
       y = data.iloc[:, 0].values  # The first column holds the target (car type)
       X = data.iloc[:, 1:].values  # The remaining columns hold the time series features

       # Convert the features and labels to PyTorch tensors
       X = torch.tensor(X, dtype=torch.float32).unsqueeze(-1)  # Add a feature dimension for the model input
       y = torch.tensor(y, dtype=torch.long)  # Labels must be long tensors for classification
       y = y - y.min()  # Shift labels to start at 0, since CrossEntropyLoss expects classes in [0, num_classes)
       return X, y

   # Load the training data
   train_file_path = 'path-to-Car_TRAIN.txt'
   X_train, y_train = load_car_from_txt(train_file_path)

   # Load the test data
   test_file_path = 'path-to-Car_TEST.txt'
   X_test, y_test = load_car_from_txt(test_file_path)

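To sanity-check a loader like the one above without the Kaggle files, you can run it on a small synthetic file; the values below are made up and only the format (label first, then features, space-separated) matters:

```python
import os
import tempfile
import torch
import pandas as pd

def load_car_from_txt(file_path):
    # Same parsing logic as in the tutorial: label in column 0, features after
    data = pd.read_csv(file_path, sep=r'\s+', header=None)
    y = data.iloc[:, 0].values
    X = data.iloc[:, 1:].values
    X = torch.tensor(X, dtype=torch.float32).unsqueeze(-1)
    y = torch.tensor(y, dtype=torch.long)
    return X, y

# Write a tiny synthetic file: 3 rows, each a label plus 5 feature values
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as f:
    f.write("1 0.1 0.2 0.3 0.4 0.5\n"
            "2 0.5 0.4 0.3 0.2 0.1\n"
            "3 0.0 0.1 0.0 0.1 0.0\n")
    path = f.name

X, y = load_car_from_txt(path)
print(X.shape, y.shape)  # torch.Size([3, 5, 1]) torch.Size([3])
os.remove(path)
```

The trailing `unsqueeze(-1)` is what gives each series the `(seq_length, 1)` shape the model expects.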
After loading the data, we create ``PyTorch TensorDataset`` and
``DataLoader`` objects to facilitate data handling during training and
validation.

.. code:: python

   # Create PyTorch TensorDatasets
   train_dataset = TensorDataset(X_train, y_train)
   test_dataset = TensorDataset(X_test, y_test)

   # Create DataLoaders for training and testing
   train_dataloader = DataLoader(train_dataset, batch_size=8, shuffle=True)
   test_dataloader = DataLoader(test_dataset, batch_size=8, shuffle=False)

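A quick way to confirm the batch shapes the model will receive is to pull one batch from a ``DataLoader`` built the same way; the data here is synthetic and illustrative only:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

X = torch.randn(20, 577, 1)          # 20 synthetic series of length 577
y = torch.randint(0, 4, (20,))       # random labels in [0, 4)
loader = DataLoader(TensorDataset(X, y), batch_size=8, shuffle=True)

xb, yb = next(iter(loader))
print(xb.shape, yb.shape)  # torch.Size([8, 577, 1]) torch.Size([8])
```

Each batch is `(batch_size, seq_length, input_dim)`, which matches what the classifier's `forward` consumes.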
We define a Transformer-based neural network for multivariate time
series classification. The model includes:

- A ``TransformerEncoder`` layer to capture sequence dependencies.
- A fully connected (``fc``) layer that maps the encoder output to
  class labels.
- Cross-entropy loss for training, optimized with the Adam optimizer.

.. code:: python

   class TransformerTimeSeriesClassifier(modlee.model.TimeseriesClassificationModleeModel):
       def __init__(self, input_dim, seq_length, num_classes, num_heads=1, hidden_dim=64):
           super().__init__()
           # Define a Transformer encoder layer; batch_first=True so inputs are (batch, seq, features)
           self.encoder_layer = torch.nn.TransformerEncoderLayer(
               d_model=input_dim, nhead=num_heads, batch_first=True
           )
           # Stack Transformer encoder layers to create a Transformer encoder
           self.transformer_encoder = torch.nn.TransformerEncoder(self.encoder_layer, num_layers=2)
           # Fully connected layer to map encoded features to class scores
           self.fc = torch.nn.Linear(input_dim * seq_length, num_classes)
           # Use CrossEntropyLoss for multi-class classification
           self.loss_fn = torch.nn.CrossEntropyLoss()

       def forward(self, x):
           # Pass input through the Transformer encoder to capture dependencies
           x = self.transformer_encoder(x)
           # Flatten the output and pass it through the fully connected layer for class prediction
           x = x.view(x.size(0), -1)
           x = self.fc(x)
           return x

       def training_step(self, batch, batch_idx):
           # Unpack input data and target labels from the batch
           x, y = batch
           # Forward pass to generate predictions
           preds = self.forward(x)
           # Calculate training loss
           loss = self.loss_fn(preds, y)
           return loss

       def validation_step(self, batch, batch_idx):
           # Unpack input data and target labels from the batch
           x, y = batch
           # Forward pass to generate predictions
           preds = self.forward(x)
           # Calculate validation loss
           loss = self.loss_fn(preds, y)
           return loss

       def configure_optimizers(self):
           # Use the Adam optimizer with a learning rate of 1e-3
           return torch.optim.Adam(self.parameters(), lr=1e-3)

   # Instantiate the model with the dataset's dimensions
   modlee_model = TransformerTimeSeriesClassifier(input_dim=1, seq_length=577, num_classes=4)

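Before training, it can help to confirm the forward pass yields one score per class. The stand-in below mirrors the architecture with a plain ``torch.nn.Module`` instead of the Modlee base class, so it runs without an API key; it is a shape-checking sketch, not the tutorial's model:

```python
import torch
import torch.nn as nn

class TransformerStandIn(nn.Module):
    def __init__(self, input_dim=1, seq_length=577, num_classes=4, num_heads=1):
        super().__init__()
        # batch_first=True so the encoder reads (batch, seq, features)
        layer = nn.TransformerEncoderLayer(
            d_model=input_dim, nhead=num_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.fc = nn.Linear(input_dim * seq_length, num_classes)

    def forward(self, x):
        x = self.encoder(x)                      # (batch, seq, input_dim)
        return self.fc(x.view(x.size(0), -1))    # flatten, then score classes

m = TransformerStandIn()
out = m(torch.randn(2, 577, 1))
print(out.shape)  # torch.Size([2, 4])
```

A `(2, 577, 1)` input producing a `(2, 4)` output confirms the flatten-and-project head is wired correctly for four classes.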
To train the model, we use PyTorch Lightning's ``Trainer`` class, which
simplifies the training loop.

.. code:: python

   # Start a Modlee run for tracking
   with modlee.start_run() as run:
       trainer = pl.Trainer(max_epochs=1)
       trainer.fit(
           model=modlee_model,
           train_dataloaders=train_dataloader,
           val_dataloaders=test_dataloader
       )

After training, we inspect the artifacts saved by Modlee, including the
model graph and various statistics. With Modlee, your training assets
are automatically saved, preserving valuable insights for future
reference and collaboration.

.. code:: python

   last_run_path = modlee.last_run_path()
   print(f"Run path: {last_run_path}")

   artifacts_path = os.path.join(last_run_path, 'artifacts')
   artifacts = sorted(os.listdir(artifacts_path))
   print(f"Saved artifacts: {artifacts}")

.. |Open in Kaggle| image:: https://kaggle.com/static/images/open-in-kaggle.svg
:target: https://www.kaggle.com/code/modlee/time-series-classification