|image1|

.. |image1| image:: https://github.com/mansiagr4/gifs/raw/main/new_small_logo.svg

.. |Open in Kaggle| image:: https://kaggle.com/static/images/open-in-kaggle.svg
   :target: https://www.kaggle.com/code/modlee/time-series-regression

Time Series Regression
======================

In this tutorial, we will guide you through the process of implementing
a time series regression model using the Modlee framework along with
``PyTorch``. The goal is to predict power consumption based on various
environmental factors, such as temperature, humidity, wind speed, and
solar radiation.

**Note**: Currently, Modlee does not support recurrent LSTM operations.
Instead, we will focus on non-recurrent models suited for time series
data, such as convolutional neural networks (CNNs) and transformers,
which can effectively capture sequential patterns without requiring
recurrent layers.

|Open in Kaggle|

First, we will import the necessary libraries and set up the
environment.

.. code:: python

   import torch
   import os
   import modlee
   import lightning.pytorch as pl
   from torch.utils.data import DataLoader, TensorDataset
   import pandas as pd
   from sklearn.model_selection import train_test_split

Now, we will set up the ``modlee`` API key and initialize the ``modlee``
package. You can access your ``modlee`` API key `from the dashboard `__.

Replace ``replace-with-your-api-key`` with your API key.

.. code:: python

   modlee.init(api_key="replace-with-your-api-key")

The dataset used in this tutorial includes hourly time series data that
links environmental conditions to power consumption across three zones.
Each record contains a timestamp, temperature, humidity, wind speed, and
measures of solar radiation, alongside the power consumption (in watts)
for each zone. This data allows for the exploration of relationships
between weather patterns and energy usage, aiding in the development of
predictive models.

For this example, we will manually download the dataset from Kaggle and
upload it to the environment. Visit the `Time Series Regression dataset
page `__ on Kaggle and click the **Download** button to save the dataset
to your local machine. Copy the path to the downloaded files, which will
be used later.
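Before building the data loader, it can help to preview the CSV and
confirm that its column names match those referenced in the next step.
This is an optional sanity check, assuming the file sits at the path you
copied above (``path-to-powerconsumption.csv`` is a placeholder).

.. code:: python

   import pandas as pd

   # Preview the columns referenced below: Datetime, Temperature, Humidity,
   # WindSpeed, GeneralDiffuseFlows, DiffuseFlows, PowerConsumption_Zone1-3
   preview = pd.read_csv('path-to-powerconsumption.csv')
   print(preview.columns.tolist())
   print(preview.head())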
Next, we need to load the power consumption dataset. This dataset
contains various features related to environmental conditions and their
corresponding power consumption values. The
``load_power_consumption_data`` function is designed to read the CSV
file, process the data, and create time series sequences. We then select
the relevant features from the dataset for our input variables, ``X``,
which include temperature, humidity, wind speed, and solar radiation
values. The output variable, ``y``, is calculated as the mean power
consumption across three different zones.

.. code:: python

   # Function to load the power consumption dataset and prepare it for training
   def load_power_consumption_data(file_path, seq_length):
       # Load the dataset from the specified CSV file
       data = pd.read_csv(file_path)

       # Convert the 'Datetime' column to datetime objects
       data['Datetime'] = pd.to_datetime(data['Datetime'])

       # Set the 'Datetime' column as the index for the DataFrame
       data.set_index('Datetime', inplace=True)

       # Extract relevant features for prediction and target variable
       X = data[['Temperature', 'Humidity', 'WindSpeed',
                 'GeneralDiffuseFlows', 'DiffuseFlows']].values

       # Calculate the average power consumption across the three zones as the target variable
       y = data[['PowerConsumption_Zone1', 'PowerConsumption_Zone2',
                 'PowerConsumption_Zone3']].mean(axis=1).values

       # Convert features and target to PyTorch tensors
       X = torch.tensor(X, dtype=torch.float32)
       y = torch.tensor(y, dtype=torch.float32)

       # Create sequences of the specified length for input features
       num_samples = X.shape[0] - seq_length + 1
       X_seq = torch.stack([X[i:i + seq_length] for i in range(num_samples)])
       y_seq = y[seq_length - 1:]  # Align target variable with sequences

       return X_seq, y_seq

Once we have the preprocessed data, we proceed to create ``PyTorch``
datasets and ``DataLoaders``. Here, we load the power consumption data
from the specified CSV file and create ``TensorDataset`` objects to hold
the features and labels. To split the dataset into training and
validation sets, we use the ``train_test_split`` function from
``sklearn``.

.. code:: python

   # Define the path to the dataset
   file_path = 'path-to-powerconsumption.csv'

   # Load the power consumption data with a specified sequence length
   X, y = load_power_consumption_data(file_path, 20)

   # Create a TensorDataset for the full dataset
   dataset = TensorDataset(X, y)

   # Split dataset indices into training and validation sets
   train_indices, val_indices = train_test_split(
       range(len(dataset)), test_size=0.2, random_state=42)

   # Create training and validation datasets
   train_dataset = TensorDataset(X[train_indices], y[train_indices])
   val_dataset = TensorDataset(X[val_indices], y[val_indices])

   # Create DataLoaders for batch processing during training
   train_dataloader = DataLoader(train_dataset, batch_size=32, shuffle=True)
   val_dataloader = DataLoader(val_dataset, batch_size=32, shuffle=False)
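As a quick check, you can pull one batch from the training loader and
confirm the tensor shapes: with a sequence length of 20 and five input
features, each feature batch should be ``(32, 20, 5)`` and each target
batch ``(32,)``. This is an optional sketch, not part of the training
pipeline itself.

.. code:: python

   # Inspect one batch to verify the expected shapes
   x_batch, y_batch = next(iter(train_dataloader))
   print(x_batch.shape)  # torch.Size([32, 20, 5])
   print(y_batch.shape)  # torch.Size([32])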
We will now define a multivariate time series regression model by
creating a class that inherits from
``modlee.model.TimeseriesRegressionModleeModel``. This class uses a
Transformer-based architecture to predict a continuous value. We
initialize a ``TransformerEncoder`` with multi-head attention to process
sequential dependencies.

.. code:: python

   class TransformerTimeSeriesRegressor(modlee.model.TimeseriesRegressionModleeModel):
       def __init__(self, input_dim, seq_length, num_heads=1, hidden_dim=64):
           super().__init__()
           # Initialize a Transformer encoder layer with specified input dimensions
           # and heads; batch_first=True matches our (batch, seq_length, input_dim) inputs
           self.encoder_layer = torch.nn.TransformerEncoderLayer(
               d_model=input_dim, nhead=num_heads, batch_first=True)
           # Stack encoder layers to form the Transformer encoder
           self.transformer_encoder = torch.nn.TransformerEncoder(self.encoder_layer, num_layers=2)
           # Define a fully connected layer to map encoded features to a single output value
           self.fc = torch.nn.Linear(input_dim * seq_length, 1)
           # Set the loss function to mean squared error for regression tasks
           self.loss_fn = torch.nn.MSELoss()

       def forward(self, x):
           # Pass input through the Transformer encoder
           x = self.transformer_encoder(x)
           # Flatten the output and pass it through the fully connected layer
           x = x.view(x.size(0), -1)
           x = self.fc(x)
           return x

       def training_step(self, batch):
           # Get input and target from batch
           x, y = batch
           # Generate predictions and compute loss, squeezing the
           # predictions to match the shape of the targets
           preds = self.forward(x)
           loss = self.loss_fn(preds.squeeze(-1), y)
           return loss

       def validation_step(self, batch):
           # Get input and target from batch
           x, y = batch
           # Generate predictions and compute loss
           preds = self.forward(x)
           loss = self.loss_fn(preds.squeeze(-1), y)
           return loss

       def configure_optimizers(self):
           # Use the Adam optimizer with a learning rate of 1e-3
           return torch.optim.Adam(self.parameters(), lr=1e-3)

   model = TransformerTimeSeriesRegressor(input_dim=5, seq_length=20)

With our model defined, we can now train it using the ``PyTorch
Lightning Trainer``. This trainer simplifies the training process by
managing the training loops and logging.

.. code:: python

   # Start a training run with Modlee
   with modlee.start_run() as run:
       # Set up the trainer
       trainer = pl.Trainer(max_epochs=1)
       trainer.fit(
           model=model,
           train_dataloaders=train_dataloader,  # Load training data
           val_dataloaders=val_dataloader       # Load validation data
       )

After training, we inspect the artifacts saved by Modlee, including the
model graph and various statistics. With Modlee, your training assets
are automatically saved, preserving valuable insights for future
reference and collaboration.

.. code:: python

   last_run_path = modlee.last_run_path()
   print(f"Run path: {last_run_path}")

   artifacts_path = os.path.join(last_run_path, 'artifacts')
   artifacts = sorted(os.listdir(artifacts_path))
   print(f"Saved artifacts: {artifacts}")
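Finally, once training completes you can run the model in evaluation
mode to generate predictions on held-out data. The snippet below is a
minimal sketch (not part of the saved Modlee run) that predicts the
average power consumption for one validation batch and compares a few
predictions against the true values.

.. code:: python

   # Generate predictions for one validation batch
   model.eval()
   with torch.no_grad():
       x_batch, y_batch = next(iter(val_dataloader))
       preds = model(x_batch).squeeze(-1)

   # Compare the first few predictions against the true zone averages
   for pred, target in zip(preds[:5], y_batch[:5]):
       print(f"predicted: {pred.item():.1f} W, actual: {target.item():.1f} W")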