|image1| .. |image1| image:: https://github.com/mansiagr4/gifs/raw/main/new_small_logo.svg Time Series Forecasting ======================= This examples uses the ``modlee`` package for time series forecasting. We’ll use the ``Air Passengers`` dataset to show you how to: 1. Prepare the Data: Load and preprocess the dataset, including scaling and splitting into training and test sets. 2. Use Modlee for Model Training: Train a model using Modlee’s framework. 3. Evaluate Model: Assess the performance of the trained model on the test data. Note: currently `TimeseriesForecastingModleeModel` does not support PyTorch Recurrent Layers. |Open in Kaggle| First, we will import the the necessary libraries and set up the environment. .. code:: python import torch import os import modlee import lightning.pytorch as pl from torch.utils.data import DataLoader, TensorDataset from sklearn.preprocessing import MinMaxScaler from sklearn.model_selection import train_test_split import numpy as np import seaborn as sns Now, we will set up the ``modlee`` API key and initialize the ``modlee`` package. You can access your ``modlee`` API key `from the dashboard `__. Replace ``replace-with-your-api-key`` with your API key. .. code:: python os.environ['MODLEE_API_KEY'] = "replace-with-your-api-key" modlee.init(api_key=os.environ['MODLEE_API_KEY']) Next, we will prepare and load our data. We will use the ``Air Passengers`` dataset, which contains the monthly number of airline passengers over several years. We can load this dataset using the ``seaborn`` library. .. code:: python data = sns.load_dataset('flights') Now, we will prepare our data for training. We create a function called ``prepare_air_passenger_data`` to handle this process. .. code:: python def prepare_air_passenger_data(data, seq_length): # Convert the 'passengers' column to a float array passenger_data = data['passengers'].values.astype(float) # Scale the data to the range [0, 1] scaler = MinMaxScaler(feature_range=(0, 1)) passenger_data = scaler.fit_transform(passenger_data.reshape(-1, 1)) X, y = [], [] # Create sequences of length seq_length for forecasting for i in range(len(passenger_data) - seq_length): X.append(passenger_data[i:i + seq_length]) # Append the sequence y.append(passenger_data[i + seq_length]) # Append the target value X = np.array(X) y = np.array(y) return X, y, scaler # Return the sequences, target values, and scaler We will call the ``prepare_air_passenger_data`` function and also split the dataset into training and testing sets. .. code:: python # Set the sequence length to 12, indicating we will use the past 12 months of data seq_length = 12 # Prepare the data by calling the function to get the sequences and target values X, y, scaler = prepare_air_passenger_data(data, seq_length) # Reshape X to match the input shape required for the MLP: (batch_size, seq_length, input_dim) X = X.reshape(-1, seq_length, 1) y = y.reshape(-1, 1) # Split the data into training and test sets (80% training, 20% testing) X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42) # Convert the training and test sets to PyTorch tensors X_train, y_train = torch.Tensor(X_train), torch.Tensor(y_train) X_test, y_test = torch.Tensor(X_test), torch.Tensor(y_test) Now, we can create our dataloaders for both the training and testing datasets. We will use the ``TensorDataset`` class from ``PyTorch`` to create datasets from the training and testing tensors. .. code:: python # Create TensorDataset for training and test data train_dataset = TensorDataset(X_train, y_train) test_dataset = TensorDataset(X_test, y_test) # Create DataLoader instances for training and validation train_dataloader = DataLoader(train_dataset, batch_size=32, shuffle=True) test_dataloader = DataLoader(test_dataset, batch_size=32, shuffle=False) After preparing the data, we need to define our model. We will create a simple Multi-Layer Perceptron (MLP) model for time series forecasting using Modlee’s framework. .. code:: python # Define the MLP model for time series forecasting, inheriting from Modlee's model class TimeSeriesForecasterMLP(modlee.model.TimeseriesForecastingModleeModel): def __init__(self, input_dim, seq_length, hidden_dim=64): super().__init__() self.seq_length = seq_length # Store the sequence length self.hidden_dim = hidden_dim # Set the number of hidden units # Define the layers of the MLP self.fc1 = torch.nn.Linear(input_dim * seq_length, hidden_dim) # First hidden layer self.fc2 = torch.nn.Linear(hidden_dim, hidden_dim) # Second hidden layer self.fc3 = torch.nn.Linear(hidden_dim, 1) # Output layer # Define the loss function (Mean Squared Error) self.loss_fn = torch.nn.MSELoss() def forward(self, x): x = x.view(x.size(0), -1) # Flatten the input for MLP x = torch.relu(self.fc1(x)) # Apply ReLU activation after first layer x = torch.relu(self.fc2(x)) # Apply ReLU activation after second layer predictions = self.fc3(x) # Generate predictions from the output layer return predictions def training_step(self, batch): x, y = batch # Unpack the input features and targets preds = self.forward(x) # Forward pass to get predictions loss = self.loss_fn(preds, y) # Calculate loss return loss def validation_step(self, batch): x, y = batch # Unpack the input features and targets preds = self.forward(x) # Forward pass to get predictions loss = self.loss_fn(preds, y) # Calculate validation loss return loss def configure_optimizers(self): # Use Adam optimizer for model parameters return torch.optim.Adam(self.parameters(), lr=1e-3) Now, we can proceed to train our model using the Modlee package. We create an instance of our ``TimeSeriesForecasterMLP`` model and then set up the training loop using the ``Trainer`` class from ``PyTorch Lightning``. .. code:: python input_dim = 1 # We have one feature (number of passengers) # Initialize the Modlee model model = TimeSeriesForecasterMLP(input_dim, seq_length) # Train the model using PyTorch Lightning with modlee.start_run() as run: trainer = pl.Trainer(max_epochs=1) trainer.fit( model=model, train_dataloaders=train_dataloader, val_dataloaders=test_dataloader ) Finally, we inspect the artifacts saved by Modlee, including the model graph and various statistics. With Modlee, your training assets are automatically saved, preserving valuable insights for future reference and collaboration. .. code:: python last_run_path = modlee.last_run_path() print(f"Run path: {last_run_path}") artifacts_path = os.path.join(last_run_path, 'artifacts') artifacts = sorted(os.listdir(artifacts_path)) print(f"Saved artifacts: {artifacts}") .. |Open in Kaggle| image:: https://kaggle.com/static/images/open-in-kaggle.svg :target: https://www.kaggle.com/code/modlee/modlee-time-series-forecasting