|image0| Dataset guidelines ================== Here we show pseudo code to illustrate building a pytorch data loader from a list of data elements in a format that is compatible with **Modlee Auto Experiment Documentation** TLDR ---- - Define your dataset in an unnested format: [[x1, x2, x3, …, y], …] - Create a dataloader which is used to train a ModleeModel with a Modlee Trainer Define example custom dataset objects ------------------------------------- .. |image0| image:: https://github.com/mansiagr4/gifs/raw/main/logo%20only%20(2).svg :width: 50px :height: 50px .. code:: python import torch import numpy as np from torch.utils.data import Dataset, DataLoader class CustomDataset(Dataset): def __init__(self, data): self.data = data def __len__(self): return len(self.data) def __getitem__(self, idx): feature1 = torch.tensor(self.data[idx][0], dtype=torch.float32) feature2 = torch.tensor(self.data[idx][1], dtype=torch.float32) feature3 = torch.tensor(self.data[idx][2], dtype=torch.float32) features = [feature1,feature2,feature3] # This is a simplification target = torch.tensor(self.data[idx][-1], dtype=torch.float32).squeeze() # Ensure target is a scalar or 1D return features, target def example_text(): return np.random.rand(10) # 1D array of 10 random numbers def example_image(): return np.random.rand(5, 3) # 2D array of shape (5, 3) with random numbers def example_video(): return np.random.rand(5, 3, 2) # 3D array of shape (5, 3, 2) with random numbers def example_target(): return np.random.rand(1) # scalar value Create dataset and dataloader ----------------------------- MODLEE_GUIDELINE ~~~~~~~~~~~~~~~~ Define your raw data so that each element is a list of data objects (any combination of images,audio,text,video,etc …) with the final element of the list being your target which must match the output shape of your neural network - ex: [[x1, x2, x3, …, y], …] Avoid nested data structures like the following - [[[x1, x2], x3, …, y], …] Why? ~~~~ Modlee extracts key meta features from your dataset so your experiment can be used in aggregate analysis alongside your collaborators data, to improve Modlee’s model recommendation technology for your connected environment. The above stated list data structure allows us to easily extract the information we need. Check out exactly how we do this on our public `Github Repo `__. .. code:: python data = [[example_text(),example_image(),example_video(),example_target()] for _ in range(4)] dataset = CustomDataset(data) Define a PyTorch DataLoader --------------------------- MODLEE_GUIDELINE ~~~~~~~~~~~~~~~~ Pass your dataset to a PyTorch DataLoader, so that Modlee can automatically parse it for meta features, allowing you to share it in a meaningful way with your colleagues. .. code:: python dataloader = DataLoader(dataset, batch_size=2, shuffle=True) # Iterate through dataloader for i,batch in enumerate(dataloader): print(f"- batch_{i}") features, target = batch for j,feature in enumerate(features): print(f"feature_{j}.shape = ", feature.shape) print("target.shape = ", target.shape) Modality & task compatibility ----------------------------- We’re working on making modlee compatible with any data modality and machine learning task which drove us to create the above stated MODLEE_GUIDELINES. Check out our `Github Repo `__ to see which have been tested for auto documentation to date, and if you don’t see one you need, test it out yourself and contribute! Reach out on our `Discord `__ to let us know what modality & tasks you want to use next, or give us feedback on these guidelines.