modlee.data_metafeatures module
- class modlee.data_metafeatures.DataMetafeatures(dataloader, num_sample=1000, testing=False)
Bases: object
An object to hold metafeatures for a dataset, as loaded from a dataloader.
- get_features()
Get features for batch elements.
- Returns:
A dictionary of {‘batch_element’ : features}
- get_mfe()
Get PyMFE features for every element in the dataloader.
- Returns:
A dictionary of {mfe_feature : mfe_value}
- get_mfe_features()
Get features for all batch elements with PyMFE.
- Returns:
A list of metafeatures.
- get_mfe_on_batch(batch_element)
Get features for a batch element with PyMFE.
- Parameters:
batch_element – The batch element to calculate.
- Returns:
A dictionary of features for the batch element.
- get_mfe_on_element(batch_element)
Get features for a batch element with PyMFE.
- Parameters:
batch_element – The batch element to calculate.
- Returns:
A dictionary of features for the batch element.
- get_raw_batch_elements()
Convert features to a list of dictionaries.
- Returns:
A list of {‘raw’: feature}
- get_stats()
Get statistical features for batch elements. Includes feature shape, k-means clustering, and time taken to calculate features.
- Returns:
A list of statistical features.
- get_stats_rep()
Get features for batch elements.
- Returns:
A dictionary of {‘batch_element’ : features}
- class modlee.data_metafeatures.ImageDataMetafeatures(dataloader, embd_model=None, *args, **kwargs)
Bases: DataMetafeatures
Image-based DataMetafeatures.
- class modlee.data_metafeatures.TextDataMetafeatures(dataloader, nlp_model=None, *args, **kwargs)
Bases: DataMetafeatures
- modlee.data_metafeatures.bench_kmeans_unsupervised(batch, n_clusters=[2, 4, 8, 16, 32], testing=False)
Calculate k-means clusters for a batch of data.
- Parameters:
batch – The batch of data.
n_clusters – List of cluster counts to calculate, defaults to [2, 4, 8, 16, 32].
testing – Flag for testing and calculating with a smaller batch, defaults to False.
- Returns:
A dictionary of {‘kmeans’:calculated_kmeans_clusters}
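As a rough illustration of benchmarking k-means over several cluster counts, the sketch below runs a plain-Python Lloyd's algorithm once per count and records the resulting inertia (sum of squared distances to the nearest center). It is not modlee's implementation: the names `kmeans_inertia` and `bench_kmeans_sketch`, the choice of inertia as the recorded value, and the skipping of counts larger than the batch are all assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans_inertia(points, k, n_iter=20, seed=0):
    """Run Lloyd's algorithm and return the final inertia (hypothetical helper)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialise centers from the data
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to its cluster mean;
        # an empty cluster keeps its previous center.
        centers = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


def bench_kmeans_sketch(points, n_clusters=(2, 4, 8, 16, 32)):
    """Return {'kmeans': {k: inertia}} for every k that fits the batch (assumed layout)."""
    results = {}
    for k in n_clusters:
        if k > len(points):
            continue  # assumption: skip counts larger than the batch
        results[k] = kmeans_inertia(points, k)
    return {"kmeans": results}
```

Calling `bench_kmeans_sketch(points, n_clusters=(2, 4, 8))` on a six-point batch yields entries for 2 and 4 clusters only, since 8 exceeds the batch size.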
- modlee.data_metafeatures.extract_features_from_model(model, batch)
Extract features for a data batch using a neural network model.
- Parameters:
model – The model to use for feature extraction.
batch – The data batch on which to calculate features.
- Returns:
The calculated features.
- modlee.data_metafeatures.get_image_features(x, testing=False)
Get features for a batch of image data.
- Parameters:
x – The batch of image data.
testing – Flag to calculate on a smaller test subsample of the data, defaults to False.
- Returns:
A dictionary of the features.
- modlee.data_metafeatures.get_n_samples(dataloader, n_samples=100)
Get a number of samples from a dataloader.
- Parameters:
dataloader – The dataloader.
n_samples – The number of samples, defaults to 100.
- Returns:
An iterable of batch elements, each of length n_samples.
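One way to picture "an iterable of batch elements, each of length n_samples" is to concatenate each position of the batch (for example inputs and targets) across batches until enough samples have been collected. The following is a minimal sketch under that assumption, using plain Python lists in place of tensors; `get_n_samples_sketch` is a hypothetical stand-in, not the library function.

```python
def get_n_samples_sketch(batches, n_samples=100):
    """Collect up to n_samples items per batch element from an iterable of batches.

    Each batch is assumed to be a tuple such as (inputs, targets), where every
    entry is a list with one item per sample.
    """
    collected = None
    for batch in batches:
        if collected is None:
            # One accumulator per batch element (inputs, targets, ...).
            collected = [[] for _ in batch]
        for slot, element in zip(collected, batch):
            slot.extend(element)
        if len(collected[0]) >= n_samples:
            break  # enough samples gathered, stop iterating
    if collected is None:
        return []  # the iterable yielded no batches
    # Trim every batch element to exactly n_samples (or fewer if data ran out).
    return [slot[:n_samples] for slot in collected]
```

With two batches of three samples each and `n_samples=4`, the sketch returns two lists of four items: the first four inputs and the first four targets.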
- modlee.data_metafeatures.manipulate_x_1(x)
Unsqueeze a 1D tensor.
- Parameters:
x – The tensor.
- Returns:
The tensor with an extra beginning dimension.
- modlee.data_metafeatures.manipulate_x_2(x)
Subsample a 2D tensor to the first 10000 values.
- Parameters:
x – The tensor to subsample.
- Returns:
A subsample of the tensor.
- modlee.data_metafeatures.manipulate_x_3(x)
Process a 3-dimensional tensor [batch_size, width, height] by resizing to a fixed size.
- Parameters:
x – The tensor.
- Returns:
The tensor, resized.
- modlee.data_metafeatures.manipulate_x_4(x)
Process a 4-dimensional tensor, assumed to be image-like [batch_size, channels, width, height], into subchannels.
- Parameters:
x – The image to process.
- Returns:
Sampled channels from the image.
- modlee.data_metafeatures.manipulate_x_5(x)
Process a 5-dimensional tensor, assumed to be video-like [batch_size, frames, channels, width, height], into image-like [batch_size, channels, width, height].
- Parameters:
x – The tensor.
- Returns:
A subsample of the tensor.
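The 5-dimensional case can be illustrated with nested lists standing in for tensors. The sketch below collapses a video-like batch to an image-like one by keeping a single frame per item. Which frames the library actually keeps is not stated here, so the `frame_index` choice and the helper names `shape_of` and `video_to_image_like` are assumptions.

```python
def shape_of(nested):
    """Return the shape of a regularly nested list, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        if not nested:
            break
        nested = nested[0]
    return tuple(shape)


def video_to_image_like(x, frame_index=0):
    """Reduce [batch, frames, channels, width, height] to [batch, channels, width, height].

    Assumption: one frame (frame_index) is kept for every item in the batch.
    """
    if len(shape_of(x)) != 5:
        raise ValueError("expected a 5-dimensional input")
    return [video[frame_index] for video in x]
```

For an input of shape (2, 3, 1, 2, 2), `shape_of(video_to_image_like(x))` gives (2, 1, 2, 2).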
- modlee.data_metafeatures.pad_image_channels(x, desired_channels=3)
Pad an image with extra channels. Uses dimension order [batch, channel, width, height].
- Parameters:
x – The image tensor to pad.
desired_channels – Desired number of channels, defaults to 3.
- Returns:
The padded tensor.
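To make the [batch, channel, width, height] convention concrete, here is a minimal sketch of channel padding on nested lists. It fills the extra channels with zeros, which is an assumption: the library may pad differently (for example by repeating existing channels). `pad_image_channels_sketch` is a hypothetical name used only for this example.

```python
def pad_image_channels_sketch(x, desired_channels=3):
    """Pad each image in x ([batch][channel][width][height]) up to desired_channels.

    Assumption: missing channels are filled with zeros. Images that already
    have at least desired_channels channels are copied unchanged.
    """
    padded = []
    for image in x:
        # Copy existing channels so the input is not modified in place.
        channels = [[row[:] for row in channel] for channel in image]
        if channels and len(channels) < desired_channels:
            width = len(channels[0])
            height = len(channels[0][0]) if width else 0
            for _ in range(desired_channels - len(channels)):
                channels.append([[0] * height for _ in range(width)])
        padded.append(channels)
    return padded
```

A single-channel 2x2 image therefore becomes a three-channel image whose second and third channels are all zeros, while the original input is left untouched.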
- modlee.data_metafeatures.sample_dataloader(train_dataloader, num_sample)
Sample batches from a dataloader.
- Parameters:
train_dataloader – The dataloader to sample from.
num_sample – The number of samples.
- Returns:
A tuple of dataset_size, batch_elements, and the original size of the batch.