Embed
- embed(data, input_col, encoder='clip', modality=None, out_col=None, device='cpu', mmap_dir=None, num_workers=4, batch_size=128, **kwargs)[source]
Embed a column of data with an encoder from the encoder registry.
Examples
Suppose you have an image dataset (e.g. Imagenette, CIFAR-10) loaded into a Meerkat DataPanel. You can embed the images in the dataset with CLIP using a code snippet like:
import meerkat as mk
from domino import embed

dp = mk.datasets.get("imagenette")
dp = embed(
    data=dp,
    input_col="img",
    encoder="clip",
)
- Parameters
data (mk.DataPanel) – A DataPanel containing the data to embed.
input_col (str) – The name of the column to embed.
encoder (Union[str, Encoder], optional) – Name of the encoder to use. List supported encoders with domino.encoders. Defaults to “clip”. Alternatively, pass an Encoder object containing a custom encoder.
modality (str, optional) – The modality of the data to be embedded. Defaults to None, in which case the modality is inferred from the type of the input column.
out_col (str, optional) – The name of the column where the embeddings are stored. Defaults to None, in which case it is "{encoder}({input_col})".
device (Union[int, str], optional) – The device on which the encoder will run. Defaults to “cpu”.
mmap_dir (str, optional) – The path to directory where a memory-mapped file containing the embeddings will be written. Defaults to None, in which case the embeddings are not memmapped.
num_workers (int, optional) – Number of worker processes used to load the data from disk. Defaults to 4.
batch_size (int, optional) – Size of the batches used when encoding. Defaults to 128.
**kwargs – Additional keyword arguments are passed to the encoder. To see supported arguments for each encoder, see the encoder documentation (e.g. clip()).
- Returns
A view of data with a new column containing the embeddings. This column will be named according to the out_col parameter.
- Return type
mk.DataPanel
Encoders
Note
Domino supports a growing number of encoders. To list the encoders currently included in the registry use:
In [1]: import domino
In [2]: print(domino.encoders)
Registry of encoders:
╒══════════════╤═════════════════════════════════════════════════════════════════════════════════════════╕
│ clip │ Contrastive Language-Image Pre-training (CLIP) encoders [radford_2021]_. Includes │
│ │ encoders for the following modalities: │
│ │ │
│ │ - "text" │
│ │ - "image" │
│ │ │
│ │ Encoders will map these different modalities to the same embedding space. │
│ │ │
│ │ Args: │
│ │ variant (str, optional): A model name listed by `clip.available_models()`, or │
│ │ the path to a model checkpoint containing the state_dict. Defaults to │
│ │ "ViT-B/32". │
│ │ device (Union[int, str], optional): The device on which the encoders will be │
│ │ loaded. Defaults to "cpu". │
│ │ │
│ │ │
│ │ .. [radford_2021] │
│ │ │
│ │ Radford, A. et al. Learning Transferable Visual Models From Natural Language │
│ │ Supervision. arXiv [cs.CV] (2021) │
├──────────────┼─────────────────────────────────────────────────────────────────────────────────────────┤
│ bit │ Big Transfer (BiT) encoders [kolesnivok_2019]_. Includes encoders for the │
│ │ following modalities: │
│ │ - "image" │
│ │ │
│ │ Args: │
│ │ variant (str): The variant of the model to use. Variants include │
│              │             "BiT-M-R50x1", "BiT-M-R101x3", "BiT-M-R152x4". Defaults to "BiT-M-R50x1". │
│ │ device (Union[int, str], optional): The device on which the encoders will be │
│ │ loaded. Defaults to "cpu". │
│ │ reduction (str, optional): The reduction function used to reduce image │
│ │ embeddings of shape (batch x height x width x dimensions) to (batch x │
│ │ dimensions). Defaults to "mean". Other options include "max". │
│ │ layer (str, optional): The layer of the model from which the embeddings will │
│              │             be extracted. Defaults to "body".                                           │
│ │ │
│ │ .. [kolesnivok_2019] │
│ │ │
│ │ Kolesnikov, A. et al. Big Transfer (BiT): General Visual Representation │
│ │ Learning. arXiv [cs.CV] (2019) │
├──────────────┼─────────────────────────────────────────────────────────────────────────────────────────┤
│ robust │ Image classifier trained with adversarial robustness loss [engstrom_2019]_. │
│ │ │
│ │ Args: │
│ │ variant (str, optional): One of ["imagenet_l2_3_0", "cifar_l2_1_0", │
│              │             "imagenet_linf_8"]. Defaults to "imagenet_l2_3_0".                         │
│ │ device (Union[int, str], optional): The device on which the encoders will be │
│ │ loaded. Defaults to "cpu". │
│ │ │
│ │ │
│ │ .. [engstrom_2019] │
│ │ │
│ │ @misc{robustness, │
│ │ title={Robustness (Python Library)}, │
│ │ author={Logan Engstrom and Andrew Ilyas and Hadi Salman and Shibani │
│ │ Santurkar and Dimitris Tsipras}, │
│ │ year={2019}, │
│ │ url={https://github.com/MadryLab/robustness} │
│ │ } │
├──────────────┼─────────────────────────────────────────────────────────────────────────────────────────┤
│ transformers │ Transformer encoders │
│ │ │
│ │ - "text" │
│ │ │
│ │ Encoders will map these different modalities to the same embedding space. │
│ │ │
│ │ Args: │
│ │ variant (str, optional): A model name listed by `clip.available_models()`, or │
│ │ the path to a model checkpoint containing the state_dict. Defaults to │
│ │ "ViT-B/32". │
│ │ device (Union[int, str], optional): The device on which the encoders will be │
│ │ loaded. Defaults to "cpu". │
╘══════════════╧═════════════════════════════════════════════════════════════════════════════════════════╛
- clip(variant='ViT-B/32', device='cpu')[source]
Contrastive Language-Image Pre-training (CLIP) encoders [radford_2021]. Includes encoders for the following modalities:
“text”
“image”
Encoders will map these different modalities to the same embedding space.
- Parameters
variant (str, optional) – A model name listed by clip.available_models(), or the path to a model checkpoint containing the state_dict. Defaults to “ViT-B/32”.
device (Union[int, str], optional) – The device on which the encoders will be loaded. Defaults to “cpu”.
- Return type
Dict[str, domino._embed.encoder.Encoder]
- radford_2021
Radford, A. et al. Learning Transferable Visual Models From Natural Language Supervision. arXiv [cs.CV] (2021)
- bit(variant='BiT-M-R50x1', device='cpu', reduction='mean', layer='body')[source]
Big Transfer (BiT) encoders [kolesnivok_2019]. Includes encoders for the following modalities:
“image”
- Parameters
variant (str) – The variant of the model to use. Variants include “BiT-M-R50x1”, “BiT-M-R101x3”, “BiT-M-R152x4”. Defaults to “BiT-M-R50x1”.
device (Union[int, str], optional) – The device on which the encoders will be loaded. Defaults to “cpu”.
reduction (str, optional) – The reduction function used to reduce image embeddings of shape (batch x height x width x dimensions) to (batch x dimensions). Defaults to “mean”. Other options include “max”.
layer (str, optional) – The layer of the model from which the embeddings will be extracted. Defaults to “body”.
- Return type
Dict[str, domino._embed.encoder.Encoder]
- kolesnivok_2019
Kolesnikov, A. et al. Big Transfer (BiT): General Visual Representation Learning. arXiv [cs.CV] (2019)