Embed

embed(data, input_col, encoder='clip', modality=None, out_col=None, device='cpu', mmap_dir=None, num_workers=4, batch_size=128, **kwargs)[source]

Embed a column of data with an encoder from the encoder registry.

Examples

Suppose you have an image dataset (e.g. Imagenette, CIFAR-10) loaded into a Meerkat DataPanel. You can embed the images in the dataset with CLIP using a code snippet like:

import meerkat as mk
from domino import embed

dp = mk.datasets.get("imagenette")

dp = embed(
    data=dp,
    input_col="img",
    encoder="clip"
)
Parameters
  • data (mk.DataPanel) – A DataPanel containing the data to embed.

  • input_col (str) – The name of the column to embed.

  • encoder (Union[str, Encoder], optional) – Name of the encoder to use. List supported encoders with domino.encoders. Defaults to “clip”. Alternatively, pass an Encoder object containing a custom encoder.

  • modality (str, optional) – The modality of the data to be embedded. Defaults to None, in which case the modality is inferred from the type of the input column.

  • out_col (str, optional) – The name of the column where the embeddings are stored. Defaults to None, in which case it is "{encoder}({input_col})".

  • device (Union[int, str], optional) – The device on which the encoding is run. Defaults to “cpu”.

  • mmap_dir (str, optional) – The path to the directory where a memory-mapped file containing the embeddings will be written. Defaults to None, in which case the embeddings are not memory-mapped.

  • num_workers (int, optional) – Number of worker processes used to load the data from disk. Defaults to 4.

  • batch_size (int, optional) – Size of the batches used when encoding the data. Defaults to 128.

  • **kwargs – Additional keyword arguments are passed to the encoder. To see supported arguments for each encoder, see the encoder documentation (e.g. clip()).

Returns

A view of data with a new column containing the embeddings. This column will be named according to the out_col parameter.

Return type

mk.DataPanel
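
The remaining parameters control where the embeddings are computed and stored. The snippet below is a sketch of a fuller call; it assumes a CUDA GPU (device index 0) and a writable ./embeddings directory, neither of which is required:

import meerkat as mk
from domino import embed

dp = mk.datasets.get("imagenette")

dp = embed(
    data=dp,
    input_col="img",
    encoder="clip",
    out_col="img_clip",       # instead of the default "clip(img)"
    device=0,                 # GPU index; use "cpu" if no GPU is available
    mmap_dir="./embeddings",  # memory-map the embeddings to a file on disk
    num_workers=4,
    batch_size=256,
)

The embeddings are then available in the column named by out_col, i.e. dp["img_clip"].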

Encoders

Note

Domino supports a growing number of encoders. To list the encoders currently included in the registry, use:

In [1]: import domino

In [2]: print(domino.encoders)
Registry of encoders:
╒══════════════╤═════════════════════════════════════════════════════════════════════════════════════════╕
│ clip         │ Contrastive Language-Image Pre-training (CLIP) encoders [radford_2021]_. Includes       │
│              │     encoders for the following modalities:                                              │
│              │                                                                                         │
│              │     - "text"                                                                            │
│              │     - "image"                                                                           │
│              │                                                                                         │
│              │     Encoders will map these different modalities to the same embedding space.           │
│              │                                                                                         │
│              │     Args:                                                                               │
│              │         variant (str, optional): A model name listed by `clip.available_models()`, or   │
│              │             the path to a model checkpoint containing the state_dict. Defaults to       │
│              │             "ViT-B/32".                                                                 │
│              │         device (Union[int, str], optional): The device on which the encoders will be    │
│              │             loaded. Defaults to "cpu".                                                  │
│              │                                                                                         │
│              │                                                                                         │
│              │     .. [radford_2021]                                                                   │
│              │                                                                                         │
│              │         Radford, A. et al. Learning Transferable Visual Models From Natural Language    │
│              │         Supervision. arXiv [cs.CV] (2021)                                               │
├──────────────┼─────────────────────────────────────────────────────────────────────────────────────────┤
│ bit          │ Big Transfer (BiT) encoders [kolesnivok_2019]_. Includes encoders for the               │
│              │     following modalities:                                                               │
│              │         - "image"                                                                       │
│              │                                                                                         │
│              │     Args:                                                                               │
│              │         variant (str): The variant of the model to use. Variants include                │
│              │             "BiT-M-R50x1",  "BiT-M-R101x3", "Bit-M-R152x4".  Defaults to "BiT-M-R50x1". │
│              │         device (Union[int, str], optional): The device on which the encoders will be    │
│              │             loaded. Defaults to "cpu".                                                  │
│              │         reduction (str, optional): The reduction function used to reduce image          │
│              │             embeddings of shape (batch x height x width x dimensions) to (batch x       │
│              │             dimensions). Defaults to "mean". Other options include "max".               │
│              │         layer (str, optional): The layer of the model from which the embeddings will    │
│              │             beto extract the embeddings from. Defaults to "body".                       │
│              │                                                                                         │
│              │     .. [kolesnivok_2019]                                                                │
│              │                                                                                         │
│              │         Kolesnikov, A. et al. Big Transfer (BiT): General Visual Representation         │
│              │         Learning. arXiv [cs.CV] (2019)                                                  │
├──────────────┼─────────────────────────────────────────────────────────────────────────────────────────┤
│ robust       │ Image classifier trained with adversarial robustness loss [engstrom_2019]_.             │
│              │                                                                                         │
│              │     Args:                                                                               │
│              │         variant (str, optional): One of ["imagenet_l2_3_0", "cifar_l2_1_0",             │
│              │             "imagenet_linf_8"].Defaults to "imagenet_l2_3_0".                           │
│              │         device (Union[int, str], optional): The device on which the encoders will be    │
│              │             loaded. Defaults to "cpu".                                                  │
│              │                                                                                         │
│              │                                                                                         │
│              │     .. [engstrom_2019]                                                                  │
│              │                                                                                         │
│              │        @misc{robustness,                                                                │
│              │             title={Robustness (Python Library)},                                        │
│              │             author={Logan Engstrom and Andrew Ilyas and Hadi Salman and Shibani         │
│              │             Santurkar and Dimitris Tsipras},                                            │
│              │             year={2019},                                                                │
│              │             url={https://github.com/MadryLab/robustness}                                │
│              │         }                                                                               │
├──────────────┼─────────────────────────────────────────────────────────────────────────────────────────┤
│ transformers │ Transformer encoders                                                                    │
│              │                                                                                         │
│              │     - "text"                                                                            │
│              │                                                                                         │
│              │     Encoders will map these different modalities to the same embedding space.           │
│              │                                                                                         │
│              │     Args:                                                                               │
│              │         variant (str, optional): A model name listed by `clip.available_models()`, or   │
│              │             the path to a model checkpoint containing the state_dict. Defaults to       │
│              │             "ViT-B/32".                                                                 │
│              │         device (Union[int, str], optional): The device on which the encoders will be    │
│              │             loaded. Defaults to "cpu".                                                  │
╘══════════════╧═════════════════════════════════════════════════════════════════════════════════════════╛
clip(variant='ViT-B/32', device='cpu')[source]

Contrastive Language-Image Pre-training (CLIP) encoders [radford_2021]. Includes encoders for the following modalities:

  • “text”

  • “image”

Encoders will map these different modalities to the same embedding space.

Parameters
  • variant (str, optional) – A model name listed by clip.available_models(), or the path to a model checkpoint containing the state_dict. Defaults to “ViT-B/32”.

  • device (Union[int, str], optional) – The device on which the encoders will be loaded. Defaults to “cpu”.

Return type

Dict[str, domino._embed.encoder.Encoder]

radford_2021

Radford, A. et al. Learning Transferable Visual Models From Natural Language Supervision. arXiv [cs.CV] (2021)
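
Because embed() forwards extra keyword arguments to the encoder (see **kwargs above), the variant and device arguments can also be supplied directly through embed(). A short sketch, using "RN50" as an example name from clip.available_models():

import meerkat as mk
from domino import embed

dp = mk.datasets.get("imagenette")

dp = embed(
    data=dp,
    input_col="img",
    encoder="clip",
    variant="RN50",  # forwarded to clip(); any name from clip.available_models()
    device="cpu",
)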

bit(variant='BiT-M-R50x1', device='cpu', reduction='mean', layer='body')[source]

Big Transfer (BiT) encoders [kolesnivok_2019]. Includes encoders for the following modalities:

  • “image”

Parameters
  • variant (str) – The variant of the model to use. Variants include “BiT-M-R50x1”, “BiT-M-R101x3”, “BiT-M-R152x4”. Defaults to “BiT-M-R50x1”.

  • device (Union[int, str], optional) – The device on which the encoders will be loaded. Defaults to “cpu”.

  • reduction (str, optional) – The reduction function used to reduce image embeddings of shape (batch x height x width x dimensions) to (batch x dimensions). Defaults to “mean”. Other options include “max”.

  • layer (str, optional) – The layer of the model from which the embeddings will be extracted. Defaults to “body”.

Return type

Dict[str, domino._embed.encoder.Encoder]

kolesnivok_2019

Kolesnikov, A. et al. Big Transfer (BiT): General Visual Representation Learning. arXiv [cs.CV] (2019)
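
As with clip(), these arguments can be passed through embed(), which forwards extra keyword arguments to the encoder. A minimal sketch that swaps the default mean reduction for max:

import meerkat as mk
from domino import embed

dp = mk.datasets.get("imagenette")

dp = embed(
    data=dp,
    input_col="img",
    encoder="bit",
    variant="BiT-M-R50x1",  # forwarded to bit()
    reduction="max",        # reduce (batch x height x width x dim) embeddings with max instead of mean
    device="cpu",
)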