Slice

This page includes an API reference for the slicers provided by domino. Recall that most slice discovery methods adhere to a three-step procedure: (1) embed, (2) slice, and (3) describe. In the second step, we search an embedding space for regions where a model underperforms. Algorithms that perform this step are called slicers and, in domino, are subclasses of the abstract base class Slicer. For example, the SpotlightSlicer directly optimizes the parameters of a Gaussian kernel to highlight regions with a high concentration of errors [deon_2022].

All slicers in domino share a common, sklearn-esque API. They each implement three methods: fit(), predict(), and predict_proba().

  • fit() Learn a set of slicing functions that partition the embedding space into regions with high error rates.

  • predict() Apply the learned slicing functions to data, assigning each datapoint to zero or more slices.

  • predict_proba() Apply the learned slicing functions to data, producing “soft” slice assignments.

All three methods accept embeddings, targets, pred_probs, and losses. There are two ways to use these arguments:

  1. By passing NumPy arrays directly.

  2. By passing a Meerkat DataPanel to the data argument and string column names to embeddings, targets, pred_probs, and losses.

Note that not all slicers require all arguments. For example, the DominoSlicer requires the embeddings, targets, and pred_probs arguments for fit(), but only embeddings for predict() and predict_proba().
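
For instance, the array-based usage (option 1 above) looks roughly like the following sketch, in which the arrays are hypothetical stand-ins for real model outputs:

import numpy as np
from domino import DominoSlicer

embeddings = np.random.randn(1000, 512)       # (n_samples, embedding dimension)
targets = np.random.randint(0, 2, size=1000)  # (n_samples,)
pred_probs = np.random.rand(1000, 2)          # (n_samples, n_classes)

domino = DominoSlicer()
domino.fit(embeddings=embeddings, targets=targets, pred_probs=pred_probs)
slices = domino.predict(embeddings=embeddings)  # only embeddings are needed here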

Consider this simple example where we fit() the DominoSlicer on the validation set and apply predict() to the test set.

from domino import DominoSlicer
dp = ...  # load a dataset with columns "emb", "target" and "pred_probs" into a Meerkat DataPanel

# split dataset
valid_dp = dp.lz[dp["split"] == "valid"]
test_dp = dp.lz[dp["split"] == "test"]

domino = DominoSlicer()
domino.fit(
    data=valid_dp, embeddings="emb", targets="target", pred_probs="pred_probs"
)
dp["domino_slices"] = domino.predict(
    data=test_dp, embeddings="emb",
)

Slicers can be configured by passing parameters to the constructor. Each slicer has a different set of parameters; for example, the DominoSlicer has a parameter called max_iter which controls the maximum number of EM iterations. See the documentation below for the parameters of each slicer.

To access these parameters from a slicer, users can use get_params(), which returns a dictionary mapping parameter names (as defined in the constructor) to values.
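
For example (a quick sketch; the parameter values here are illustrative):

from domino import DominoSlicer

domino = DominoSlicer(n_slices=5, max_iter=100)  # configure at construction
params = domino.get_params()
print(params["n_slices"])  # 5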

Abstract Base Class: Slicer

class Slicer(n_slices)[source]
Parameters

n_slices (int) – The number of slices to discover.

abstract fit(model=None, data_dp=None)[source]

Fit the slicer to data.

Parameters
  • data (mk.DataPanel, optional) – A Meerkat DataPanel with columns for embeddings, targets, and prediction probabilities. The names of the columns can be specified with the embeddings, targets, and pred_probs arguments. Defaults to None.

  • embeddings (Union[str, np.ndarray], optional) – The name of a column in data holding embeddings. If data is None, then an np.ndarray of shape (n_samples, dimension of embedding). Defaults to “embedding”.

  • targets (Union[str, np.ndarray], optional) – The name of a column in data holding class labels. If data is None, then an np.ndarray of shape (n_samples,). Defaults to “target”.

  • pred_probs (Union[str, np.ndarray], optional) – The name of a column in data holding model predictions (can either be “soft” probability scores or “hard” 1-hot encoded predictions). If data is None, then an np.ndarray of shape (n_samples, n_classes) or (n_samples,) in the binary case. Defaults to “pred_probs”.

  • losses (Union[str, np.ndarray], optional) – The name of a column in data holding the loss of the model predictions. If data is None, then an np.ndarray of shape (n_samples,). Defaults to “loss”.

  • model (Optional[torch.nn.modules.module.Module]) –

  • data_dp (Optional[meerkat.datapanel.DataPanel]) –

Returns

Returns a fit instance of the slicer.

Return type

Slicer

get_params()[source]

Get the parameters of this slicer. Returns a dictionary mapping from the names of the parameters (as they are defined in the __init__) to their values.

Returns

A dictionary of parameters.

Return type

Dict[str, Any]

abstract predict(data, embeddings='embedding', targets='target', pred_probs='pred_probs')[source]

Get slice membership for data using the fit slicer.

Caution

Must call Slicer.fit prior to calling Slicer.predict.

Parameters
  • data (mk.DataPanel, optional) – A Meerkat DataPanel with columns for embeddings, targets, and prediction probabilities. The names of the columns can be specified with the embeddings, targets, and pred_probs arguments. Defaults to None.

  • embeddings (Union[str, np.ndarray], optional) – The name of a column in data holding embeddings. If data is None, then an np.ndarray of shape (n_samples, dimension of embedding). Defaults to “embedding”.

  • targets (Union[str, np.ndarray], optional) – The name of a column in data holding class labels. If data is None, then an np.ndarray of shape (n_samples,). Defaults to “target”.

  • pred_probs (Union[str, np.ndarray], optional) – The name of a column in data holding model predictions (can either be “soft” probability scores or “hard” 1-hot encoded predictions). If data is None, then an np.ndarray of shape (n_samples, n_classes) or (n_samples,) in the binary case. Defaults to “pred_probs”.

  • losses (Union[str, np.ndarray], optional) – The name of a column in data holding the loss of the model predictions. If data is None, then an np.ndarray of shape (n_samples,). Defaults to “loss”.

Returns

A binary np.ndarray of shape (n_samples, n_slices) where values are either 1 or 0.

Return type

np.ndarray
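
For instance, the returned matrix can be used to pull out the examples assigned to a given slice (a sketch, assuming a fit slicer and a test DataPanel test_dp with an “emb” column):

membership = slicer.predict(data=test_dp, embeddings="emb")  # (n_samples, n_slices)
in_slice_0 = membership[:, 0].astype(bool)
slice_0_dp = test_dp.lz[in_slice_0]  # rows assigned to the first slice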

abstract predict_proba(data, embeddings='embedding', targets='target', pred_probs='pred_probs')[source]

Get probabilistic (i.e. soft) slice membership for data using the fit slicer.

Caution

Must call Slicer.fit prior to calling Slicer.predict_proba.

Parameters
  • data (mk.DataPanel, optional) – A Meerkat DataPanel with columns for embeddings, targets, and prediction probabilities. The names of the columns can be specified with the embeddings, targets, and pred_probs arguments. Defaults to None.

  • embeddings (Union[str, np.ndarray], optional) – The name of a column in data holding embeddings. If data is None, then an np.ndarray of shape (n_samples, dimension of embedding). Defaults to “embedding”.

  • targets (Union[str, np.ndarray], optional) – The name of a column in data holding class labels. If data is None, then an np.ndarray of shape (n_samples,). Defaults to “target”.

  • pred_probs (Union[str, np.ndarray], optional) – The name of a column in data holding model predictions (can either be “soft” probability scores or “hard” 1-hot encoded predictions). If data is None, then an np.ndarray of shape (n_samples, n_classes) or (n_samples,) in the binary case. Defaults to “pred_probs”.

  • losses (Union[str, np.ndarray], optional) – The name of a column in data holding the loss of the model predictions. If data is None, then an np.ndarray of shape (n_samples,). Defaults to “loss”.

Returns

An np.ndarray of shape (n_samples, n_slices) where values are in the range [0, 1].

Return type

np.ndarray
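
A common use of the soft scores is ranking. For example, to inspect the datapoints most confidently assigned to the first slice (a sketch, assuming a fit slicer):

import numpy as np

probs = slicer.predict_proba(data=test_dp, embeddings="emb")  # (n_samples, n_slices)
top_idxs = np.argsort(-probs[:, 0])[:10]  # ten highest-scoring datapoints in slice 0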

set_params(**params)[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self – Estimator instance.

Return type

estimator instance
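
For example (a sketch using the shared n_slices parameter documented above):

slicer = DominoSlicer()
slicer.set_params(n_slices=10)
assert slicer.get_params()["n_slices"] == 10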

DominoSlicer

DominoSlicer
alias of domino._slice.mixture.MixtureSlicer

SpotlightSlicer

class SpotlightSlicer(n_slices=5, spotlight_size=0.02, n_steps=1000, learning_rate=0.001, device=device(type='cpu'), pbar=False)[source]

Slice a dataset with the Spotlight algorithm [deon_2022].

Discovers slices by directly optimizing the parameters of a Gaussian kernel to highlight a region of the embedding space with a high concentration of errors.

deon_2022

d’Eon, G., d’Eon, J., Wright, J. R. & Leyton-Brown, K. The Spotlight: A General Method for Discovering Systematic Errors in Deep Learning Models. arXiv:2107.00758 [cs, stat] (2021).

Parameters
  • n_slices (int) –

  • spotlight_size (float) –

  • n_steps (int) –

  • learning_rate (float) –

  • device (torch.device) –

  • pbar (bool) –
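
Since SpotlightSlicer follows the common slicer API described above, usage mirrors the DominoSlicer example (a sketch; the column names are assumed to match those used earlier):

from domino import SpotlightSlicer
dp = ...  # load a DataPanel with "emb", "target" and "pred_probs" columns

valid_dp = dp.lz[dp["split"] == "valid"]
test_dp = dp.lz[dp["split"] == "test"]

spotlight = SpotlightSlicer(n_slices=5)
spotlight.fit(
    data=valid_dp, embeddings="emb", targets="target", pred_probs="pred_probs"
)
test_dp["spotlight_slices"] = spotlight.predict(
    data=test_dp, embeddings="emb", targets="target", pred_probs="pred_probs"
)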

MultiaccuracySlicer

class MultiaccuracySlicer(n_slices=5, eta=0.1, dev_valid_frac=0.1, partition_size_threshold=10, pbar=False)[source]

Slice discovery based on MultiAccuracy auditing [kim_2019].

Discover slices by learning a simple function (e.g. ridge regression) that correlates with the residual.
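
To build intuition for that idea (a toy sketch under assumed inputs, not the library's exact implementation), the core auditing step looks roughly like this:

import numpy as np
from sklearn.linear_model import Ridge

# hypothetical stand-in data
emb = np.random.randn(1000, 512)        # embeddings
target = np.random.randint(0, 2, 1000)  # labels
pred_prob = np.random.rand(1000)        # model scores

# fit a simple function that predicts the residual from the embedding;
# regions where this auditor scores high are candidate underperforming slices
residual = target - pred_prob
auditor = Ridge(alpha=1.0).fit(emb, residual)
scores = auditor.predict(emb)  # large |score| flags likely problem regions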

Examples

Suppose you’ve trained a model and stored its predictions on a dataset in a Meerkat DataPanel with columns “emb”, “target”, and “pred_probs”. After loading the DataPanel, you can discover underperforming slices of the validation dataset with the following:

from domino import MultiaccuracySlicer
dp = ...  # Load dataset into a Meerkat DataPanel

# split dataset
valid_dp = dp.lz[dp["split"] == "valid"]
test_dp = dp.lz[dp["split"] == "test"]

slicer = MultiaccuracySlicer()
slicer.fit(
    data=valid_dp, embeddings="emb", targets="target", pred_probs="pred_probs"
)
dp["slicer"] = slicer.predict(
    data=test_dp, embeddings="emb", targets="target", pred_probs="pred_probs"
)
Parameters
  • n_slices (int, optional) – The number of slices to discover. Defaults to 5.

  • eta (float, optional) – Step size for the logits update; see the final line of Algorithm 1 in [kim_2019]. Defaults to 0.1.

  • dev_valid_frac (float, optional) – The fraction of data held out for computing correlations. Defaults to 0.1.

  • partition_size_threshold (int) –

  • pbar (bool) –

kim_2019

Kim, Michael P., Amirata Ghorbani, and James Zou. “Multiaccuracy: Black-box post-processing for fairness in classification.” Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society. 2019, pp. 247–254.

BarlowSlicer

class BarlowSlicer(n_slices=5, max_depth=3, n_features=128, pbar=True)[source]

Slice discovery based on Barlow [singla_2021].

Discover slices by fitting a shallow decision tree over a subset of embedding features to predict model failures.
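
To sketch the idea (a toy illustration under assumed inputs, not the library's exact implementation): select the embedding dimensions most informative about failure, then fit a shallow tree; leaves dense with failures correspond to candidate slices.

import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.feature_selection import mutual_info_classif

# hypothetical stand-in data
emb = np.random.randn(1000, 512)
failure = np.random.randint(0, 2, 1000)  # 1 where the model was wrong

mi = mutual_info_classif(emb, failure)   # score each embedding dimension
top = np.argsort(mi)[-128:]              # keep the 128 most informative features
tree = DecisionTreeClassifier(max_depth=3).fit(emb[:, top], failure)
leaf_ids = tree.apply(emb[:, top])       # datapoints sharing a leaf form a slice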

Examples

Suppose you’ve trained a model and stored its predictions on a dataset in a Meerkat DataPanel with columns “emb”, “target”, and “pred_probs”. After loading the DataPanel, you can discover underperforming slices of the validation dataset with the following:

from domino import BarlowSlicer
dp = ...  # Load dataset into a Meerkat DataPanel

# split dataset
valid_dp = dp.lz[dp["split"] == "valid"]
test_dp = dp.lz[dp["split"] == "test"]

barlow = BarlowSlicer()
barlow.fit(
    data=valid_dp, embeddings="emb", targets="target", pred_probs="pred_probs"
)
dp["barlow_slices"] = barlow.transform(
    data=test_dp, embeddings="emb", targets="target", pred_probs="pred_probs"
)
Parameters
  • n_slices (int, optional) – The number of slices to discover. Defaults to 5.

  • max_depth (int, optional) – The maximum depth of the decision tree. Defaults to 3. If None, then nodes are expanded until all leaves are pure or until all leaves contain fewer than 2 samples. See the scikit-learn documentation for more information.

  • n_features (int, optional) – The number of features from the embedding to use. Defaults to 128. Features are selected using a mutual-information estimate.

  • pbar (bool, optional) – Whether to show a progress bar. Ignored by BarlowSlicer.

singla_2021

Singla, Sahil, et al. “Understanding failures of deep networks via robust feature extraction.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.
