Slice
This page includes an API reference for the slicers provided by domino. Recall that most slice discovery methods adhere to a three-step procedure: (1) embed, (2) slice, and (3) describe. In the second step, we search an embedding space for regions where a model underperforms. Algorithms that perform this step are called slicers and, in domino, are subclasses of the abstract Slicer. For example, the SpotlightSlicer directly optimizes the parameters of a Gaussian kernel to highlight regions with a high concentration of errors [deon_2022].
All slicers in Domino share a common, sklearn-esque API. They each implement three methods: fit(), predict(), and predict_proba().
fit() – Learn a set of slicing functions that partition the embedding space into regions with high error rates.
predict() – Apply the learned slicing functions to data, assigning each datapoint to zero or more slices.
predict_proba() – Apply the learned slicing functions to data, producing “soft” slice assignments.
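To make the three-method contract concrete, here is a minimal sketch of a slicer-like class. `ToyErrorSlicer` is hypothetical and is not part of domino: it uses the embeddings of the highest-loss examples as slice centers, which is only an illustrative stand-in for a real slicing algorithm. Unlike a real slicer, it assigns every datapoint to exactly one slice.

```python
import numpy as np


class ToyErrorSlicer:
    """Hypothetical slicer-like class (not part of domino) illustrating the API."""

    def __init__(self, n_slices=2):
        self.n_slices = n_slices

    def fit(self, embeddings, losses):
        # Use the embeddings of the highest-loss examples as slice centers.
        worst = np.argsort(losses)[::-1][: self.n_slices]
        self.centers_ = embeddings[worst]
        return self  # sklearn-style: fit returns the fitted instance

    def predict_proba(self, embeddings):
        # Soft assignment: softmax over negative squared distances to the centers.
        diffs = embeddings[:, None, :] - self.centers_[None, :, :]
        logits = -(diffs ** 2).sum(axis=-1)
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        weights = np.exp(logits)
        return weights / weights.sum(axis=1, keepdims=True)

    def predict(self, embeddings):
        # Hard assignment: 1 for the most likely slice, 0 elsewhere.
        proba = self.predict_proba(embeddings)
        hard = np.zeros_like(proba, dtype=int)
        hard[np.arange(len(proba)), proba.argmax(axis=1)] = 1
        return hard


emb = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
loss = np.array([0.9, 0.1, 0.8, 0.2])

slicer = ToyErrorSlicer(n_slices=2).fit(emb, loss)
hard = slicer.predict(emb)        # shape (n_samples, n_slices), entries 0 or 1
soft = slicer.predict_proba(emb)  # shape (n_samples, n_slices), rows sum to 1
```

In this toy, the soft and hard outputs share the shape (n_samples, n_slices), and the hard output is derived from the soft one.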
All three methods accept embeddings, targets, pred_probs, and losses. There are two ways to use these arguments:
By passing NumPy arrays directly.
By passing a Meerkat DataPanel to the data argument and string column names to embeddings, targets, pred_probs, and losses.
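One way to picture the two calling conventions is a small helper that accepts either an array or a column name. This is a sketch under assumptions, not domino's actual implementation: `resolve_argument` is a hypothetical name, and a plain dict stands in for a Meerkat DataPanel.

```python
import numpy as np


def resolve_argument(data, value):
    """Hypothetical helper: return an array from either calling convention."""
    if data is None:
        # Convention 1: the caller passed the array itself.
        return np.asarray(value)
    # Convention 2: the caller passed a column name to look up in `data`.
    return np.asarray(data[value])


emb = np.random.default_rng(0).normal(size=(4, 3))
table = {"emb": emb}  # a plain dict standing in for a Meerkat DataPanel

direct = resolve_argument(None, emb)       # array passed directly
by_name = resolve_argument(table, "emb")   # column name plus data container
```

Both calls yield the same array, which is why the two conventions can be used interchangeably.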
Note that not all slicers require all arguments. For example, the DominoSlicer requires the embeddings, targets, and pred_probs arguments for fit(), but only embeddings is required for predict() and predict_proba().
Consider this simple example where we fit() the DominoSlicer on the validation set and apply predict() to the test set.
from domino import DominoSlicer
dp = ... # load a dataset with columns "emb", "target" and "pred_probs" into a Meerkat DataPanel
# split dataset
valid_dp = dp.lz[dp["split"] == "valid"]
test_dp = dp.lz[dp["split"] == "test"]
domino = DominoSlicer()
domino.fit(
    data=valid_dp, embeddings="emb", targets="target", pred_probs="pred_probs"
)
# store the predictions on test_dp, the DataPanel they were computed from
test_dp["domino_slices"] = domino.predict(
    data=test_dp, embeddings="emb",
)
Slicers can be configured by passing parameters to the constructor. Each slicer has a different set of parameters; for example, the DominoSlicer has a parameter called max_iter which controls the maximum number of EM iterations. See the documentation below for the parameters of each slicer.
To access these parameters from a slicer, users can use get_params(), which returns a dictionary mapping parameter names (as defined in the constructor) to values.
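The idea behind a constructor-driven get_params() can be sketched with the standard library alone. `ParamsMixin` and `ToySlicer` below are hypothetical names for illustration; this mirrors the general sklearn-style convention and is not domino's own code.

```python
import inspect


class ParamsMixin:
    """Hypothetical mixin sketching an sklearn-style get_params()."""

    def get_params(self):
        # Read the parameter names from the constructor signature, then look
        # up the attribute of the same name on the instance.
        names = [
            name
            for name in inspect.signature(type(self).__init__).parameters
            if name != "self"
        ]
        return {name: getattr(self, name) for name in names}


class ToySlicer(ParamsMixin):
    def __init__(self, n_slices=5, max_iter=100):
        self.n_slices = n_slices
        self.max_iter = max_iter


params = ToySlicer(max_iter=20).get_params()
```

The convention only works if each constructor argument is stored on the instance under the same name.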
Abstract Base Class: Slicer
- class Slicer(n_slices)
- Parameters
n_slices (int) –
- abstract fit(model=None, data_dp=None)
Fit the slicer to data.
- Parameters
data (mk.DataPanel, optional) – A Meerkat DataPanel with columns for embeddings, targets, and prediction probabilities. The names of the columns can be specified with the embeddings, targets, and pred_probs arguments. Defaults to None.
embeddings (Union[str, np.ndarray], optional) – The name of a column in data holding embeddings. If data is None, then an np.ndarray of shape (n_samples, dimension of embedding). Defaults to “embedding”.
targets (Union[str, np.ndarray], optional) – The name of a column in data holding class labels. If data is None, then an np.ndarray of shape (n_samples,). Defaults to “target”.
pred_probs (Union[str, np.ndarray], optional) – The name of a column in data holding model predictions (can either be “soft” probability scores or “hard” 1-hot encoded predictions). If data is None, then an np.ndarray of shape (n_samples, n_classes) or (n_samples,) in the binary case. Defaults to “pred_probs”.
losses (Union[str, np.ndarray], optional) – The name of a column in data holding the loss of the model predictions. If data is None, then an np.ndarray of shape (n_samples,). Defaults to “loss”.
model (Optional[torch.nn.modules.module.Module]) –
data_dp (Optional[meerkat.datapanel.DataPanel]) –
- Returns
Returns a fit instance of the slicer.
- Return type
- get_params()
Get the parameters of this slicer. Returns a dictionary mapping from the names of the parameters (as they are defined in __init__) to their values.
- Returns
A dictionary of parameters.
- Return type
Dict[str, Any]
- abstract predict(data, embeddings='embedding', targets='target', pred_probs='pred_probs')
Get slice membership for data using the fit slicer.
Caution
Must call Slicer.fit prior to calling Slicer.predict.
- Parameters
data (mk.DataPanel, optional) – A Meerkat DataPanel with columns for embeddings, targets, and prediction probabilities. The names of the columns can be specified with the embeddings, targets, and pred_probs arguments. Defaults to None.
embeddings (Union[str, np.ndarray], optional) – The name of a column in data holding embeddings. If data is None, then an np.ndarray of shape (n_samples, dimension of embedding). Defaults to “embedding”.
targets (Union[str, np.ndarray], optional) – The name of a column in data holding class labels. If data is None, then an np.ndarray of shape (n_samples,). Defaults to “target”.
pred_probs (Union[str, np.ndarray], optional) – The name of a column in data holding model predictions (can either be “soft” probability scores or “hard” 1-hot encoded predictions). If data is None, then an np.ndarray of shape (n_samples, n_classes) or (n_samples,) in the binary case. Defaults to “pred_probs”.
losses (Union[str, np.ndarray], optional) – The name of a column in data holding the loss of the model predictions. If data is None, then an np.ndarray of shape (n_samples,). Defaults to “loss”.
- Returns
A binary np.ndarray of shape (n_samples, n_slices) where values are either 1 or 0.
- Return type
np.ndarray
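A binary membership matrix of this shape can be summarized with ordinary NumPy operations. The array below is made-up data standing in for the output of predict(); it is not produced by any domino slicer.

```python
import numpy as np

# Hypothetical output of predict() for 4 samples and 3 slices.
membership = np.array(
    [
        [1, 0, 0],
        [0, 0, 0],  # a sample may belong to no slice
        [1, 1, 0],  # or to several slices at once
        [0, 0, 1],
    ]
)

# Number of samples in each slice (column sums).
slice_sizes = membership.sum(axis=0)

# Row indices of the samples that fall in slice 0.
in_slice_0 = np.flatnonzero(membership[:, 0])
```

Columns index slices and rows index samples, so column sums give slice sizes and a single column gives the members of one slice.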
- abstract predict_proba(data, embeddings='embedding', targets='target', pred_probs='pred_probs')
Get probabilistic (i.e. soft) slice membership for data using the fit slicer.
Caution
Must call Slicer.fit prior to calling Slicer.predict_proba.
- Parameters
data (mk.DataPanel, optional) – A Meerkat DataPanel with columns for embeddings, targets, and prediction probabilities. The names of the columns can be specified with the embeddings, targets, and pred_probs arguments. Defaults to None.
embeddings (Union[str, np.ndarray], optional) – The name of a column in data holding embeddings. If data is None, then an np.ndarray of shape (n_samples, dimension of embedding). Defaults to “embedding”.
targets (Union[str, np.ndarray], optional) – The name of a column in data holding class labels. If data is None, then an np.ndarray of shape (n_samples,). Defaults to “target”.
pred_probs (Union[str, np.ndarray], optional) – The name of a column in data holding model predictions (can either be “soft” probability scores or “hard” 1-hot encoded predictions). If data is None, then an np.ndarray of shape (n_samples, n_classes) or (n_samples,) in the binary case. Defaults to “pred_probs”.
losses (Union[str, np.ndarray], optional) – The name of a column in data holding the loss of the model predictions. If data is None, then an np.ndarray of shape (n_samples,). Defaults to “loss”.
- Returns
An np.ndarray of shape (n_samples, n_slices) holding soft (probabilistic) slice membership values.
- Return type
np.ndarray
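Soft memberships are useful for ranking examples within a slice. The sketch below uses made-up scores in place of real predict_proba() output. The 0.5 threshold is an arbitrary illustration and is not necessarily how any domino slicer derives its hard assignments.

```python
import numpy as np

# Hypothetical soft memberships for 4 samples and 2 slices.
soft = np.array(
    [
        [0.7, 0.3],
        [0.2, 0.8],
        [0.9, 0.1],
        [0.4, 0.6],
    ]
)

# The two samples most strongly associated with slice 0, best first.
top_k = 2
top_for_slice_0 = np.argsort(-soft[:, 0])[:top_k]

# One possible way to turn soft scores into a binary membership matrix.
hard = (soft > 0.5).astype(int)
```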
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
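The `<component>__<parameter>` convention can be sketched in a few lines of plain Python. All names below (`NestedParamsMixin`, `ToyPipeline`, `ToyEncoder`) are hypothetical; this illustrates the routing idea only and is not the implementation inherited by domino's slicers.

```python
class NestedParamsMixin:
    """Hypothetical mixin sketching `<component>__<parameter>` routing."""

    def set_params(self, **params):
        for key, value in params.items():
            # Split "encoder__dim" into ("encoder", "dim"); plain names have no "__".
            name, sep, sub_key = key.partition("__")
            if not hasattr(self, name):
                raise ValueError(f"Invalid parameter {name!r}")
            if sep:
                # Delegate the remainder of the key to the nested component.
                getattr(self, name).set_params(**{sub_key: value})
            else:
                setattr(self, name, value)
        return self  # returning self allows call chaining


class ToyEncoder(NestedParamsMixin):
    def __init__(self, dim=128):
        self.dim = dim


class ToyPipeline(NestedParamsMixin):
    def __init__(self, n_slices=5, encoder=None):
        self.n_slices = n_slices
        self.encoder = encoder


model = ToyPipeline(encoder=ToyEncoder())
model.set_params(n_slices=3, encoder__dim=64)
```

After the call, the top-level parameter and the nested component's parameter have both been updated.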
DominoSlicer
- DominoSlicer
alias of domino._slice.mixture.MixtureSlicer
SpotlightSlicer
- class SpotlightSlicer(n_slices=5, spotlight_size=0.02, n_steps=1000, learning_rate=0.001, device=device(type='cpu'), pbar=False)
Slice a dataset with the Spotlight algorithm [deon_2022].
TODO: add docstring similar to the Domino one
- deon_2022
d’Eon, G., d’Eon, J., Wright, J. R. & Leyton-Brown, K. The Spotlight: A General Method for Discovering Systematic Errors in Deep Learning Models. arXiv:2107.00758 [cs, stat] (2021)
- Parameters
n_slices (int) –
spotlight_size (float) –
n_steps (int) –
learning_rate (float) –
device (torch.device) –
pbar (bool) –
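As a rough illustration of the Gaussian-kernel idea behind the Spotlight, the sketch below weights each example by its distance to a spotlight center and computes a kernel-weighted average loss. It follows our reading of [deon_2022] and is not domino's implementation: the names `center` and `precision` are assumptions, and the gradient-based optimization of these parameters and the spotlight_size constraint are omitted.

```python
import numpy as np


def spotlight_weights(embeddings, center, precision):
    # Gaussian kernel: weights decay with squared distance from the center.
    sq_dist = ((embeddings - center) ** 2).sum(axis=1)
    return np.exp(-0.5 * precision * sq_dist)


def spotlight_loss(embeddings, losses, center, precision):
    # Kernel-weighted average loss: high when the spotlight covers many errors.
    w = spotlight_weights(embeddings, center, precision)
    return (w * losses).sum() / w.sum()


emb = np.array([[0.0, 0.0], [0.2, 0.0], [4.0, 4.0], [4.2, 4.0]])
losses = np.array([0.1, 0.2, 2.0, 1.8])

# A spotlight centered on the high-loss cluster scores higher than one
# centered on the low-loss cluster.
near_errors = spotlight_loss(emb, losses, center=np.array([4.0, 4.0]), precision=1.0)
near_correct = spotlight_loss(emb, losses, center=np.array([0.0, 0.0]), precision=1.0)
```

An optimizer would move the center and adjust the precision to increase this weighted loss, which is how a spotlight comes to rest on an error-dense region.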
MultiaccuracySlicer
- class MultiaccuracySlicer(n_slices=5, eta=0.1, dev_valid_frac=0.1, partition_size_threshold=10, pbar=False)
Slice discovery based on MultiAccuracy auditing [kim_2019].
Discover slices by learning a simple function (e.g. ridge regression) that correlates with the residual.
Examples
Suppose you’ve trained a model and stored its predictions on a dataset in a Meerkat DataPanel with columns “emb”, “target”, and “pred_probs”. After loading the DataPanel, you can discover underperforming slices of the validation dataset with the following:
from domino import MultiaccuracySlicer
dp = ... # Load dataset into a Meerkat DataPanel
# split dataset
valid_dp = dp.lz[dp["split"] == "valid"]
test_dp = dp.lz[dp["split"] == "test"]
slicer = MultiaccuracySlicer()
slicer.fit(
    data=valid_dp, embeddings="emb", targets="target", pred_probs="pred_probs"
)
# store the predictions on test_dp, the DataPanel they were computed from
test_dp["slicer"] = slicer.predict(
    data=test_dp, embeddings="emb", targets="target", pred_probs="pred_probs"
)
- Parameters
n_slices (int, optional) – The number of slices to discover. Defaults to 5.
eta (float, optional) – Step size for the logits update; see the final line of Algorithm 1 in [kim_2019]. Defaults to 0.1.
dev_valid_frac (float, optional) – The fraction of data held out for computing corr. Defaults to 0.1.
partition_size_threshold (int) –
pbar (bool) –
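The core idea of learning a simple function that correlates with the residual can be sketched with a closed-form ridge regression. This is a simplified illustration on synthetic data, not domino's implementation: `fit_ridge` and `alpha` are hypothetical names, and the iterative logits update controlled by eta and the held-out split controlled by dev_valid_frac are left out.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    # Closed-form ridge regression: w = (X^T X + alpha * I)^{-1} X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)


rng = np.random.default_rng(0)
emb = rng.normal(size=(200, 3))

# Synthetic residuals that grow with the first embedding dimension.
residual = 0.5 * emb[:, 0] + 0.05 * rng.normal(size=200)

w = fit_ridge(emb, residual)
scores = emb @ w  # the learned function evaluated on each example
corr = np.corrcoef(scores, residual)[0, 1]
```

Examples with high scores are those where the simple function predicts a large residual, which makes them candidates for an underperforming slice.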
- kim_2019
Kim, M. P., Ghorbani, A. & Zou, J. Multiaccuracy: Black-box post-processing for fairness in classification. Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, 247–254 (2019)
BarlowSlicer
- class BarlowSlicer(n_slices=5, max_depth=3, n_features=128, pbar=True)
Slice discovery based on Barlow [singla_2021].
Discover slices using a decision tree. TODO(singlasahil14): add any more details describing your method
Examples
Suppose you’ve trained a model and stored its predictions on a dataset in a Meerkat DataPanel with columns “emb”, “target”, and “pred_probs”. After loading the DataPanel, you can discover underperforming slices of the validation dataset with the following:
from domino import BarlowSlicer
dp = ... # Load dataset into a Meerkat DataPanel
# split dataset
valid_dp = dp.lz[dp["split"] == "valid"]
test_dp = dp.lz[dp["split"] == "test"]
barlow = BarlowSlicer()
barlow.fit(
    data=valid_dp, embeddings="emb", targets="target", pred_probs="pred_probs"
)
# store the predictions on test_dp, the DataPanel they were computed from
test_dp["barlow_slices"] = barlow.predict(
    data=test_dp, embeddings="emb", targets="target", pred_probs="pred_probs"
)
- Parameters
n_slices (int, optional) – The number of slices to discover. Defaults to 5.
max_depth (int, optional) – The maximum depth of the decision tree. Defaults to 3. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than 2 samples. See the scikit-learn documentation for more information.
n_features (int, optional) – The number of features from the embedding to use. Defaults to 128. Features are selected using a mutual information estimate.
pbar (bool, optional) – Whether to show a progress bar. Ignored for Barlow.
- singla_2021
Singla, Sahil, et al. “Understanding failures of deep networks via robust feature extraction.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.
- engstrom_2019
Engstrom, L., Ilyas, A., Salman, H., Santurkar, S. & Tsipras, D. Robustness (Python Library). https://github.com/MadryLab/robustness (2019)