Slice
This page includes an API reference for the slicers provided by domino. Recall that most slice discovery methods adhere to a three-step procedure: (1) embed, (2) slice, and (3) describe. In the second step, we search an embedding space for regions where a model underperforms. Algorithms that perform this step are called slicers and, in domino, are subclasses of the abstract Slicer. For example, the SpotlightSlicer directly optimizes the parameters of a Gaussian kernel to highlight regions with a high concentration of errors [deon_2022].
All slicers in Domino share a common, sklearn-esque API. They each implement three methods: fit(), predict(), and predict_proba().
fit()
Learn a set of slicing functions that partition the embedding space into regions with high error rates.
predict()
Apply the learned slicing functions to data, assigning each datapoint to zero or more slices.
predict_proba()
Apply the learned slicing functions to data, producing “soft” slice assignments.
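The relationship between the three methods can be pictured with a toy class. This is a minimal sketch and not domino code: ToyKernelSlicer, its bandwidth parameter, and the 0.5 threshold are hypothetical names and values chosen for illustration, and the fitting rule (use the highest-loss points as slice centers) is a stand-in for a real slicing algorithm.

```python
import numpy as np


class ToyKernelSlicer:
    """Hypothetical slicer, not part of domino: illustrates the three-method API."""

    def __init__(self, n_slices=2, bandwidth=1.0):
        self.n_slices = n_slices
        self.bandwidth = bandwidth

    def fit(self, embeddings, losses):
        # Stand-in fitting rule: use the n_slices highest-loss points as centers.
        top = np.argsort(losses)[::-1][: self.n_slices]
        self.centers_ = embeddings[top]
        return self  # returning self mirrors the sklearn convention

    def predict_proba(self, embeddings):
        # Soft membership: Gaussian kernel of the distance to each center.
        diffs = embeddings[:, None, :] - self.centers_[None, :, :]
        sq_dists = (diffs ** 2).sum(axis=-1)
        return np.exp(-0.5 * sq_dists / self.bandwidth ** 2)

    def predict(self, embeddings, threshold=0.5):
        # Hard membership: a point may fall in zero or more slices.
        return (self.predict_proba(embeddings) >= threshold).astype(int)


rng = np.random.default_rng(0)
emb = rng.normal(size=(100, 4))   # synthetic embeddings
losses = rng.random(100)          # synthetic per-example losses

slicer = ToyKernelSlicer(n_slices=3).fit(emb, losses)
soft = slicer.predict_proba(emb)  # shape (100, 3), scores between 0 and 1
hard = slicer.predict(emb)        # shape (100, 3), values 0 or 1
```

Both outputs have one row per datapoint and one column per slice, which matches the (n_samples, n_slices) shape documented for the real slicers.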
All three methods accept embeddings, targets, pred_probs, and losses. There are two ways to use these arguments:
- By passing NumPy arrays directly.
- By passing a Meerkat DataPanel to the data argument and string column names to embeddings, targets, pred_probs, and losses.
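One way to picture the two calling conventions is a small helper that accepts either a column name or an array. This is a sketch under assumptions: resolve_column is a hypothetical helper, not a domino function, and a plain dict stands in for a Meerkat DataPanel so the example runs with NumPy alone.

```python
import numpy as np


def resolve_column(data, value):
    """Hypothetical helper: return an array from either calling convention."""
    if isinstance(value, str):
        # Column-name convention: the name is looked up in `data`.
        if data is None:
            raise ValueError("a column name requires the `data` argument")
        return np.asarray(data[value])
    # Direct-array convention: the value is already the data.
    return np.asarray(value)


# A plain dict stands in for a Meerkat DataPanel here.
table = {"emb": np.zeros((5, 3)), "target": np.array([0, 1, 0, 1, 1])}

by_name = resolve_column(table, "emb")             # column-name convention
by_array = resolve_column(None, np.zeros((5, 3)))  # direct-array convention
```

Both calls yield the same (n_samples, dimension of embedding) array, so downstream code does not need to know which convention the caller used.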
Note that not all slicers require all arguments. For example, the DominoSlicer requires the embeddings, targets, and pred_probs arguments for fit(), but only embeddings is required for predict() and predict_proba().
Consider this simple example where we fit() the DominoSlicer on the validation set and apply predict() to the test set.
from domino import DominoSlicer

dp = ...  # load a dataset with columns "emb", "target" and "pred_probs" into a Meerkat DataPanel

# split dataset
valid_dp = dp.lz[dp["split"] == "valid"]
test_dp = dp.lz[dp["split"] == "test"]

# fit the slicer on the validation set
domino = DominoSlicer()
domino.fit(
    data=valid_dp, embeddings="emb", targets="target", pred_probs="pred_probs"
)

# predict on the test set; store the result on test_dp so the row counts match
test_dp["domino_slices"] = domino.predict(
    data=test_dp, embeddings="emb",
)
Slicers can be configured by passing parameters to the constructor. Each slicer has a different set of parameters; for example, the DominoSlicer has a parameter called max_iter which controls the maximum number of EM iterations. See the documentation below for the parameters of each slicer.
To access these parameters from a slicer, users can use get_params(), which returns a dictionary mapping parameter names (as defined in the constructor) to values.
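The mapping from constructor arguments to a parameter dictionary can be sketched as follows. ToySlicer is a hypothetical class, not part of domino, and reading names from the constructor signature is one plausible way to implement the behavior described here (it is the approach scikit-learn estimators take), not a statement about domino's internals.

```python
import inspect


class ToySlicer:
    """Hypothetical class, not part of domino."""

    def __init__(self, n_slices=5, max_iter=100):
        self.n_slices = n_slices
        self.max_iter = max_iter

    def get_params(self):
        # Read parameter names from the constructor signature, then look up
        # the attribute of the same name on the instance.
        names = [
            name
            for name in inspect.signature(type(self).__init__).parameters
            if name != "self"
        ]
        return {name: getattr(self, name) for name in names}


params = ToySlicer(max_iter=50).get_params()
print(params)  # {'n_slices': 5, 'max_iter': 50}
```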
Abstract Base Class: Slicer
- class Slicer(n_slices)
- Parameters
n_slices (int) –
- abstract fit(model=None, data_dp=None)
Fit the slicer to data.
- Parameters
data (mk.DataPanel, optional) – A Meerkat DataPanel with columns for embeddings, targets, and prediction probabilities. The names of the columns can be specified with the embeddings, targets, and pred_probs arguments. Defaults to None.
embeddings (Union[str, np.ndarray], optional) – The name of a column in data holding embeddings. If data is None, then an np.ndarray of shape (n_samples, dimension of embedding). Defaults to “embedding”.
targets (Union[str, np.ndarray], optional) – The name of a column in data holding class labels. If data is None, then an np.ndarray of shape (n_samples,). Defaults to “target”.
pred_probs (Union[str, np.ndarray], optional) – The name of a column in data holding model predictions (can either be “soft” probability scores or “hard” 1-hot encoded predictions). If data is None, then an np.ndarray of shape (n_samples, n_classes) or (n_samples,) in the binary case. Defaults to “pred_probs”.
losses (Union[str, np.ndarray], optional) – The name of a column in data holding the loss of the model predictions. If data is None, then an np.ndarray of shape (n_samples,). Defaults to “loss”.
model (Optional[torch.nn.modules.module.Module]) –
data_dp (Optional[meerkat.datapanel.DataPanel]) –
- Returns
Returns a fit instance of the slicer.
- Return type
- get_params()
Get the parameters of this slicer. Returns a dictionary mapping from the names of the parameters (as they are defined in the __init__) to their values.
- Returns
A dictionary of parameters.
- Return type
Dict[str, Any]
- abstract predict(data, embeddings='embedding', targets='target', pred_probs='pred_probs')
Get slice membership for data using the fit slicer.
Caution
Must call Slicer.fit prior to calling Slicer.predict.
- Parameters
data (mk.DataPanel, optional) – A Meerkat DataPanel with columns for embeddings, targets, and prediction probabilities. The names of the columns can be specified with the embeddings, targets, and pred_probs arguments. Defaults to None.
embeddings (Union[str, np.ndarray], optional) – The name of a column in data holding embeddings. If data is None, then an np.ndarray of shape (n_samples, dimension of embedding). Defaults to “embedding”.
targets (Union[str, np.ndarray], optional) – The name of a column in data holding class labels. If data is None, then an np.ndarray of shape (n_samples,). Defaults to “target”.
pred_probs (Union[str, np.ndarray], optional) – The name of a column in data holding model predictions (can either be “soft” probability scores or “hard” 1-hot encoded predictions). If data is None, then an np.ndarray of shape (n_samples, n_classes) or (n_samples,) in the binary case. Defaults to “pred_probs”.
losses (Union[str, np.ndarray], optional) – The name of a column in data holding the loss of the model predictions. If data is None, then an np.ndarray of shape (n_samples,). Defaults to “loss”.
- Returns
A binary np.ndarray of shape (n_samples, n_slices) where values are either 1 or 0.
- Return type
np.ndarray
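A binary membership matrix of this shape can be read column by column (one column per slice) or row by row (one row per datapoint). The array below is made up for illustration; it is not output from a domino slicer.

```python
import numpy as np

# Hypothetical output of predict(): 5 datapoints, 3 slices.
membership = np.array(
    [
        [1, 0, 0],
        [0, 0, 0],  # belongs to no slice
        [1, 1, 0],  # belongs to two slices
        [0, 0, 1],
        [0, 1, 0],
    ]
)

# Indices of the datapoints assigned to each slice.
slice_indices = [
    np.flatnonzero(membership[:, k]) for k in range(membership.shape[1])
]
# Number of datapoints in each slice.
slice_sizes = membership.sum(axis=0)
# Datapoints that were not assigned to any slice.
unassigned = np.flatnonzero(membership.sum(axis=1) == 0)
```

Because a datapoint can belong to zero or more slices, the rows do not have to sum to one.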
- abstract predict_proba(data, embeddings='embedding', targets='target', pred_probs='pred_probs')
Get probabilistic (i.e. soft) slice membership for data using the fit slicer.
Caution
Must call Slicer.fit prior to calling Slicer.predict_proba.
- Parameters
data (mk.DataPanel, optional) – A Meerkat DataPanel with columns for embeddings, targets, and prediction probabilities. The names of the columns can be specified with the embeddings, targets, and pred_probs arguments. Defaults to None.
embeddings (Union[str, np.ndarray], optional) – The name of a column in data holding embeddings. If data is None, then an np.ndarray of shape (n_samples, dimension of embedding). Defaults to “embedding”.
targets (Union[str, np.ndarray], optional) – The name of a column in data holding class labels. If data is None, then an np.ndarray of shape (n_samples,). Defaults to “target”.
pred_probs (Union[str, np.ndarray], optional) – The name of a column in data holding model predictions (can either be “soft” probability scores or “hard” 1-hot encoded predictions). If data is None, then an np.ndarray of shape (n_samples, n_classes) or (n_samples,) in the binary case. Defaults to “pred_probs”.
losses (Union[str, np.ndarray], optional) – The name of a column in data holding the loss of the model predictions. If data is None, then an np.ndarray of shape (n_samples,). Defaults to “loss”.
- Returns
An np.ndarray of shape (n_samples, n_slices) holding soft (probabilistic) slice membership scores.
- Return type
np.ndarray
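Soft assignments can be thresholded into hard ones or used to rank datapoints within a slice. The sketch below uses a made-up score array; the 0.5 threshold is an illustrative choice, not a domino default.

```python
import numpy as np

# Hypothetical output of predict_proba(): 4 datapoints, 3 slices.
soft = np.array(
    [
        [0.90, 0.05, 0.05],
        [0.40, 0.35, 0.25],
        [0.10, 0.80, 0.10],
        [0.30, 0.30, 0.40],
    ]
)

# Option 1: threshold each score independently (zero or more slices per point).
hard_threshold = (soft >= 0.5).astype(int)

# Option 2: rank datapoints within a slice, e.g. to inspect its top examples.
top2_in_slice0 = np.argsort(-soft[:, 0])[:2]
```

Ranking is often the more useful view when inspecting a slice, since it surfaces the datapoints the slicer is most confident about.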
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
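The <component>__<parameter> naming convention can be illustrated by splitting keys on the double underscore. split_nested_params is a hypothetical helper written for this sketch, and the parameter names in the usage lines are made up; neither is part of domino or scikit-learn.

```python
def split_nested_params(params):
    """Hypothetical helper: group `<component>__<parameter>` keys by component."""
    own, nested = {}, {}
    for key, value in params.items():
        component, sep, sub_key = key.partition("__")
        if sep:  # the key contained "__", so it targets a nested component
            nested.setdefault(component, {})[sub_key] = value
        else:    # a plain key belongs to the estimator itself
            own[key] = value
    return own, nested


own, nested = split_nested_params(
    {"n_slices": 10, "encoder__batch_size": 32, "encoder__device": "cpu"}
)
print(own)     # {'n_slices': 10}
print(nested)  # {'encoder': {'batch_size': 32, 'device': 'cpu'}}
```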
DominoSlicer
- DominoSlicer
alias of
domino._slice.mixture.MixtureSlicer
SpotlightSlicer
- class SpotlightSlicer(n_slices=5, spotlight_size=0.02, n_steps=1000, learning_rate=0.001, device=device(type='cpu'), pbar=False)
Slice a dataset with the Spotlight algorithm [deon_2022].
- deon_2022
d’Eon, G., d’Eon, J., Wright, J. R. & Leyton-Brown, K. The Spotlight: A General Method for Discovering Systematic Errors in Deep Learning Models. arXiv:2107.00758 [cs, stat] (2021)
- Parameters
n_slices (int) –
spotlight_size (float) –
n_steps (int) –
learning_rate (float) –
device (torch.device) –
pbar (bool) –
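The idea of a Gaussian kernel that highlights regions with a high concentration of errors can be sketched in NumPy. This only evaluates a kernel-weighted loss for fixed kernel parameters on synthetic data; it does not perform the optimization that SpotlightSlicer carries out, and the function names, the precision parameter, and the data are assumptions made for illustration.

```python
import numpy as np


def spotlight_weights(embeddings, center, precision):
    # Unnormalized Gaussian kernel: 1 at the center, decaying with distance.
    sq_dists = ((embeddings - center) ** 2).sum(axis=1)
    return np.exp(-0.5 * precision * sq_dists)


def weighted_loss(losses, weights):
    # Kernel-weighted mean loss: large when errors concentrate near the center.
    return float((weights * losses).sum() / weights.sum())


rng = np.random.default_rng(0)
emb = rng.normal(size=(200, 2))
# Synthetic losses: errors are concentrated around the point (2, 2).
losses = np.exp(-((emb - 2.0) ** 2).sum(axis=1))

near = weighted_loss(losses, spotlight_weights(emb, np.array([2.0, 2.0]), 1.0))
far = weighted_loss(losses, spotlight_weights(emb, np.array([-2.0, -2.0]), 1.0))
```

A kernel centered on the high-error region scores a higher weighted loss than one centered elsewhere, which is the quantity an optimizer over the kernel parameters would push upward.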
MultiaccuracySlicer
- class MultiaccuracySlicer(n_slices=5, eta=0.1, dev_valid_frac=0.1, partition_size_threshold=10, pbar=False)
Slice discovery based on MultiAccuracy auditing [kim_2019].
Discover slices by learning a simple function (e.g. ridge regression) that correlates with the residual.
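A single step of this idea can be sketched with a closed-form ridge regression. This is an illustration on synthetic data, not the MultiaccuracySlicer implementation: the regularization strength lam, the 10% held-out split, and the constant-prediction model are assumptions chosen to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 400, 5
emb = rng.normal(size=(n, d))
targets = (emb[:, 0] > 0).astype(float)
# Synthetic model that always predicts 0.5, so its residual depends on emb[:, 0].
pred_probs = np.full(n, 0.5)
residual = targets - pred_probs

# Hold out a fraction of the data for measuring correlation.
n_dev = int(0.1 * n)
train, dev = np.arange(n_dev, n), np.arange(n_dev)

# Closed-form ridge regression: w = (X^T X + lam * I)^{-1} X^T y.
lam = 1.0
X, y = emb[train], residual[train]
w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Correlation between the learned function and the held-out residual.
corr = np.corrcoef(emb[dev] @ w, residual[dev])[0, 1]
```

A high held-out correlation indicates a direction in embedding space along which the model errs systematically, which is what a slice is meant to capture.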
Examples
Suppose you’ve trained a model and stored its predictions on a dataset in a Meerkat DataPanel with columns “emb”, “target”, and “pred_probs”. After loading the DataPanel, you can discover underperforming slices of the validation dataset with the following:
from domino import MultiaccuracySlicer

dp = ...  # Load dataset into a Meerkat DataPanel

# split dataset
valid_dp = dp.lz[dp["split"] == "valid"]
test_dp = dp.lz[dp["split"] == "test"]

# fit the slicer on the validation set
slicer = MultiaccuracySlicer()
slicer.fit(
    data=valid_dp, embeddings="emb", targets="target", pred_probs="pred_probs"
)

# predict on the test set; store the result on test_dp so the row counts match
test_dp["slicer"] = slicer.predict(
    data=test_dp, embeddings="emb", targets="target", pred_probs="pred_probs"
)
- Parameters
n_slices (int, optional) – The number of slices to discover. Defaults to 5.
eta (float, optional) – Step size for the logits update; see the final line of Algorithm 1 in [kim_2019]. Defaults to 0.1.
dev_valid_frac (float, optional) – The fraction of data held out for computing the correlation. Defaults to 0.1.
partition_size_threshold (int) –
pbar (bool) –
- kim_2019
Kim, Michael P., Amirata Ghorbani, and James Zou. “Multiaccuracy: Black-box post-processing for fairness in classification.” Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, pp. 247–254. 2019.
BarlowSlicer
- class BarlowSlicer(n_slices=5, max_depth=3, n_features=128, pbar=True)
Slice discovery based on Barlow [singla_2021].
Discover slices using a decision tree.
Examples
Suppose you’ve trained a model and stored its predictions on a dataset in a Meerkat DataPanel with columns “emb”, “target”, and “pred_probs”. After loading the DataPanel, you can discover underperforming slices of the validation dataset with the following:
from domino import BarlowSlicer

dp = ...  # Load dataset into a Meerkat DataPanel

# split dataset
valid_dp = dp.lz[dp["split"] == "valid"]
test_dp = dp.lz[dp["split"] == "test"]

# fit the slicer on the validation set
barlow = BarlowSlicer()
barlow.fit(
    data=valid_dp, embeddings="emb", targets="target", pred_probs="pred_probs"
)

# predict on the test set; store the result on test_dp so the row counts match
test_dp["barlow_slices"] = barlow.predict(
    data=test_dp, embeddings="emb", targets="target", pred_probs="pred_probs"
)
- Parameters
n_slices (int, optional) – The number of slices to discover. Defaults to 5.
max_depth (int, optional) – The maximum depth of the decision tree. Defaults to 3. If None, then nodes are expanded until all leaves are pure or until all leaves contain fewer than 2 samples. See the scikit-learn documentation for more information.
n_features (int, optional) – The number of features from the embedding to use. Defaults to 128. Features are selected using a mutual information estimate.
pbar (bool, optional) – Whether to show a progress bar. Ignored for barlow.
- singla_2021
Singla, Sahil, et al. “Understanding failures of deep networks via robust feature extraction.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.
- engstrom_2019
Engstrom, Logan, Andrew Ilyas, Hadi Salman, Shibani Santurkar, and Dimitris Tsipras. “Robustness (Python Library).” 2019. https://github.com/MadryLab/robustness