Slice

DominoSlicer

DominoSlicer(n_slices=5, covariance_type='diag', n_pca_components=128, n_mixture_components=25, y_log_likelihood_weight=1, y_hat_log_likelihood_weight=1, max_iter=100, init_params='confusion', confusion_noise=0.001, random_state=None)

Slice Discovery based on the Domino Mixture Model.

Discover slices by jointly modeling a mixture of input embeddings (e.g. activations from a trained model), class labels, and model predictions. This encourages slices that are homogeneous with respect to error type (e.g. all false positives).

Examples

Suppose you’ve trained a model and stored its predictions on a dataset in a Meerkat DataPanel with columns “emb”, “target”, and “pred_probs”. After loading the DataPanel, you can discover underperforming slices of the validation dataset with the following:

from domino import DominoSlicer
dp = ...  # Load dataset into a Meerkat DataPanel

# split dataset
valid_dp = dp.lz[dp["split"] == "valid"]
test_dp = dp.lz[dp["split"] == "test"]

domino = DominoSlicer()
domino.fit(
    data=valid_dp, embeddings="emb", targets="target", pred_probs="pred_probs"
)
# store the slice assignments on the split that was transformed
test_dp["domino_slices"] = domino.transform(
    data=test_dp, embeddings="emb", targets="target", pred_probs="pred_probs"
)

Parameters

n_slices (int, optional) – The number of slices to discover. Defaults to 5.
covariance_type (str, optional) – The type of covariance parameter $\mathbf{\Sigma}$ to use. Same as in sklearn.mixture.GaussianMixture. Defaults to “diag”, which is recommended.
n_pca_components (Union[int, None], optional) – The number of PCA components to use. If None, then no PCA is performed. Defaults to 128.
n_mixture_components (int, optional) – The number of clusters in the mixture model, $\bar{k}$. This differs from n_slices in that DominoSlicer returns only the n_slices clusters with the highest error rate out of the n_mixture_components. Defaults to 25.
y_log_likelihood_weight (float, optional) – The weight $\gamma$ applied to the $P(Y=y_{i} | S=s)$ term in the log likelihood during the E-step. Defaults to 1.
y_hat_log_likelihood_weight (float, optional) – The weight $\hat{\gamma}$ applied to the $P(\hat{Y} = h_\theta(x_i) | S=s)$ term in the log likelihood during the E-step. Defaults to 1.
max_iter (int, optional) – The maximum number of iterations to run. Defaults to 100.
init_params (str, optional) – The initialization method to use. Options are the same as in sklearn.mixture.GaussianMixture plus one addition, “confusion”. If “confusion”, the clusters are initialized such that almost all of the examples in a cluster come from the same cell in the confusion matrix. See Notes below for more details. Defaults to “confusion”.
confusion_noise (float, optional) – Only used if init_params="confusion". The scale of noise added to the confusion matrix initialization. See Notes below for more details. Defaults to 0.001.
random_state (Union[int, None], optional) – The random seed to use when initializing the parameters. Defaults to None.

Notes

The mixture model is an extension of a standard Gaussian Mixture Model. The model is based on the assumption that data is generated according to the following generative process.

Each example belongs to one of $\bar{k}$ slices. This slice $S$ is sampled from a categorical distribution $S \sim Cat(\mathbf{p}_S)$ with parameter $\mathbf{p}_S \in\{\mathbf{p} \in \mathbb{R}_+^{\bar{k}} : \sum_{i = 1}^{\bar{k}} p_i = 1\}$ (see DominoSlicer.mm.weights_).
Given the slice $S$, the embeddings are normally distributed, $Z | S \sim \mathcal{N}(\mathbf{\mu}, \mathbf{\Sigma})$, with mean $\mathbf{\mu} \in \mathbb{R}^d$ (see DominoSlicer.mm.means_) and covariance $\mathbf{\Sigma} \in \mathbb{S}^{d}_{++}$ (see DominoSlicer.mm.covariances_; normally this parameter is constrained to the set of symmetric positive definite $d \times d$ matrices, but the argument covariance_type allows for other constraints).
Given the slice, the labels vary as a categorical $Y |S \sim Cat(\mathbf{p})$ with parameter $\mathbf{p} \in \{\mathbf{p} \in \mathbb{R}^c_+ : \sum_{i = 1}^c p_i = 1\}$ (see DominoSlicer.mm.y_probs).
Given the slice, the model predictions also vary as a categorical $\hat{Y} | S \sim Cat(\mathbf{\hat{p}})$ with parameter $\mathbf{\hat{p}} \in \{\mathbf{\hat{p}} \in \mathbb{R}^c_+ : \sum_{i = 1}^c \hat{p}_i = 1\}$ (see DominoSlicer.mm.y_hat_probs).
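
The generative process above can be illustrated with a minimal sampling sketch. It is not part of the library: the function name sample_example and the toy parameter values are hypothetical, and a diagonal covariance is assumed (matching the default covariance_type="diag").

```python
import random


def sample_example(p_s, means, variances, y_probs, y_hat_probs, rng):
    """Draw one (slice, embedding, label, prediction) tuple.

    Hypothetical helper for illustration; assumes diagonal covariances,
    so each embedding dimension is sampled independently.
    """
    k = len(p_s)
    # S ~ Cat(p_S)
    s = rng.choices(range(k), weights=p_s)[0]
    # Z | S ~ N(mu_s, diag(var_s))
    z = [rng.gauss(m, v ** 0.5) for m, v in zip(means[s], variances[s])]
    # Y | S ~ Cat(p_s) and Y_hat | S ~ Cat(p_hat_s)
    n_classes = len(y_probs[s])
    y = rng.choices(range(n_classes), weights=y_probs[s])[0]
    y_hat = rng.choices(range(n_classes), weights=y_hat_probs[s])[0]
    return s, z, y, y_hat


rng = random.Random(0)
# Toy parameters: 2 slices, 2-dimensional embeddings, 2 classes.
p_s = [0.7, 0.3]
means = [[0.0, 0.0], [5.0, 5.0]]
variances = [[1.0, 1.0], [0.5, 0.5]]
y_probs = [[0.9, 0.1], [0.1, 0.9]]
# Slice 1 mostly holds class-1 examples that are predicted as class 0.
y_hat_probs = [[0.9, 0.1], [0.8, 0.2]]
samples = [
    sample_example(p_s, means, variances, y_probs, y_hat_probs, rng)
    for _ in range(1000)
]
```

In this toy setup the second slice is homogeneous with respect to error type: most of its examples are false negatives for class 1.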

The mixture model is thus parameterized by $\phi = [\mathbf{p}_S, \mu, \Sigma, \mathbf{p}, \mathbf{\hat{p}}]$, corresponding to the attributes weights_, means_, covariances_, y_probs, and y_hat_probs, respectively. The log-likelihood over the $n$ examples in the validation dataset $D_v$ is given as follows and maximized using expectation-maximization:

\[\ell(\phi) = \sum_{i=1}^n \log \sum_{s=1}^{\bar{k}} P(S=s)P(Z=z_i| S=s) P( Y=y_i| S=s)P(\hat{Y} = h_\theta(x_i) | S=s)\]

We include two optional hyperparameters $\gamma, \hat{\gamma} \in \mathbb{R}_+$ (see y_log_likelihood_weight and y_hat_log_likelihood_weight above) that balance the importance of modeling the class labels and predictions against the importance of modeling the embedding. The modified log-likelihood over $n$ examples is given as follows:

\[\ell(\phi) = \sum_{i=1}^n \log \sum_{s=1}^{\bar{k}} P(S=s)P(Z=z_i| S=s) P( Y=y_i| S=s)^\gamma P(\hat{Y} = h_\theta(x_i) | S=s)^{\hat{\gamma}}\]
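
As a rough illustration of the weighted objective, the sketch below evaluates it for hard (single-label) predictions. The function names are hypothetical and this is not the library's implementation; it assumes diagonal covariances and strictly positive probabilities, and it works in log space with a log-sum-exp for numerical stability.

```python
import math


def diag_gaussian_logpdf(z, mean, var):
    """Log density of N(mean, diag(var)) evaluated at z."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(z, mean, var)
    )


def weighted_log_likelihood(Z, y, y_hat, p_s, means, variances,
                            y_probs, y_hat_probs, gamma=1.0, gamma_hat=1.0):
    """Sum over examples of log sum_s P(s) P(z|s) P(y|s)^gamma P(y_hat|s)^gamma_hat."""
    total = 0.0
    for z_i, y_i, yh_i in zip(Z, y, y_hat):
        # Per-slice log of the weighted joint term.
        terms = [
            math.log(p_s[s])
            + diag_gaussian_logpdf(z_i, means[s], variances[s])
            + gamma * math.log(y_probs[s][y_i])
            + gamma_hat * math.log(y_hat_probs[s][yh_i])
            for s in range(len(p_s))
        ]
        # log-sum-exp over slices
        top = max(terms)
        total += top + math.log(sum(math.exp(t - top) for t in terms))
    return total
```

Setting gamma=0 and gamma_hat=0 in this sketch reduces the objective to the log-likelihood of an ordinary Gaussian mixture over the embeddings, which is one way to see what the two weights control.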

Attention

Although we model the prediction $\hat{Y}$ as a categorical random variable, in practice predictions are sometimes “soft” (e.g. the output of a softmax layer is a probability distribution over labels, not a single label). In these cases, the prediction $\hat{Y}$ is technically a Dirichlet random variable (i.e. a distribution over distributions).

However, to keep the implementation simple while still leveraging the extra information provided by “soft” predictions, we naïvely plug the “soft” predictions directly into the categorical PMF in the E-step and the update in the M-step. Specifically, during the E-step, instead of computing the categorical PMF $P(\hat{Y}=\hat{y}_i | S=s)$ we compute $\sum_{j=1}^c \hat{y}_i(j) P(\hat{Y}=j | S=s)$, where $\hat{y}_i(j)$ is the “soft” prediction for class $j$ (this can be thought of as marginalizing out the uncertainty in the prediction). During the M-step, we compute a “soft” update for the categorical parameters, $\hat{p}_j^{(s)} \propto \sum_{i=1}^n Q(s,i) \hat{y}_i(j)$, normalized over $j$ so that the parameters sum to one, where $Q(s,i)$ is the “responsibility” of slice $s$ towards the data point $i$.
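
The two “soft” computations can be sketched as follows. The function names are hypothetical, and the per-slice normalization in the M-step is an assumption made here so that each slice's parameters form a valid categorical distribution; the library's actual code may differ.

```python
def soft_pred_likelihood(soft_pred, y_hat_probs_s):
    """E-step term: sum_j y_hat_i(j) * P(Y_hat = j | S = s)."""
    return sum(q * p for q, p in zip(soft_pred, y_hat_probs_s))


def soft_m_step(resp, soft_preds):
    """M-step update for the prediction parameters.

    resp[i][s] is the responsibility Q(s, i) of slice s for example i.
    soft_preds[i][j] is the soft prediction for class j on example i.
    Returns y_hat_probs[s][j], normalized over j (an assumption of this sketch).
    """
    n, k, c = len(resp), len(resp[0]), len(soft_preds[0])
    y_hat_probs = []
    for s in range(k):
        # Responsibility-weighted sum of the soft predictions for slice s.
        raw = [sum(resp[i][s] * soft_preds[i][j] for i in range(n)) for j in range(c)]
        total = sum(raw)
        y_hat_probs.append([r / total for r in raw])
    return y_hat_probs
```

With one-hot predictions, soft_pred_likelihood reduces to the ordinary categorical PMF, so hard predictions are a special case of this sketch.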

When using "confusion" initialization, each slice $s^{(j)}$ is assigned a label $y^{(j)}\in \mathcal{Y}$ and a prediction $\hat{y}^{(j)} \in \mathcal{Y}$ (i.e. each slice is assigned a cell in the confusion matrix). This is typically done in a round-robin fashion so that there are at least $\lfloor \bar{k} / |\mathcal{Y}|^2 \rfloor$ slices assigned to each cell in the confusion matrix. Then, we fill in the initial responsibility matrix $Q \in \mathbb{R}^{n \times \bar{k}}$, where each cell $Q_{ij}$ corresponds to our model’s initial estimate of $P(S=s^{(j)}|Y=y_i, \hat{Y}=\hat{y}_i)$. We do this according to

\[\begin{split}\bar{Q}_{ij} \leftarrow \begin{cases} 1 + \epsilon & y_i=y^{(j)} \land \hat{y}_i = \hat{y}^{(j)} \\ \epsilon & \text{otherwise} \end{cases}\end{split}\]

\[Q_{ij} \leftarrow \frac{\bar{Q}_{ij} } {\sum_{l=1}^{\bar{k}} \bar{Q}_{il}}\]

where $\epsilon$ is random noise which ensures that slices assigned to the same confusion matrix cell won’t have the exact same initialization. We sample $\epsilon$ uniformly from the range (0, confusion_noise].
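
A minimal sketch of this initialization, for hard labels and predictions, is given below. The function confusion_init is hypothetical and not the library's code; in particular, drawing an independent $\epsilon$ for every entry of $\bar{Q}$ and the order in which cells are visited are assumptions of this sketch.

```python
import random


def confusion_init(y, y_hat, n_classes, n_components, confusion_noise=0.001, seed=None):
    """Build the initial responsibility matrix Q (n examples x n_components)."""
    rng = random.Random(seed)
    # Enumerate confusion-matrix cells as (label, prediction) pairs.
    cells = [(a, b) for a in range(n_classes) for b in range(n_classes)]
    # Round-robin assignment of components to cells.
    assigned = [cells[j % len(cells)] for j in range(n_components)]
    Q = []
    for y_i, yh_i in zip(y, y_hat):
        row = []
        for cell in assigned:
            # rng.random() lies in [0, 1), so eps lies in (0, confusion_noise].
            eps = confusion_noise * (1.0 - rng.random())
            row.append((1.0 if cell == (y_i, yh_i) else 0.0) + eps)
        total = sum(row)
        # Normalize each row so the responsibilities sum to one.
        Q.append([q / total for q in row])
    return Q, assigned
```

For example, with two classes and n_components=8, each of the four confusion-matrix cells receives two components, and an example with label 0 and prediction 1 places almost all of its initial responsibility on the two components assigned to that cell.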

SpotlightSlicer

SpotlightSlicer(spotlight_size=0.02, num_steps=1000, learning_rate=0.001, **kwargs)

Slice a dataset with The Spotlight algorithm [deon_2022].

TODO: add docstring similar to the Domino one

deon_2022: d’Eon, G., d’Eon, J., Wright, J. R. & Leyton-Brown, K. The Spotlight: A General Method for Discovering Systematic Errors in Deep Learning Models. arXiv:2107.00758 [cs, stat] (2021)

Parameters

spotlight_size (int) –
num_steps (int) –
learning_rate (float) –

Embed

Describe