Title: A Sea of Words: An In-Depth Analysis of Anchors for Text Data
Abstract:
Anchors [Ribeiro et al. (2018)] is a post-hoc, rule-based interpretability method. For text data, it explains a decision by highlighting a small set of words (an anchor) such that the model being explained produces similar outputs whenever these words are present in a document. In this talk, I will present the first theoretical analysis of Anchors, under the assumption that the search for the best anchor is exhaustive. I will show how this analysis yields insight into the behavior of Anchors on simple models, including elementary if-then rules and linear classifiers.
Preprint: https://arxiv.org/abs/2205.13789
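
To make the notion of an anchor concrete, here is a minimal sketch of an exhaustive anchor search on a toy if-then rule model. It is only an illustration under stated assumptions, not the implementation analyzed in the talk: the `UNK` replacement token, the independent replacement probability of 1/2, the tolerance `eps`, the "shortest anchor first" selection rule, and the names `classify`, `precision`, and `exhaustive_anchor` are all hypothetical choices made for this example. Precision is computed exactly by enumerating every perturbation, which is feasible only for very short documents.

```python
from itertools import combinations, product

UNK = "UNK"  # hypothetical placeholder token used to mask words


def classify(words):
    """Toy if-then rule model: predicts 1 iff both 'not' and 'bad' are present."""
    return int("not" in words and "bad" in words)


def precision(doc, anchor, model):
    """Exact probability that the prediction is unchanged when every word
    outside the anchor is independently replaced by UNK with probability 1/2
    (an assumed perturbation scheme). `anchor` is a set of word positions."""
    target = model(doc)
    free = [i for i in range(len(doc)) if i not in anchor]
    agree = 0
    # Enumerate all 2^len(free) masking patterns instead of sampling.
    for mask in product([False, True], repeat=len(free)):
        perturbed = list(doc)
        for i, drop in zip(free, mask):
            if drop:
                perturbed[i] = UNK
        agree += model(perturbed) == target
    return agree / 2 ** len(free)


def exhaustive_anchor(doc, model, eps=0.05):
    """Return the shortest set of words whose precision is at least 1 - eps,
    breaking ties by highest precision (an assumed selection rule)."""
    for size in range(len(doc) + 1):
        scored = [
            (precision(doc, set(c), model), c)
            for c in combinations(range(len(doc)), size)
        ]
        valid = [(p, c) for p, c in scored if p >= 1 - eps]
        if valid:
            best = max(valid, key=lambda pc: pc[0])[1]
            return [doc[i] for i in best]
    # Not reached: the full document always has precision 1.
    return list(doc)


if __name__ == "__main__":
    doc = "this movie is not bad".split()
    print(exhaustive_anchor(doc, classify))
```

On the document "this movie is not bad", the empty anchor has precision 1/4 and every single word at most 1/2, so the search returns `['not', 'bad']`, the two words that the toy rule actually depends on.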