A chapter in the story of us trying to make sense of “explanations” in “explainable AI”
tl;dr: they don’t explain much, and what they do explain, they explain strangely
Calculations ahead! Everyone ready? 3, 2, 1…
Alice and Bob own a left-hand glove; Claire owns a right-hand glove. A pair is worth 1 dollar, but either hand by itself is worth nothing.
Shapley Values assign credit to players
$ \varphi_i(v) = \sum_{S \subseteq N \setminus \{i\} } \frac{|S|! \, (n - |S| - 1)!}{n!} \left ( v(S \cup \{i\} ) - v(S) \right ) $
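To make the formula concrete, here’s a brute-force computation on the glove game above (plain Python; the player names and value function are just our encoding of the example):

```python
# Shapley values for the glove game, straight from the formula above.
from itertools import combinations
from math import factorial

players = ["Alice", "Bob", "Claire"]   # Alice, Bob: left glove; Claire: right glove
n = len(players)

def v(S):
    """Worth of coalition S: one dollar per matched left-right pair."""
    lefts = sum(p in S for p in ("Alice", "Bob"))
    rights = int("Claire" in S)
    return min(lefts, rights)

def shapley(i):
    others = [p for p in players if p != i]
    total = 0.0
    for k in range(len(others) + 1):
        for S in combinations(others, k):
            weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            total += weight * (v(set(S) | {i}) - v(set(S)))
    return total

print({p: round(shapley(p), 3) for p in players})
# {'Alice': 0.167, 'Bob': 0.167, 'Claire': 0.667}
```

Claire, holding the scarce right glove, gets 2/3 of the dollar; Alice and Bob split the rest.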
Only functional that satisfies linearity, symmetry, the null-player (dummy) axiom, and “efficiency”
Widely used in ML for feature importance explanations: “how important is gender for this health risk classifier?”
Define a game with one feature per player, and define feature importance as the Shapley Value of the corresponding player.
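For concreteness, here is a sketch of one common way to build such a game (not the only choice): $v(S)$ is the average model output when the features in $S$ are clamped to the instance’s values and the rest are drawn from a background sample, as in many SHAP-style explainers. `model_predict`, `background`, and `instance` below are hypothetical stand-ins.

```python
# A sketch of the "one feature per player" game construction.
import numpy as np

def make_game(model_predict, background, instance):
    background = np.asarray(background, dtype=float)
    instance = np.asarray(instance, dtype=float)
    def v(S):
        X = background.copy()
        idx = list(S)
        X[:, idx] = instance[idx]                # clamp the features in S to the instance
        return float(np.mean(model_predict(X)))  # average out the remaining features
    return v

# Usage: v = make_game(model.predict, X_background, x); v({0, 2}) is the
# coalition value when only features 0 and 2 are "present".
```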
We think this is problematic on many grounds; today, we give a purely mathematical critique.
Take the game as a function on the Boolean hypercube, and define the (discrete) gradient $\nabla$. Decompose $\nabla$ into $\nabla_1, \dots, \nabla_n$, one per player, such that $\sum_i \nabla_i = \nabla$.
$L$ is the Laplacian operator, and its eigenvectors are the columns of the Fourier Transform matrix (on the hypercube, the Walsh–Hadamard matrix).
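A small numerical check of the last two claims, using one concrete edge-indexed construction of $\nabla$ (our choice; the claims don’t depend on it):

```python
# On the Boolean hypercube: sum_i grad_i = grad, and the columns of the
# Walsh-Hadamard (Fourier) matrix are eigenvectors of L = grad^T grad.
import numpy as np
from functools import reduce

n = 4
N = 2 ** n
# Each hypercube edge once: (direction i, endpoint x with bit i clear).
edges = [(i, x) for i in range(n) for x in range(N) if not (x >> i) & 1]

def grad_i(i):
    """Piece of the gradient supported on direction-i edges (zero rows elsewhere)."""
    G = np.zeros((len(edges), N))
    for r, (j, x) in enumerate(edges):
        if j == i:
            G[r, x ^ (1 << i)], G[r, x] = 1.0, -1.0
    return G

grads = [grad_i(i) for i in range(n)]
grad = sum(grads)                # the decomposition: sum_i grad_i = grad
L = grad.T @ grad                # graph Laplacian of the hypercube

# Walsh-Hadamard matrix: column T is the parity function chi_T(x) = (-1)^{|x & T|}.
H = reduce(np.kron, [np.array([[1.0, 1.0], [1.0, -1.0]])] * n)

# Each column is an eigenvector of L, with eigenvalue 2 * |T|.
eigvals = np.array([2.0 * bin(T).count("1") for T in range(N)])
assert np.allclose(L @ H, H * eigvals)
print("Fourier (Walsh-Hadamard) basis diagonalizes the hypercube Laplacian")
```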
Closed-form formula for Shapley Values from polynomial basis coefficients
The Shapley value of a player equals $-2k$, where $k$ is the sum, over all odd-degree monomials containing that player, of each monomial’s coefficient divided by its degree (checked numerically after the examples below).
For a 2-feature model with polynomial $c_1 + c_a a + c_b b + c_{ab} ab$, the Shapley value for player $a$ is $-2c_a$ (seems fine, but where’s $c_{ab}$?)
For 3-feature model: $-2(c_a + c_{abc}/3)$ (hmm…)
For 4-feature model: $-2(c_a + c_{abc}/3 + c_{abd}/3 + c_{acd}/3)$ (uh…)
For 5-feature model: 1 linear term, 6 cubic terms, 1 quintic term (????)
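Here’s the promised numerical check of the closed form, assuming the ±1 encoding in which a present player maps to $-1$ and an absent one to $+1$ (the convention that reproduces the $-2$ factor in the examples above):

```python
# Compare the closed form against brute-force Shapley values for a random polynomial game.
from itertools import combinations, permutations
from math import factorial, prod
import random

n = 4
players = list(range(n))
random.seed(0)

# One random coefficient c_T per monomial (subset T of players).
coeffs = {T: random.uniform(-1, 1)
          for k in range(n + 1) for T in combinations(players, k)}

def v(S):
    """Evaluate the polynomial at the +-1 point encoding coalition S (present -> -1)."""
    x = [-1 if i in S else +1 for i in players]
    return sum(c * prod(x[i] for i in T) for T, c in coeffs.items())

def shapley_bruteforce(i):
    """Average marginal contribution of player i over all orderings."""
    total = 0.0
    for order in permutations(players):
        before = set(order[:order.index(i)])
        total += v(before | {i}) - v(before)
    return total / factorial(n)

def shapley_closed_form(i):
    """-2 * sum over odd-degree monomials containing i of c_T / |T|."""
    return -2 * sum(c / len(T) for T, c in coeffs.items()
                    if i in T and len(T) % 2 == 1)

for i in players:
    assert abs(shapley_bruteforce(i) - shapley_closed_form(i)) < 1e-9
print("closed form matches brute force")
```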
Shapley Values are weird: there’s no good reason why that expression should be a good summary of feature importance.
Takeaways:
A game is inessential iff its corresponding polynomial is strictly linear (quick check after these takeaways). Polynomials seem to be good explanations for feature importance games!
There’s a deep connection between SVs and FTs via $\nabla$ and its SVD
SVs are very weird objects for ML feature importance explanations. Be skeptical of interpretation methods using them.
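A quick check of the first takeaway, taking “inessential” in the usual sense of additive ($v(S) = \sum_{i \in S} v(\{i\})$) and reading off the polynomial via the Walsh–Hadamard transform under the same ±1 encoding as above:

```python
# An additive (inessential) game has a strictly linear polynomial:
# all Fourier coefficients of degree >= 2 vanish.
import numpy as np
from functools import reduce

n = 5
N = 2 ** n
w = np.random.default_rng(1).normal(size=n)   # arbitrary per-player worths

# Game indexed by coalitions: bit i of x set  <->  player i in the coalition.
v = np.array([sum(w[i] for i in range(n) if (x >> i) & 1) for x in range(N)])

# Fourier coefficients c_T = (1/N) * sum_x v(x) * (-1)^{|x & T|}.
H = reduce(np.kron, [np.array([[1.0, 1.0], [1.0, -1.0]])] * n)
coeffs = H @ v / N

degrees = np.array([bin(T).count("1") for T in range(N)])
assert np.allclose(coeffs[degrees >= 2], 0.0)  # only constant and linear terms survive
print("additive game -> strictly linear polynomial")
```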
Questions? (We’re still working on the writeup…)
Where does the problem arise?
Our opinion: the axiom of efficiency makes sense in the context of cooperative games, but is pretty silly in feature importance explanations. We don’t want signed values to sum up to the total contribution: we want magnitudes.
We think orthogonality is a much better choice, and leads to a much simpler “explanation”: the FT representation itself!
Define one subgame per player
\[
\begin{aligned}
v_i &= \textrm{argmin}_{u} \, || \nabla u - \nabla_i v ||_2 \\
\varphi_i(v) &= v_i[U] - v_i[\emptyset]
\end{aligned}
\]

$\nabla^\dagger \nabla_i$ is a linear operator, and $\sum_i \left( v_i[U] - v_i[\emptyset] \right) = v[U] - v[\emptyset]$.
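A self-contained numerical check of this construction, reusing the same edge-indexed gradient as in the earlier sketch and a random game as a stand-in:

```python
# Each subgame v_i is a least-squares solution, and the per-player
# contributions sum to v[U] - v[empty set].
import numpy as np

n = 4
N = 2 ** n
edges = [(i, x) for i in range(n) for x in range(N) if not (x >> i) & 1]

def grad_i(i):
    """Piece of the gradient supported on direction-i edges."""
    G = np.zeros((len(edges), N))
    for r, (j, x) in enumerate(edges):
        if j == i:
            G[r, x ^ (1 << i)], G[r, x] = 1.0, -1.0
    return G

grads = [grad_i(i) for i in range(n)]
grad = sum(grads)
grad_pinv = np.linalg.pinv(grad)               # nabla^dagger

v = np.random.default_rng(0).normal(size=N)    # random game; index N-1 is U, index 0 is the empty set

phi = []
for i in range(n):
    v_i = grad_pinv @ (grads[i] @ v)           # a minimizer of || grad u - grad_i v ||_2
    phi.append(v_i[N - 1] - v_i[0])            # varphi_i(v) = v_i[U] - v_i[empty]

assert np.isclose(sum(phi), v[N - 1] - v[0])   # contributions sum up as claimed
print("sum of subgame contributions equals v[U] - v[empty set]")
```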