Mukund Sundararajan, Ankur Taly, Qiqi Yan

Abstract
We study the problem of attributing the prediction of a deep network to its input features, a problem previously studied by several other works. We identify two fundamental axioms, Sensitivity and Implementation Invariance, that attribution methods ought to satisfy. We show that they are not satisfied by most known attribution methods, which we consider to be a fundamental weakness of those methods. We use the axioms to guide the design of a new attribution method called Integrated Gradients. Our method requires no modification to the original network and is extremely simple to implement; it just needs a few calls to the standard gradient operator. We apply this method to a couple of image models, a couple of text models and a chemistry model, demonstrating its ability to debug networks, to extract rules from a network, and to enable users to engage with models better.
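To make the "few calls to the standard gradient operator" concrete, here is a minimal sketch in plain Python. It is an illustration, not the authors' code. It averages the gradient along a straight line from a baseline input to the actual input and scales the result by the input difference. The toy sigmoid model, its weights `W` and `B`, the all-zero baseline, and the midpoint Riemann sum are all assumptions chosen for this example; with a real network, `model_grad` would be replaced by a framework's gradient call.

```python
import math

def integrated_gradients(f_grad, x, baseline, steps=50):
    """Approximate Integrated Gradients with a midpoint Riemann sum.

    f_grad(point) returns the gradient of the model output with respect
    to each input feature at `point`, as a list of floats.
    """
    n = len(x)
    grad_sum = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval of [0, 1]
        # Point on the straight-line path from the baseline to the input.
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = f_grad(point)
        for i in range(n):
            grad_sum[i] += grad[i]
    # Scale the average gradient by the input difference, per feature.
    return [(xi - b) * g / steps for xi, b, g in zip(x, baseline, grad_sum)]

# Hypothetical toy "network": F(x) = sigmoid(W . x + B).
W = [2.0, -1.0, 0.5]
B = 0.1

def model(x):
    z = sum(w * xi for w, xi in zip(W, x)) + B
    return 1.0 / (1.0 + math.exp(-z))

def model_grad(x):
    # Analytic gradient of the sigmoid model: W * F(x) * (1 - F(x)).
    y = model(x)
    return [w * y * (1.0 - y) for w in W]

x = [1.0, 2.0, -1.0]
baseline = [0.0, 0.0, 0.0]  # assumed all-zero baseline
attributions = integrated_gradients(model_grad, x, baseline, steps=200)

# The attributions should sum (approximately) to F(x) - F(baseline).
completeness_gap = abs(sum(attributions) - (model(x) - model(baseline)))
print(attributions, completeness_gap)
```

The final check reflects a property of path-integrated gradients: the per-feature attributions add up to the difference between the model output at the input and at the baseline, so a large gap suggests that `steps` is too small.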
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| image-attribution-on-celeba | Integrated Gradients | Deletion AUC score (ArcFace ResNet-101): 0.0680; Insertion AUC score (ArcFace ResNet-101): 0.3578 |
| image-attribution-on-cub-200-2011-1 | Integrated Gradients | Deletion AUC score (ResNet-101): 0.0728; Insertion AUC score (ResNet-101): 0.0422 |
| image-attribution-on-vggface2 | Integrated Gradients | Deletion AUC score (ArcFace ResNet-101): 0.0749; Insertion AUC score (ArcFace ResNet-101): 0.5399 |
| interpretability-techniques-for-deep-learning-1 | Integrated Gradients | Insertion AUC score: 0.3578 |