Interpretability Techniques For Deep Learning
Evaluation Metric
Log odds-ratio (pythia-6.9b)
Evaluation Results
Performance of each model on this benchmark
Model Name | Log odds-ratio (pythia-6.9b) | Paper Title | Repository |
---|---|---|---|
DAS | 9.95 | CausalGym: Benchmarking causal interpretability methods on linguistic tasks | |
LDA | 0.27 | CausalGym: Benchmarking causal interpretability methods on linguistic tasks | |
Linear probe | 3.42 | CausalGym: Benchmarking causal interpretability methods on linguistic tasks | |
Difference-in-means | 2.91 | CausalGym: Benchmarking causal interpretability methods on linguistic tasks | |
PCA | 1.81 | CausalGym: Benchmarking causal interpretability methods on linguistic tasks | |
k-means | 1.87 | CausalGym: Benchmarking causal interpretability methods on linguistic tasks | |
Random | 0.01 | CausalGym: Benchmarking causal interpretability methods on linguistic tasks | |