About This Document
- sl:arxiv_author :
- sl:arxiv_firstAuthor : Atticus Geiger
- sl:arxiv_num : 2301.04709
- sl:arxiv_published : 2023-01-11T20:42:41Z
- sl:arxiv_summary : A faithful and interpretable explanation of an AI model's behavior and
internal structure is a high-level explanation that is human-intelligible but
also consistent with the known, but often opaque, low-level causal details of
the model. We argue that the theory of causal abstraction provides the
mathematical foundations for the desired kinds of model explanations. In causal
abstraction analysis, we use interventions on model-internal states to
rigorously assess whether an interpretable high-level causal model is a
faithful description of an AI model. Our contributions in this area are: (1) We
generalize causal abstraction to cyclic causal structures and typed high-level
variables. (2) We show how multi-source interchange interventions can be used
to conduct causal abstraction analyses. (3) We define a notion of approximate
causal abstraction that allows us to assess the degree to which a high-level
causal model is a causal abstraction of a lower-level one. (4) We prove that
constructive causal abstraction can be decomposed into three operations we
refer to as marginalization, variable-merge, and value-merge. (5) We formalize
the XAI methods of LIME, causal effect estimation, causal mediation analysis,
iterated nullspace projection, and circuit-based explanations as special cases
of causal abstraction analysis.
- sl:arxiv_title : Causal Abstraction for Faithful Model Interpretation
- sl:arxiv_updated : 2023-01-11T20:42:41Z
- sl:bookmarkOf : https://arxiv.org/abs/2301.04709
- sl:creationDate : 2023-01-14
- sl:creationTime : 2023-01-14T23:21:46Z
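
The abstract's central tool, the interchange intervention, is easy to illustrate in code. The sketch below is a minimal, self-contained toy that is not from the paper: a low-level "model" that computes (a + b) * c through an internal sum node, a hypothesized one-variable high-level causal model, and an agreement score in the spirit of the paper's approximate causal abstraction. All function and variable names here are illustrative assumptions.

```python
# Minimal sketch of an interchange intervention and an approximate
# causal-abstraction score on a toy model. Everything here is
# illustrative; none of these names come from the paper.

from itertools import product

def low_level_run(inputs, patched_sum=None):
    """Toy low-level model: out = (a + b) * c, with an internal sum node.
    `patched_sum`, if given, overwrites the internal state (an intervention)."""
    s = inputs["a"] + inputs["b"]
    if patched_sum is not None:
        s = patched_sum
    return s * inputs["c"]

def high_level_run(inputs, patched_S=None):
    """Hypothesized high-level causal model with one variable S = a + b."""
    S = inputs["a"] + inputs["b"]
    if patched_S is not None:
        S = patched_S
    return S * inputs["c"]

def interchange_agrees(base, source):
    """Interchange intervention: compute the internal sum on `source`,
    patch it into a run on `base` at both levels, and compare outputs."""
    src_sum = source["a"] + source["b"]
    low = low_level_run(base, patched_sum=src_sum)
    high = high_level_run(base, patched_S=src_sum)
    return low == high

# Approximate causal abstraction, informally: the fraction of
# (base, source) input pairs on which the two models agree under
# interchange interventions.
domain = [dict(zip("abc", v)) for v in product(range(3), repeat=3)]
pairs = list(product(domain, domain))
accuracy = sum(interchange_agrees(b, s) for b, s in pairs) / len(pairs)
print(f"interchange intervention accuracy: {accuracy:.2f}")  # 1.00 here
```

For a real network, `low_level_run` would instead capture and patch hidden activations (e.g., via forward hooks), and an alignment would map each high-level variable to a set of low-level states; an accuracy below 1.0 then quantifies the degree of approximate abstraction.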