About This Document
- sl:arxiv_author :
- sl:arxiv_firstAuthor : Atticus Geiger
- sl:arxiv_num : 2301.04709
- sl:arxiv_published : 2023-01-11T20:42:41Z
- sl:arxiv_summary : A faithful and interpretable explanation of an AI model's behavior and
internal structure is a high-level explanation that is human-intelligible but
also consistent with the known, but often opaque, low-level causal details of
the model. We argue that the theory of causal abstraction provides the
mathematical foundations for the desired kinds of model explanations. In causal
abstraction analysis, we use interventions on model-internal states to
rigorously assess whether an interpretable high-level causal model is a
faithful description of an AI model. Our contributions in this area are: (1) We
generalize causal abstraction to cyclic causal structures and typed high-level
variables. (2) We show how multi-source interchange interventions can be used
to conduct causal abstraction analyses. (3) We define a notion of approximate
causal abstraction that allows us to assess the degree to which a high-level
causal model is a causal abstraction of a lower-level one. (4) We prove that
constructive causal abstraction can be decomposed into three operations we
refer to as marginalization, variable-merge, and value-merge. (5) We formalize
the XAI methods of LIME, causal effect estimation, causal mediation analysis,
iterated nullspace projection, and circuit-based explanations as special cases
of causal abstraction analysis.
- sl:arxiv_title : Causal Abstraction for Faithful Model Interpretation
- sl:arxiv_updated : 2023-01-11T20:42:41Z
- sl:bookmarkOf : https://arxiv.org/abs/2301.04709
- sl:creationDate : 2023-01-14
- sl:creationTime : 2023-01-14T23:21:46Z
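
The abstract's central tool, the interchange intervention, is easy to illustrate in code. The sketch below is a minimal, self-contained toy that is not from the paper: a low-level "model" that computes (a + b) * c through an internal sum node, a hypothesized one-variable high-level causal model, and an agreement score in the spirit of the paper's approximate causal abstraction. All function and variable names here are illustrative assumptions.

```python
# Minimal sketch of an interchange intervention and an approximate
# causal-abstraction score on a toy model. Everything here is
# illustrative; none of these names come from the paper.

from itertools import product

def low_level_run(inputs, patched_sum=None):
    """Toy low-level model: out = (a + b) * c, with an internal sum node.
    `patched_sum`, if given, overwrites the internal state (an intervention)."""
    s = inputs["a"] + inputs["b"]
    if patched_sum is not None:
        s = patched_sum
    return s * inputs["c"]

def high_level_run(inputs, patched_S=None):
    """Hypothesized high-level causal model with one variable S = a + b."""
    S = inputs["a"] + inputs["b"]
    if patched_S is not None:
        S = patched_S
    return S * inputs["c"]

def interchange_agrees(base, source):
    """Interchange intervention: compute the internal sum on `source`,
    patch it into a run on `base` at both levels, and compare outputs."""
    src_sum = source["a"] + source["b"]
    low = low_level_run(base, patched_sum=src_sum)
    high = high_level_run(base, patched_S=src_sum)
    return low == high

# Approximate causal abstraction, informally: the fraction of
# (base, source) input pairs on which the two models agree under
# interchange interventions.
domain = [dict(zip("abc", v)) for v in product(range(3), repeat=3)]
pairs = list(product(domain, domain))
accuracy = sum(interchange_agrees(b, s) for b, s in pairs) / len(pairs)
print(f"interchange intervention accuracy: {accuracy:.2f}")  # 1.00 here
```

For a real network, `low_level_run` would instead capture and patch hidden activations (e.g., via forward hooks), and an alignment would map each high-level variable to a set of low-level states; an accuracy below 1.0 then quantifies the degree of approximate abstraction.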