Tracr: Compiled Transformers as a Laboratory for Interpretability

Interpretability research aims to build tools for understanding machine
learning (ML) models. However, such tools are inherently hard to evaluate
because we do not have ground truth information about how ML models actually
work. In this work, we propose to build transformer models manually as a
testbed for interpretability research. We introduce Tracr, a “compiler” for
translating human-readable programs into weights of a transformer model. Tracr
takes code written in RASP, a domain-specific language (Weiss et al., 2021), and
translates it into weights for a standard, decoder-only, GPT-like transformer
architecture. We use Tracr to create a range of ground truth transformers that
implement programs such as computing token frequencies, sorting, and Dyck-n
parenthesis checking. To enable the broader research community to
explore and use compiled models, we provide an open-source implementation of
Tracr at https://github.com/google-deepmind/tracr.
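
To give a sense of what compiling a RASP program looks like in practice, here is a minimal sketch following the example in the Tracr repository. The `make_length` program computes the sequence length at every position; the vocabulary and maximum sequence length shown are illustrative choices, not fixed by the library.

```python
from tracr.rasp import rasp
from tracr.compiler import compiling

def make_length():
  # Select every (query, key) pair, then count how many keys each query
  # attends to -- this count equals the sequence length at every position.
  all_true_selector = rasp.Select(
      rasp.tokens, rasp.tokens, rasp.Comparison.TRUE)
  return rasp.SelectorWidth(all_true_selector)

# Compile the RASP program into concrete transformer weights.
# vocab and max_seq_len here are illustrative, not required values.
bos = "BOS"
model = compiling.compile_rasp_to_model(
    make_length(),
    vocab={1, 2, 3},
    max_seq_len=5,
    compiler_bos=bos,
)

# Run the compiled transformer on a token sequence (BOS token first).
out = model.apply([bos, 1, 2, 3])
print(out.decoded)  # expected: [bos, 3, 3, 3] -- length 3 at each position
```

Because the program is known exactly, the resulting weights come with a ground-truth explanation of the computation they implement, which is what makes the compiled models useful as interpretability test cases.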

Read in full here:

This thread was posted by one of our members via one of our news source trackers.
