
Abstract:

How do sequence models represent their decision-making process? Prior work suggests that an Othello-playing neural network learned nonlinear models of the board state (Li et al., 2023). In this work, we provide evidence of a closely related linear representation of the board. In particular, we show that probing for "my colour" vs. "opponent's colour" may be a simple yet powerful way to interpret the model's internal state. This precise understanding of the internal representations allows us to control the model's behaviour with simple vector arithmetic. Linear representations enable significant interpretability progress, which we demonstrate with further exploration of how the world model is computed.
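To make the two techniques in the abstract concrete, here is a minimal, hedged sketch (not the authors' code) of a linear probe over hidden activations and a vector-arithmetic intervention. The activation data, dimensions, and class labels ({empty, mine, opponent's}) are all assumptions for illustration; in the paper, the probe would be trained on activations from the Othello-playing model and the edit applied to the model's own activations to steer its predictions.

```python
# Hedged sketch: a linear probe over hypothetical residual-stream activations
# for one board square, plus a vector-arithmetic intervention along the
# probe's "mine" minus "opponent's" direction. Data here is synthetic.
import torch
import torch.nn as nn

d_model, n_classes = 512, 3  # hidden size and {0: empty, 1: mine, 2: opponent's}

# Synthetic stand-in for activations captured at one layer for one square.
acts = torch.randn(1024, d_model)
labels = torch.randint(0, n_classes, (1024,))

# A linear probe is just a single affine map from activations to class logits.
probe = nn.Linear(d_model, n_classes)
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(200):
    opt.zero_grad()
    loss = loss_fn(probe(acts), labels)
    loss.backward()
    opt.step()

# Vector-arithmetic intervention: push an activation toward "mine" and away
# from "opponent's" using the probe's weight rows, then check how the probe's
# prediction changes. The scale factor is arbitrary for this sketch.
with torch.no_grad():
    direction = probe.weight[1] - probe.weight[2]
    direction = direction / direction.norm()
    x = acts[0]
    x_edited = x + 4.0 * direction
    print("before:", probe(x).argmax().item(),
          "after:", probe(x_edited).argmax().item())
```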
