The trained computer
We want to train a model that does numeric computation such as 1 + 2 = 3. What do we need for computation?
- input
- reusable computer unit -> repeat transformer block
- memory -> concatenated embeddings accessed via cross attention
- algorithm that gives the desired output
In our case, the computer & algorithm will be merged in the model; the memory will be the intermediate latent states, concatenated (see the sketch below).
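A minimal sketch of that compute loop, assuming a generic learned `block` callable (names are hypothetical, not the ones in toy_model.py):

```python
def run_computer(block, latent, inputs, n_steps):
    """Computer & algorithm = one learned block, applied repeatedly.
    Memory = every intermediate latent state, kept around."""
    memory = [latent]
    for _ in range(n_steps):
        latent = block(latent, inputs, memory)  # block cross-attends to inputs and memory
        memory.append(latent)                   # the new latent becomes part of the memory
    return latent
```

The compute budget is just `n_steps` repetitions of the same block, so harder problems could in principle be given more steps without adding parameters.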
The algorithms we decided to learn are (from easy to difficult):
- copy input to output, with some variations
- addition
- multiplication
- number factorisation
Task 3 in particular will help test how this method performs under variable complexity. Since the last task relies heavily on memory to reduce computation, we will observe how the model uses the memory it is given.
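For illustration, here is one possible way to generate training pairs for the addition task, assuming a character-level encoding (the vocabulary and formatting are assumptions, not the repo's actual setup):

```python
import random

VOCAB = list("0123456789+=")                  # assumed character-level vocabulary
STOI = {ch: i for i, ch in enumerate(VOCAB)}

def make_addition_example(max_value: int = 999):
    """Return (input token ids, target token ids) for one 'a+b=' -> 'c' example."""
    a, b = random.randint(0, max_value), random.randint(0, max_value)
    prompt, answer = f"{a}+{b}=", str(a + b)
    return [STOI[c] for c in prompt], [STOI[c] for c in answer]

# make_addition_example() might yield ([1, 2, 10, 3, 4, 11], [4, 6]) for "12+34=" -> "46"
```

The other tasks (copy, multiplication, factorisation) could be generated the same way by changing the operator and the target string.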
Check out:
- toy_model.py
- ongoing experiment log
Based on the observed results, we could re-use the same approach on a language modeling task, following the original ideas.
About the model
The model is a cross-attention latent-based transformer (like Perceiver):
- layer weight sharing to allow a reusable compute block
- hidden latent vectors for information passing
- cross attention on the input
- cross attention on past latents (wider information passing; see the sketch below)
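A minimal PyTorch sketch of these four points, assuming character-token inputs and a fixed number of compute steps (class and parameter names are hypothetical, not the ones in toy_model.py):

```python
import torch
import torch.nn as nn

class LatentComputeBlock(nn.Module):
    """One compute step; the same instance is reused every step (layer weight sharing)."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.read_input = nn.MultiheadAttention(dim, heads, batch_first=True)   # cross attention on input
        self.read_memory = nn.MultiheadAttention(dim, heads, batch_first=True)  # cross attention on past latents
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norms = nn.ModuleList(nn.LayerNorm(dim) for _ in range(3))

    def forward(self, latent, inputs, memory):
        latent = latent + self.read_input(self.norms[0](latent), inputs, inputs)[0]
        latent = latent + self.read_memory(self.norms[1](latent), memory, memory)[0]
        return latent + self.mlp(self.norms[2](latent))

class ToyLatentTransformer(nn.Module):
    def __init__(self, vocab_size: int, dim: int = 64, n_latents: int = 8, n_steps: int = 6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.latent0 = nn.Parameter(torch.randn(1, n_latents, dim))  # learned initial latent vectors
        self.block = LatentComputeBlock(dim)                         # single shared block
        self.readout = nn.Linear(dim, vocab_size)
        self.n_steps = n_steps

    def forward(self, tokens):
        inputs = self.embed(tokens)                           # (batch, seq, dim)
        latent = self.latent0.expand(tokens.size(0), -1, -1)
        memory = latent
        for _ in range(self.n_steps):                         # same weights at every step
            latent = self.block(latent, inputs, memory)
            memory = torch.cat([memory, latent], dim=1)       # past latents widen information passing
        return self.readout(latent)                           # (batch, n_latents, vocab_size)
```

Predictions would then be read off the latent positions (e.g. one answer token per latent), and the number of steps or latents can be varied to study how the model uses the extra memory.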
Here's a draft of the initial idea:
Similar ideas: