An attempt to implement the recurrent attention model (RAM) from “Recurrent Models of Visual Attention” (Mnih+ 2014) in collaboration with ppries.
Link to repo and my notes.
- Overarching aim is to implement a slightly more refined reward structure, such that the learner is punished by wrong guess. The biological analogy is that an animal increasing it’s survivability by correctlying inferring the important features of the environment. Not doing so will decrease accumulated reward. E.g. water is important when thirsty, but less so when hungry.
- Adapt sean999 implementation to python 3.x (done, see repo).
- Change reward to episodic structure with focus on only two numbers, e.g. 3 and 7.
- Introduce dynamics into reward such that after e.g. T episodes, 3 stops being rewarding (and becomes punishing) and 7 becomes rewarding.