Inverted pendulum using learning
The inverted pendulum problem from OpenAI Gym is solved using a simple network that learns the relevant derivatives from positional data. The inferred derivatives are then used in an RL scheme that keeps the cart stable.
- Train a fully connected, one-layer network on (essentially noisy) positional data from CartPole-v0 to learn the first and second derivatives of cart position and pole angle (both are provided by OpenAI Gym). The output of this network is the probability of a derivative, not the derivative itself.
- Use the network's output on the fly, combined with Q-learning, to keep the pole stable.
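
The two steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: a plain finite-difference estimate stands in for the trained network (which is not shown here), the derivative estimates are discretized into bins so that a tabular Q-learning update can be applied, and all names (`finite_difference`, `to_bin`, `q_update`, `epsilon_greedy`) and hyperparameters (`alpha`, `gamma`, bin ranges) are hypothetical.

```python
import random
from collections import defaultdict

N_ACTIONS = 2  # CartPole-v0 has two actions: push left (0) or push right (1)


def finite_difference(positions, dt):
    """Estimate first and second derivatives from the last three samples.

    Assumption: this stands in for the trained network's derivative estimate.
    """
    x0, x1, x2 = positions[-3:]
    velocity = (x2 - x1) / dt
    acceleration = (x2 - 2.0 * x1 + x0) / (dt * dt)
    return velocity, acceleration


def to_bin(value, low, high, n_bins):
    """Clip value to [low, high] and map it to an integer bin index."""
    clipped = min(max(value, low), high)
    idx = int((clipped - low) / (high - low) * n_bins)
    return min(idx, n_bins - 1)


def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """Standard tabular Q-learning update (alpha and gamma are assumed values)."""
    best_next = max(q_table[next_state])
    td_target = reward + gamma * best_next
    q_table[state][action] += alpha * (td_target - q_table[state][action])


def epsilon_greedy(q_table, state, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)
    values = q_table[state]
    return max(range(N_ACTIONS), key=lambda a: values[a])


# Q-table keyed by a tuple of bin indices; unseen states start at zero.
q_table = defaultdict(lambda: [0.0] * N_ACTIONS)

# Hypothetical usage: three consecutive pole-angle samples, spaced dt apart.
angles = [0.010, 0.012, 0.016]
ang_vel, ang_acc = finite_difference(angles, dt=0.02)
state = (to_bin(ang_vel, -2.0, 2.0, 8), to_bin(ang_acc, -50.0, 50.0, 8))
action = epsilon_greedy(q_table, state, epsilon=0.1, rng=random.Random(0))
```

In a full loop, each environment step would append the new position and angle to the sample history, rebuild the discretized state from the derivative estimates, and call `q_update` with the observed reward. Binning is only one way to connect the estimates to Q-learning; it is used here because it keeps the update tabular.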