DeepMind strategizes to create faster reinforcement learning models

Researchers at DeepMind and McGill University have recently proposed new approaches to speed up the solution of complex reinforcement learning problems. In particular, they have introduced a divide-and-conquer approach to reinforcement learning (RL) that, combined with deep learning, is meant to increase the potential of agents.

For the last few years, reinforcement learning has provided a conceptual framework for tackling several key problems. It has been used in many applications, such as controlling robots, operating artificial limbs, developing self-driving cars, and playing games such as poker and Go.

The recent combination of reinforcement learning and deep learning has also produced several impressive results and has proven to be a promising approach to major sequential decision-making problems that are currently intractable. One such issue is the amount of data that RL agents need in order to learn to perform a task.

In this work, the researchers discuss how the range of problems RL agents can address could be extended significantly if agents are given appropriate mechanisms to leverage prior knowledge. The framework rests on the assumption that an RL problem can usually be subdivided into a multitude of ‘tasks.’

The researchers generalize two key operations in RL: policy evaluation, extended from a single task to a set of tasks, and policy improvement, extended from a single policy to a set of policies. These two operations underpin much of reinforcement learning, and their generalization enables the solution of one task to speed up the solution of other tasks.
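
For intuition, the sketch below shows what the generalized improvement step can look like when value estimates of several policies are available for the same task: the agent acts greedily with respect to their pointwise maximum. The Q-value tables here are randomly generated stand-ins, not the authors' actual setup.

```python
import numpy as np

# Toy illustration of generalized policy improvement (GPI): given Q-value
# estimates for several known policies on the same task, act greedily with
# respect to their pointwise maximum. The numbers are made up for the demo.

num_policies, num_states, num_actions = 3, 5, 4
rng = np.random.default_rng(1)
q_values = rng.normal(size=(num_policies, num_states, num_actions))

# Pointwise maximum over policies, then greedy action per state.
best_over_policies = q_values.max(axis=0)        # shape: (num_states, num_actions)
gpi_policy = best_over_policies.argmax(axis=1)   # one action per state

print("GPI action per state:", gpi_policy)
```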

Generalized policy evaluation is the computation of the value function of a policy on a set of tasks; generalized policy improvement derives a new policy from a set of existing policies. These generalized policy updates make it possible to reuse task solutions in two distinct ways. When the reward function of a new task can be approximated as a linear combination of the reward functions of other tasks, the reinforcement learning problem reduces to a simpler linear regression that can be solved with only a fraction of the data.
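
As a rough, hypothetical sketch of this reduction, the example below treats the per-transition rewards of previously solved tasks as regression features and recovers the new task's mixing weights by ordinary least squares; the data is simulated, and the names (phi, true_w) are illustrative assumptions rather than anything from the paper.

```python
import numpy as np

# Hypothetical sketch of the linear-regression reduction: each transition
# yields a feature vector phi whose entries play the role of the rewards of
# previously solved tasks. If the new task's reward is roughly a linear
# combination of those rewards, its mixing weights can be recovered by
# ordinary least squares from a modest amount of data.

rng = np.random.default_rng(0)
num_transitions, num_tasks = 200, 4

# Stand-in data: per-transition rewards of the 4 previously solved tasks.
phi = rng.normal(size=(num_transitions, num_tasks))

# True (unknown) mixing weights, used here only to simulate the new task's
# observed rewards with a little noise.
true_w = np.array([0.5, -1.0, 0.0, 2.0])
observed_rewards = phi @ true_w + 0.01 * rng.normal(size=num_transitions)

# Solving the new task's reward model reduces to this regression.
w_hat, *_ = np.linalg.lstsq(phi, observed_rewards, rcond=None)
print("estimated mixing weights:", np.round(w_hat, 2))
```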

If the linearity constraint is not satisfied, the agent can still make use of the task solutions, in this case by using them to interact with and learn about the environment. This, too, can significantly reduce the amount of data needed to solve the problem.
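
As a rough sketch of this second strategy, the toy example below reuses previously learned policies purely as behaviour policies to collect experience labelled with the new task's reward; the chain environment and the stand-in policies are hypothetical placeholders, not the authors' setup.

```python
import random

# Hypothetical sketch: when the new reward is NOT a linear mix of old ones,
# previously learned policies can still serve as behaviour policies that
# gather experience labelled with the new task's reward.

class ChainEnv:
    """Five-state chain; action 1 moves right, 0 moves left; the new task
    rewards reaching the last state."""
    def __init__(self):
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state = max(0, min(4, self.state + (1 if action == 1 else -1)))
        reward = 1.0 if self.state == 4 else 0.0   # new task's reward
        return self.state, reward, self.state == 4

def collect_experience(env, policies, steps_per_policy=20):
    """Roll out each old policy and record transitions for the new task."""
    dataset = []
    for policy in policies:
        state = env.reset()
        for _ in range(steps_per_policy):
            action = policy(state)
            next_state, reward, done = env.step(action)
            dataset.append((state, action, reward, next_state))
            state = env.reset() if done else next_state
    return dataset

old_policies = [lambda s: 1, lambda s: random.choice([0, 1])]  # stand-ins
data = collect_experience(ChainEnv(), old_policies)
print(f"collected {len(data)} transitions for the new task")
```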

The researchers combined these two strategies into a divide-and-conquer approach to RL that can help scale agents to problems that are currently intractable, for example because of a lack of data. They stated that if the reward function of a task can be well approximated as a linear combination of the reward functions of tasks previously solved, the reinforcement-learning problem can be reduced to a simpler linear regression.

The researchers added that, if this is not the case, the agent can still use the task solutions to interact with and learn about the environment. Both strategies significantly reduce the amount of data needed to solve the reinforcement-learning problem.