RL Agents' Ability To Behave More Human-Like Questioned?

Researchers from Google Brain, the Vector Institute and the University of Toronto showed that the input entropy, information gain and empowerment of RL agents correlate strongly with human behaviour similarity metrics.

In recent years, reinforcement learning (RL) has made several achievements in solving complex problems. Its uses range widely, from playing complex games to difficult areas such as robotic manipulation. In each case, the RL agent accomplishes its task by relying on a manually defined reward function.

According to the researchers, designing informative reward functions often leads to a range of issues: it is expensive, time-consuming and susceptible to human error. The difficulty grows with the complexity of the task being executed. To mitigate such problems, and inspired by the learning abilities of natural agents such as children, the researchers studied some of the dominant types of intrinsic motivation.

The Mechanism Behind the Study:

To accelerate the development of intrinsic objectives, the researchers computed candidate objectives on a pre-collected dataset of agent behaviour, rather than optimising them online, and compared them by analysing the correlations between them. They studied three types of intrinsic motivation, each a mathematical objective for RL that does not depend on a specific task and applies to any unknown environment.

The study summarised them as follows:

  • Input Entropy: This objective encourages the agent to encounter rare sensory inputs.
  • Information Gain: This objective rewards the agent for discovering the rules of its environment.
  • Empowerment: This objective rewards the agent for maximising its influence over its sensory inputs or environment.
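
To make two of these objectives more concrete, here is a minimal Python sketch. It assumes the sensory inputs have already been discretised into labels, which the article does not specify. Input entropy is estimated as the Shannon entropy of the observed inputs. For empowerment, the sketch uses the empirical mutual information between actions and the inputs that follow them, which is a common simplification and not necessarily the researchers' exact estimator. Information gain is left out because it needs a learned model of the environment. The function names and the toy episodes are hypothetical.

```python
import math
from collections import Counter


def input_entropy(observations):
    """Shannon entropy (in bits) of the empirical distribution of inputs."""
    total = len(observations)
    counts = Counter(observations)
    # Each input contributes p * log2(1 / p); rare inputs raise the entropy.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def action_input_information(pairs):
    """Empirical mutual information (in bits) between actions and next inputs.

    Used here as a rough stand-in for empowerment (an assumption).
    """
    total = len(pairs)
    joint = Counter(pairs)
    actions = Counter(action for action, _ in pairs)
    inputs = Counter(next_input for _, next_input in pairs)
    return sum(
        (n / total) * math.log2(n * total / (actions[a] * inputs[o]))
        for (a, o), n in joint.items()
    )


# Hypothetical discretised inputs: one agent stays put, one visits four rooms.
stationary_agent = ["room_a"] * 8
exploring_agent = ["room_a", "room_b", "room_c", "room_d"] * 2

print(input_entropy(stationary_agent))  # 0.0
print(input_entropy(exploring_agent))   # 2.0

# Hypothetical (action, next input) pairs.
actions_matter = [("left", "room_a"), ("right", "room_b")] * 4
actions_ignored = [("left", "room_a"), ("right", "room_a")] * 4

print(action_input_information(actions_matter))   # 1.0
print(action_input_information(actions_ignored))  # 0.0
```

The exploring agent scores higher on input entropy because it sees a wider spread of inputs, and the agent whose actions determine what it sees next scores higher on the empowerment proxy.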

It is also worth noting that well-designed intrinsic objectives may produce intelligent behaviour across the various fields of reinforcement learning. However, evaluating different intrinsic objectives normally requires training agents on each of them, which is a slow and expensive process.
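
The offline comparison described earlier avoids part of this cost, because each objective is scored on behaviour that has already been recorded. A minimal sketch of such a correlation analysis is shown below. It assumes one score per agent for each objective and uses Pearson correlation, since the article does not say which correlation measure the researchers used. All numbers are invented for illustration and are not results from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up scores for four hypothetical agents (illustration only).
scores = {
    "input_entropy": [1.0, 2.0, 3.0, 4.0],
    "task_reward": [4.0, 1.0, 3.0, 2.0],
    "human_similarity": [2.0, 4.0, 6.0, 8.0],
}

# Correlate every objective with the human similarity metric.
for name in ("input_entropy", "task_reward"):
    r = pearson(scores[name], scores["human_similarity"])
    print(f"{name} vs human_similarity: r = {r:.2f}")
```

An objective whose scores rise and fall with the human similarity scores gets a correlation close to 1, while an unrelated objective lands near 0.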

We can safely conclude that RL agents are yet to behave in a more human-like way without relying on task rewards, but it remains to be seen whether that changes in the future.