The Role of Priors in Reinforcement Learning

To understand the point I'm driving at in this story, we need to look at it from a couple of angles. So this post is split into two parts, Intelligence (Part I) & Learning (Part II), which together give a clearer picture of the future we are headed towards.

PART I: Setting the stage for the Future (Intelligence)

As we move into the second half of 2018, I feel a great sense of humility to be alive at such an exciting time of technological revolution, and specifically at the major advances being made in Artificial Intelligence. If I had to pick the 3 major accomplishments in this field over the last 2 years, they would be the following:

1. A robot named Xiaoyi passed China's national medical licensing exam

This was an ambitious 2017 collaboration between the Chinese technology company iFlytek & Tsinghua University. The teams used deep learning algorithms to process data from medical textbooks, clinical guidelines and medical cases. Their eventual goal is to create an AI assistant that helps physicians work more efficiently.

2. Google Duplex arguably passes the Turing Test at the Google I/O conference

This is the most recent milestone, and it is going to pave the way for many possibilities in the future. It was the moment when Google's AI convincingly mimicked human conversational mannerisms while holding a phone conversation with a person to book an appointment. The rest is history.

3. AI beats radiologists at detecting pneumonia

Healthcare is likely to be one of the most impactful application areas for AI. While we can expect a lot of progress in this space over the next 5 years, it was interesting to come across this 2017 paper (CheXNet) by researchers from Stanford University's machine learning group. The renowned Stanford professor Andrew Ng was one of its authors.

I'm sure that over the next 7 years there will be more milestones to add to this list. Many of these innovations will likely stem from further advances in reinforcement learning, which would benefit the field of robotics in particular.

PART II: Somewhere between Now & Tomorrow (Learning)

We already know that RL (the term I'll use from here on for reinforcement learning) can be effective in teaching robots new behaviour. To better understand how that could be achieved, this interesting research paper ("Investigating Human Priors for Playing Video Games", 2018) looks at human priors: the prior knowledge that helps humans learn new skills quite effortlessly.

The paper uses a video game environment, played by human participants, to observe how a player works out the challenge and reaches the goal without any supervision. Once a player solves the first level, they're given a different and slightly more difficult version of the same game, and so on, until they reach a hard mode that is extremely challenging to solve.

One of the highlights of this experiment is that humans enter a given environment with prior information, and that information mostly stems from a visual understanding of objects. This is what lets us quickly recognise a scenario and find our way through it. We can see the same kind of understanding being built up in infants, who try to make sense of the physical world around them without any prior information to lean on.

In that case, what if a robot's cognitive structure were pre-programmed with human-equivalent priors? Would that help it learn new skills or behaviours more easily?

It may sound like an interesting theory (I'll keep it open for discussion, so please do share your thoughts if I'm missing something), but if it were that simple, we would surely have seen it applied already.

Which brings me to the question of how a human truly learns something new. I think I got close to the answer through a childhood memory of how I learned to ride a bicycle. One of the biggest challenges when we start riding a two-wheeled bicycle as children is learning to balance it. I remember falling off multiple times, but the excitement of learning to ride made me get up again and keep riding. After some time I still wobbled a bit, but as soon as I figured it out on my own (unsupervised), my joy knew no bounds. It was an alignment of several things combined together that led to me balancing the cycle and riding it effortlessly. How did that happen?

I was 'almost aware' that I had to keep the handlebar straight, pedal at a certain pace, hold my body in a certain position, brake whenever I felt I was going too fast or about to fall, and turn the handlebar when I had to change direction. It was as if I were almost aware of what to do at any given moment.

“At the center of your being
you have the answer;
you know who you are
and you know what you want.” — Lao Tzu

This led me to realise that we can only learn something when we are already almost aware of it. So the key to deciphering how humans learn is to understand the gap between awareness & no awareness. In that context, our innate desire to invent the future requires us to address the gap between now & tomorrow.

RL systems typically learn tabula rasa, that is, starting from a blank canvas. This is important to note, because humans never learn a new skill from such a blank canvas: they enter an environment with a prior understanding, or at least with certain assumptions, that help make the learning process a success. Moreover, our focus on performance rather than on learning itself keeps us from making real progress in developing RL systems.
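To make the tabula rasa point a little more concrete, here is a toy sketch in Python. It is not taken from the paper discussed above; it assumes a hypothetical 10-step corridor where the only reward is for reaching the goal at the far right, and compares tabular Q-learning started from all-zero values (a blank canvas) with the same learner whose values are initialised from a hypothetical distance-to-goal prior, loosely like a person who already knows which way the goal lies. The corridor size, the hyperparameters and the form of the prior are all illustrative assumptions.

```python
import random

N = 10  # hypothetical corridor: states 0..N, the goal is state N
LEFT, RIGHT = 0, 1
GAMMA, ALPHA, EPSILON = 0.9, 0.5, 0.1  # illustrative hyperparameters


def step(state, action):
    """Deterministic corridor: reward 1 only on reaching the goal at N."""
    nxt = max(0, state - 1) if action == LEFT else state + 1
    done = nxt == N
    return nxt, (1.0 if done else 0.0), done


def tabula_rasa_q():
    # Blank canvas: every action value starts at zero.
    return [[0.0, 0.0] for _ in range(N)]


def prior_q():
    # Hypothetical prior: the agent "already knows" the goal lies to the right,
    # so values decay with the number of steps remaining (gamma ** distance).
    # Each row is [value of LEFT, value of RIGHT].
    return [[GAMMA ** (N + 1 - s), GAMMA ** (N - 1 - s)] for s in range(N)]


def choose(q_row, rng):
    # Epsilon-greedy; ties (e.g. all zeros) are broken uniformly at random.
    if rng.random() < EPSILON or q_row[LEFT] == q_row[RIGHT]:
        return rng.choice((LEFT, RIGHT))
    return LEFT if q_row[LEFT] > q_row[RIGHT] else RIGHT


def run(q, episodes, rng):
    """Tabular Q-learning; returns the number of steps taken in each episode."""
    steps_per_episode = []
    for _ in range(episodes):
        state, steps, done = 0, 0, False
        while not done:
            action = choose(q[state], rng)
            nxt, reward, done = step(state, action)
            # The goal is terminal, so its target is just the reward.
            target = reward if done else reward + GAMMA * max(q[nxt])
            q[state][action] += ALPHA * (target - q[state][action])
            state, steps = nxt, steps + 1
        steps_per_episode.append(steps)
    return steps_per_episode


def average_steps(make_q, runs=200, episodes=20, seed=0):
    """Average steps in episode 1 and in total, over independent runs."""
    rng = random.Random(seed)
    results = [run(make_q(), episodes, rng) for _ in range(runs)]
    first = sum(r[0] for r in results) / runs
    total = sum(sum(r) for r in results) / runs
    return first, total


if __name__ == "__main__":
    for name, make_q in (("tabula rasa", tabula_rasa_q), ("with prior", prior_q)):
        first, total = average_steps(make_q)
        print(f"{name}: {first:.1f} steps in episode 1, {total:.1f} over 20 episodes")
```

With all values at zero, the blank-canvas agent's first episode is effectively a random walk, so it should wander far longer than the agent with the prior, which heads almost straight for the goal from the start. Baking the prior into the initial Q-values is only one simple way to express it, chosen here because it keeps the learning rule itself unchanged.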

I believe RL is going to be a significant part of the future we are headed towards. As a result, it is all the more important to ground our understanding in how humans truly learn, so that we can replicate such pre-programmed priors in RL systems. We need to drive RL systems towards being 'almost aware' of their own environment, to ultimately bridge the gap that leads to awareness and thereby complete the basic phase of learning.
