Natural Intelligence?

In this edition of #scienceandchill, I will try to introduce a framework for intelligence (both artificial and biological). This framework, which appears in both computer science and behavioral psychology, is known as reinforcement learning. What does it entail?

Let’s think about how we would train a dolphin to collect litter from a pool and bring it back to us, as is done at the Institute for Marine Mammal Studies in Mississippi [1]. We’ll call this dolphin an ‘agent’, and everything around it the ‘environment’. The environment includes the pool, the litter, us (the trainers), and basically everything that isn’t the dolphin.

[Image: dolphin at dock]
Who needs artificial intelligence when it already exists in nature? Source: imms.org

What would be a good way to train these dolphins? Well, what do dolphins like? Fish! That’s right, we can reward these dolphins with fish whenever they bring back litter. Fish, in our case, is what would be called a ‘positive reward signal’ in the field of reinforcement learning. Note that the reward signal can be negative as well. In our case, though, simply withholding fish when the dolphins don’t bring back litter would suffice, because they wouldn’t have anything to eat. Seems simple enough, right?

At a high level, this is what the reinforcement learning framework entails. There is an agent in an environment, and the agent takes actions (like collecting litter from the pool) that change the state of the environment. After each action, the agent receives a reward (positive or negative, and of varying magnitude) and observes the new state of the environment. Observing the reward and the new state allows the agent to associate particular actions with their consequences and reinforce the ones that pay off, which eventually leads it to learn an optimal behavior, or policy. This feedback loop is shown in the figure below (taken from [2]).

[Figure: the agent-environment feedback loop]
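To make the loop a bit more concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the `PoolEnv` and `DolphinAgent` classes, the action names, and the one-fish reward are my own assumptions, not anything the trainers actually use. The "learning" is also just a running average of the reward seen for each action, which is about the simplest form of reinforcement there is.

```python
class PoolEnv:
    """Hypothetical environment: a pool that holds some pieces of litter."""

    def __init__(self, litter=100):
        self.litter = litter  # the state: how much litter is left in the pool

    def step(self, action):
        """Apply the agent's action and return (new_state, reward)."""
        if action == "fetch_litter" and self.litter > 0:
            self.litter -= 1
            return self.litter, 1.0  # one fish for one piece of litter
        return self.litter, 0.0      # no litter brought back, no fish


class DolphinAgent:
    """Hypothetical agent that tracks the average reward of each action."""

    def __init__(self, actions):
        self.actions = list(actions)
        self.value = {a: 0.0 for a in self.actions}  # estimated reward per action
        self.count = {a: 0 for a in self.actions}    # times each action was tried

    def act(self, state):
        # Try every action once, then stick with the best-looking one.
        # (This toy agent ignores the state; a real one would not.)
        untried = [a for a in self.actions if self.count[a] == 0]
        if untried:
            return untried[0]
        return max(self.actions, key=lambda a: self.value[a])

    def learn(self, action, reward):
        # Incremental running average of the rewards seen for this action.
        self.count[action] += 1
        self.value[action] += (reward - self.value[action]) / self.count[action]


def run(steps=20):
    """Run the agent-environment feedback loop for a fixed number of steps."""
    env = PoolEnv()
    agent = DolphinAgent(["swim_around", "fetch_litter"])
    state = env.litter
    total_fish = 0.0
    for _ in range(steps):
        action = agent.act(state)         # agent picks an action
        state, reward = env.step(action)  # environment responds
        agent.learn(action, reward)       # agent reinforces what worked
        total_fish += reward
    return agent, total_fish


if __name__ == "__main__":
    agent, total_fish = run()
    print(agent.value, total_fish)
```

After a single wasted step of swimming around, this toy dolphin settles on fetching litter, because that is the only action that has ever produced fish.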

There is more to how the reinforcement is actually done, including a concept called TD-learning [3], but we will not cover that in this post. The policy that the agent eventually learns, that is, the mapping from states to actions, depends a lot on how we design the reward logic! Dolphins figured out a way to game this system:

“One day, when a gull flew into her pool, she grabbed it, waited for the trainers and then gave it to them. It was a large bird and so the trainers gave her lots of fish. This seemed to give Kelly a new idea. The next time she was fed, instead of eating the last fish, she took it to the bottom of the pool and hid it under the rock where she had been hiding the paper. When no trainers were present, she brought the fish to the surface and used it to lure the gulls, which she would catch to get even more fish. After mastering this lucrative strategy, she taught her calf, who taught other calves, and so gull-baiting has become a hot game among the dolphins.” [1]
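Here is a rough sketch of why this happens, again in Python. All of the numbers are hypothetical (I have no idea how many fish a gull is actually worth); the point is only that if bigger items earn more fish, then an agent that maximizes fish will prefer gulls, even after paying for the bait.

```python
# All numbers are hypothetical, chosen only to illustrate the incentive.
FISH_PER_ITEM = {"litter": 1, "gull": 5}  # trainers give more fish for bigger items
BAIT_COST = {"litter": 0, "gull": 1}      # one saved fish is spent to lure each gull


def net_fish(strategy, rounds):
    """Net fish earned by repeating one strategy for a number of rounds."""
    return rounds * (FISH_PER_ITEM[strategy] - BAIT_COST[strategy])


def best_strategy(rounds=10):
    """A reward-maximizing agent picks whichever strategy pays the most."""
    return max(FISH_PER_ITEM, key=lambda s: net_fish(s, rounds))


if __name__ == "__main__":
    for strategy in FISH_PER_ITEM:
        print(strategy, net_fish(strategy, 10))
    print("best:", best_strategy())
```

Nothing in this reward logic says "clean the pool". It only says "bring us things", so the best-paying behavior is not the one the trainers had in mind.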

— Sims

[1] https://www.theguardian.com/science/2003/jul/03/research.science

[2] http://blogs.cornell.edu/ml4ics/2011/05/09/approach-to-the-problem-irl/

[3] https://en.wikipedia.org/wiki/Temporal_difference_learning
