April 23, 2018

DeepMind's AlphaGo Zero removes humans from AI equation

October 19, 2017, 07:55 | Bibiana Flor

Well, here's something to chew on: Google's AI research arm DeepMind, the same benevolent creator that spawned AlphaGo, has already rendered that champion-beating version obsolete.

There are other technical elements that define the new AI, which you can dig into courtesy of DeepMind's paper, published in the scientific journal Nature.

When building the first iterations of AlphaGo, the team explored working on a system like AlphaGo Zero, but at the time the technology didn't work.

Perhaps we should have seen this coming.

Though extremely impressive, AlphaGo Zero won't replace humans anytime soon.

Whereas AlphaGo learned from millions of examples drawn from games played by humans, AlphaGo Zero, as the new version is called, does not need any examples.

So, this is why it's taken so long for computers to surpass humans at the game.

Responding to the announcement in a separate editorial for Nature, Satinder Singh, the director of the University of Michigan's AI lab, said Zero "massively outperforms the already superhuman AlphaGo" and could be one of the biggest AI advances so far.

Although DeepMind gained prominence by defeating human Go players, the company has also turned its attention to StarCraft II.

All AlphaGo Zero needed was the basic rules of the game. DeepMind's first paper in Nature last year showed that the original algorithm learned for a while from how humans played the game, and then started to play against itself to refine those skills. Within 21 days, AlphaGo Zero had reached the level of the previous version that defeated Ke Jie in all three of their games.

AlphaGo Zero shows great improvements with respect to all its predecessors.

Approaches using pure reinforcement learning have struggled in AI because ability does not always progress consistently, said David Silver, a scientist at DeepMind who has been leading the development of AlphaGo, at the briefing. "Instead, it is able to learn tabula rasa from the strongest player in the world: AlphaGo itself." AlphaGo Zero and AlphaGo Master each require only a single machine with four TPUs.

The fact that the human-trained AlphaGo that defeated Lee Sedol couldn't muster a single win against the self-taught AlphaGo Zero had researchers arriving at some rather mind-blowing, and perhaps spine-chilling, conclusions.

The latest iteration, however, differs from its predecessors: AlphaGo Zero abandons all hand-engineered features, runs only one neural network (versus the two found in earlier models), and relies exclusively on its own knowledge to evaluate positions. By combining tree search with policy and value networks, the original AlphaGo had finally reached a professional level in Go, providing hope that human-level performance could be achieved in other seemingly intractable artificial intelligence domains. But an AI trained that way is subject to human limits, since its learning is bounded by pre-existing human knowledge.

The game has a rich history, and there's a reason it still captures the imagination of people today. Zero performed so well that it won all 100 games played, all in less than two months.

That matters because machines will need to figure out solutions to hard problems even when there isn't a large amount of training data to learn from. Go has fixed rules, while humans employ general knowledge and add layers of creativity to it.

DeepMind provided no information about how the algorithm has fared in solving other problems.

As for Go, the effects of AlphaGo Zero are likely to be seismic. After three hours, the system's strategy was "greedy stone-capturing", typical of a human novice. Sure, the behaviors that emerged here are novel, and perhaps unprecedented.

AlphaGo Zero could beat the version of AlphaGo that faced Lee Sedol after training for just 36 hours, and it earned its 100-0 score after 72 hours.

