pandazjd 2018-09-24
Reinforcement learning is an important branch of machine learning. It resembles the way humans and animals learn about their environment: the machine learns from the actions it takes and the outcomes those actions produce.
In reinforcement learning, the learner is a decision-making agent that takes actions in an environment and receives rewards or penalties for its attempts to solve a problem. After many trial-and-error runs, it should learn the best policy, the sequence of actions that maximizes the total reward.
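To make this trial-and-error loop concrete, here is a minimal sketch of an agent interacting with an environment and accumulating a total return. The toy environment, its reward rule, and the random policy are all hypothetical, introduced only to illustrate the action-reward cycle; no real learning happens here.
import random

class ToyEnvironment:
    """A 10-step episode that rewards action 1 and penalizes action 0."""
    def __init__(self):
        self.steps_left = 10

    def step(self, action):
        # one interaction: the environment consumes an action and emits a reward
        self.steps_left -= 1
        reward = 1.0 if action == 1 else -1.0
        done = (self.steps_left == 0)
        return reward, done

env = ToyEnvironment()
total_return = 0.0
done = False
while not done:
    action = random.choice([0, 1])   # the agent picks an action (here: at random)
    reward, done = env.step(action)  # the environment answers with a reward
    total_return += reward           # learning aims to maximize this total return
print('Episode return:', total_return)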
Reinforcement learning has gained a great deal of momentum over the past few years, and a large amount of research and development has gone into the area. Google has also contributed to the field, releasing a new framework that brings speed, stability, and reproducibility to reinforcement learning research and development.
The new framework, called "Google Dopamine", is a TensorFlow-based reinforcement learning framework designed to offer flexibility, stability, and reproducibility to both new and experienced reinforcement learning (RL) researchers. Its name is inspired by one of the main components of reward-motivated behavior in the brain, reflecting the strong historical connection between neuroscience and reinforcement learning research.
Dopamine is an open-source framework built with flexibility, stability, and reproducibility in mind.
Google has published clearly documented code in the GitHub repository (https://github.com/google/dopamine), along with a good explanation of how the framework works.
Installing the necessary packages
First, we install all the packages needed to build this agent from scratch.
#dopamine for RL
!pip install --upgrade --no-cache-dir dopamine-rl
# dopamine dependencies
!pip install cmake
#Arcade Learning Environment
!pip install atari_py
After installing the required packages, we import the Python libraries:
import numpy as np
import os
# DQN for baselines
from dopamine.agents.dqn import dqn_agent
from dopamine.atari import run_experiment
from dopamine.colab import utils as colab_utils
# warnings
from absl import flags
Next we initialize BASE_PATH, where the training logs will be stored, and choose the game environment on which the agent will be trained:
# where to store training logs
BASE_PATH = '/tmp/colab_dope_run'  # @param
# which arcade environment?
GAME = 'Pong'  # @param
Now we create a new agent from scratch in Python:
# define where to store log data
LOG_PATH = os.path.join(BASE_PATH, 'basic_agent', GAME)

class BasicAgent(object):
    """This agent randomly selects an action and sticks to it.
    It will change actions with probability switch_prob."""

    def __init__(self, sess, num_actions, switch_prob=0.1):
        # tensorflow session
        self._sess = sess
        # how many possible actions can it take?
        self._num_actions = num_actions
        # probability of switching actions in the next timestep?
        self._switch_prob = switch_prob
        # initialize the action to take (randomly)
        self._last_action = np.random.randint(num_actions)
        # not debugging
        self.eval_mode = False

    # policy here
    def _choose_action(self):
        if np.random.random() <= self._switch_prob:
            self._last_action = np.random.randint(self._num_actions)
        return self._last_action

    # when it checkpoints during training
    def bundle_and_checkpoint(self, unused_checkpoint_dir, unused_iteration):
        pass

    # loading from checkpoint
    def unbundle(self, unused_checkpoint_dir, unused_checkpoint_version, unused_data):
        pass

    def begin_episode(self, unused_observation):
        return self._choose_action()

    def end_episode(self, unused_reward):
        pass

    def step(self, reward, observation):
        return self._choose_action()

def create_basic_agent(sess, environment):
    """The Runner class will expect a function of this type to create an agent."""
    return BasicAgent(sess, num_actions=environment.action_space.n, switch_prob=0.2)

basic_runner = run_experiment.Runner(LOG_PATH,
                                     create_basic_agent,
                                     game_name=GAME,
                                     num_iterations=200,
                                     training_steps=10,
                                     evaluation_steps=10,
                                     max_steps_per_episode=100)
Now we train the agent we created in the Python code above:
print('Training basic agent, please be patient, it may take a while...')
basic_runner.run_experiment()
print('Done training!')
Load the baseline data and the training logs:
!gsutil -q -m cp -R gs://download-dopamine-rl/preprocessed-benchmarks/* /content/
experimental_data = colab_utils.load_baselines('/content')
basic_data = colab_utils.read_experiment(log_path=LOG_PATH, verbose=True)
basic_data['agent'] = 'BasicAgent'
basic_data['run_number'] = 1
experimental_data[GAME] = experimental_data[GAME].merge(basic_data, how='outer')
Finally, plot the training returns of the trained agent:
import seaborn as sns
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(16, 8))
sns.tsplot(data=experimental_data[GAME],
           time='iteration',
           unit='run_number',
           condition='agent',
           value='train_episode_returns',
           ax=ax)
plt.title(GAME)
plt.show()
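Note that the dqn_agent import above is never used by the BasicAgent example. As a possible next step, the same Runner interface can wrap Dopamine's built-in DQN agent so its returns can be compared on the same plot. The snippet below is only a sketch under assumptions: it relies on the dqn_agent.DQNAgent(sess, num_actions=...) constructor of the original 2018 Dopamine release, and LOG_PATH_DQN and create_dqn_agent are names introduced here for illustration, so check them against the repository before running.
# sketch only: assumes dqn_agent.DQNAgent(sess, num_actions=...) as in the
# original Dopamine release; verify the signature against the repository
LOG_PATH_DQN = os.path.join(BASE_PATH, 'dqn_agent', GAME)  # hypothetical log dir

def create_dqn_agent(sess, environment):
    """Create Dopamine's built-in DQN agent for the chosen Atari game."""
    return dqn_agent.DQNAgent(sess, num_actions=environment.action_space.n)

dqn_runner = run_experiment.Runner(LOG_PATH_DQN,
                                   create_dqn_agent,
                                   game_name=GAME,
                                   num_iterations=200,
                                   training_steps=10,
                                   evaluation_steps=10,
                                   max_steps_per_episode=100)
dqn_runner.run_experiment()
The resulting logs can then be loaded with colab_utils.read_experiment and merged into experimental_data in exactly the same way as the BasicAgent logs above.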