Carnegie Mellon University (CMU) Meta-Learning and Meta-Reinforcement Learning Course | Elements of Meta-Learning
Goals for the lecture:

Introduction & overview of the key methods and developments.
[Good starting point for you to start reading and understanding papers!]

Original link:

Probabilistic Graphical Models | Elements of Meta-Learning

01 Intro to Meta-Learning

Motivation and some examples

When is standard machine learning not enough?
Standard ML finally works for well-defined, stationary tasks.
But what about the complex, dynamic world, with heterogeneous data from people and interactive robotic systems?

General formulation and probabilistic view

What is meta-learning?
Standard learning: Given a distribution over examples (a single task), learn a function that minimizes the expected loss.
Learning-to-learn: Given a distribution over tasks, output an adaptation rule that can be used at test time to generalize from a task description.
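
One common way to write the two objectives is sketched below. The notation is an assumption and is not taken from the lecture slides: \(f_w\) is the learned function, \(\ell\) a per-example loss, \(p(T)\) the task distribution, \(D_T^{\text{train}}\) and \(D_T^{\text{test}}\) the task description and the held-out data of task \(T\), and \(A_\theta\) the adaptation rule with meta-parameters \(\theta\).

```latex
% Standard learning: a single task with data distribution D
\min_{w} \; \mathbb{E}_{(x, y) \sim D} \big[ \ell\big(f_w(x), y\big) \big]

% Learning-to-learn: a distribution over tasks p(T)
\min_{\theta} \; \mathbb{E}_{T \sim p(T)}
  \Big[ \mathcal{L}_T\big( A_\theta(D_T^{\text{train}});\, D_T^{\text{test}} \big) \Big]
```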

A Toy Example: Few-shot Image Classification

Other (practical) Examples of Few-shot Learning

Gradient-based and other types of meta-learning

Model-Agnostic Meta-Learning (MAML)

Start with a common model initialization \(\theta\).
Given a new task \(T_i\), adapt the model using a gradient step on that task's loss.
Meta-training learns a shared initialization for all tasks.
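
The three steps above can be sketched on a toy problem. A usual way to write them is \(\theta_i' = \theta - \alpha \nabla_\theta L_{T_i}^{\text{train}}(\theta)\) for adaptation and \(\min_\theta \sum_i L_{T_i}^{\text{test}}(\theta_i')\) for meta-training. The task family (1-D regression \(y = a x\)), the step sizes alpha and beta, and all function names below are assumptions made for illustration, not part of the lecture.

```python
import numpy as np

def loss(w, x, y):
    """Mean squared error of the scalar model y_hat = w * x."""
    return np.mean((w * x - y) ** 2)

def grad(w, x, y):
    """Derivative of the mean squared error with respect to w."""
    return np.mean(2.0 * (w * x - y) * x)

def adapt(w, x, y, alpha):
    """Inner loop: one gradient step on the task's training points."""
    return w - alpha * grad(w, x, y)

def maml_meta_grad(w, x_tr, y_tr, x_te, y_te, alpha):
    """Gradient of the test loss at the adapted weight w.r.t. the initial w."""
    w_adapted = adapt(w, x_tr, y_tr, alpha)
    # Chain rule through the inner step:
    # d(w_adapted)/dw = 1 - alpha * (second derivative of the training loss)
    d_adapted_d_w = 1.0 - alpha * 2.0 * np.mean(x_tr ** 2)
    return grad(w_adapted, x_te, y_te) * d_adapted_d_w

def meta_train(num_steps=500, tasks_per_step=8, alpha=0.1, beta=0.05, seed=0):
    """Outer loop: learn a shared initialization over a family of tasks."""
    rng = np.random.default_rng(seed)
    w = 0.0
    for _ in range(num_steps):
        g = 0.0
        for _ in range(tasks_per_step):
            a = rng.uniform(2.0, 4.0)  # hypothetical task family: y = a * x
            x_tr, x_te = rng.normal(size=5), rng.normal(size=5)
            g += maml_meta_grad(w, x_tr, a * x_tr, x_te, a * x_te, alpha)
        w -= beta * g / tasks_per_step
    return w

theta = meta_train()
```

With slopes drawn uniformly from [2, 4], the learned initialization should settle near the middle of that range, from which a single gradient step moves toward any particular task.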

Does MAML Work?

MAML from a Probabilistic Standpoint
Each task \(T_i\) comes with training points and testing points.
MAML can then be written with a log-likelihood loss.
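
A hedged reconstruction of this view is given below. The symbols are assumptions, since the original equations did not survive: \(N\) training and \(M\) testing points per task, a model likelihood \(p(y \mid x, \theta)\), a step size \(\alpha\), and adapted parameters \(\theta_i'\). The loss is taken to be the negative log-likelihood, so the gradient-descent step on the loss becomes an ascent step on the log-likelihood.

```latex
% Training and testing points of task T_i
D_i^{\text{train}} = \{(x_j, y_j)\}_{j=1}^{N}, \qquad
D_i^{\text{test}} = \{(x_m^*, y_m^*)\}_{m=1}^{M}

% Adaptation: one gradient step on the training log-likelihood
\theta_i' = \theta + \alpha \, \nabla_\theta \sum_{j=1}^{N} \log p(y_j \mid x_j, \theta)

% Meta-training: log-likelihood of the testing points under the adapted parameters
\max_{\theta} \; \sum_i \sum_{m=1}^{M} \log p\big(y_m^* \mid x_m^*, \theta_i'\big)
```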

One More Example: One-shot Imitation Learning

Prototype-based Meta-learning
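Prototypes: as in Prototypical Networks, each class is summarized by the mean embedding of its training (support) examples.

Predictive distribution: a softmax over the negative distances from the query embedding to each prototype.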
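
The prototype and predictive-distribution steps can be sketched in the style of Prototypical Networks. This is a minimal sketch, assuming the embeddings are already computed (no embedding network is trained here) and squared Euclidean distance is used; the function names and the toy data are hypothetical.

```python
import numpy as np

def prototypes(support_emb, support_labels, num_classes):
    """One prototype per class: the mean of that class's support embeddings."""
    return np.stack([
        support_emb[support_labels == c].mean(axis=0)
        for c in range(num_classes)
    ])

def predictive_distribution(query_emb, protos):
    """Softmax over negative squared distances from queries to prototypes."""
    # Pairwise squared Euclidean distances, shape (num_queries, num_classes)
    d2 = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    logits = -d2
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

# Toy 2-way, 2-shot episode with 2-D embeddings (made-up numbers)
support_emb = np.array([[0.0, 0.0], [0.2, 0.0], [3.0, 3.0], [3.0, 3.2]])
support_labels = np.array([0, 0, 1, 1])
query_emb = np.array([[0.1, 0.1], [2.9, 3.0]])

protos = prototypes(support_emb, support_labels, num_classes=2)
probs = predictive_distribution(query_emb, protos)
```

Adapting to a new task here needs no gradient step: computing the class means from the support set is the whole adaptation rule.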

Does Prototype-based Meta-learning Work?

Rapid Learning or Feature Reuse

Neural processes and relation of meta-learning to GPs

Drawing parallels between meta-learning and GPs
In few-shot learning:

Learn to identify functions that generated the data from just a few examples.
The function class and the adaptation rule encapsulate our prior knowledge.

Recall Gaussian Processes (GPs):

Given a few (x, y) pairs, we can compute the predictive mean and variance.
Our prior knowledge is encapsulated in the kernel function.
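
The GP computation referred to here can be sketched as follows. This is a minimal sketch, assuming a zero-mean GP with an RBF kernel and Gaussian observation noise; the kernel choice, the hyperparameter values, and the function names are assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between two sets of 1-D inputs."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / length_scale ** 2)

def gp_predict(x_train, y_train, x_test, noise_var=1e-2):
    """Predictive mean and variance of a zero-mean GP at x_test."""
    K = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test)
    K_ss = rbf_kernel(x_test, x_test)
    L = np.linalg.cholesky(K)
    # alpha = K^{-1} y, computed via two triangular solves
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v ** 2, axis=0)
    return mean, var

# A few (x, y) pairs play the role of the few-shot training set
x_train = np.array([-1.0, 0.0, 1.0])
y_train = np.sin(x_train)
x_test = np.array([0.0, 5.0])
mean, var = gp_predict(x_train, y_train, x_test)
```

The predictive variance is small near the observed points and returns to the prior variance far away from them, which is how the kernel encodes prior knowledge about the function class.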

Conditional Neural Processes

On software packages for meta-learning
There are many research code releases, but the code is often fragile and sometimes broken.
A few notable libraries that implement a few specific methods:

Torchmeta (https://github.com/tristandeleu/pytorch-meta)
Learn2learn (https://github.com/learnables/learn2learn)
Higher (https://github.com/facebookresearch/higher)

Takeaways

Many real-world scenarios require building adaptive systems and cannot be solved using the standard “learn-once” ML approach.
Learning-to-learn (or meta-learning) attempts to extend ML to rich multitask scenarios: instead of learning a function, learn a learning algorithm.
Two families of widely popular methods:
- Gradient-based meta-learning (MAML and such)
- Prototype-based meta-learning (Protonets, Neural Processes, ...)
- Many hybrids, extensions, improvements (CAVIA, MetaSGD, ...)
Is it about adaptation or learning good representations? Still unclear and depends on the task; having good representations might be enough.
Meta-learning can be used as a mechanism for causal discovery (see Bengio et al., 2019).

02 Elements of Meta-RL

What is meta-RL and why does it make sense?

Recall the definition of learning-to-learn
Standard learning: Given a distribution over examples (a single task), learn a function that minimizes the expected loss.
Learning-to-learn: Given a distribution over tasks, output an adaptation rule that can be used at test time to generalize from a task description.

Meta reinforcement learning (RL): Given a distribution over environments, train a policy update rule that can solve new environments given only limited or no initial experience.

Meta-learning for RL

On-policy and off-policy meta-RL

On-policy RL: Quick Recap
REINFORCE algorithm:
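
A minimal sketch of REINFORCE, assuming the simplest possible setting: a multi-armed bandit (one-step episodes) with a softmax policy and no baseline. The estimator used is \(\nabla_\theta J(\theta) = \mathbb{E}[R \, \nabla_\theta \log \pi_\theta(a)]\). The arm means, the learning rate, and the function names are hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce_bandit(arm_means, num_episodes=2000, lr=0.1, seed=0):
    """Run REINFORCE on a noisy bandit and return the final action probabilities."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(len(arm_means))  # policy logits
    for _ in range(num_episodes):
        probs = softmax(theta)
        # On-policy: the action is sampled from the current policy
        action = rng.choice(len(arm_means), p=probs)
        reward = arm_means[action] + 0.1 * rng.normal()
        # Gradient of log softmax: one_hot(action) - probs
        grad_log_pi = -probs
        grad_log_pi[action] += 1.0
        # Stochastic gradient ascent on the expected reward
        theta += lr * reward * grad_log_pi
    return softmax(theta)

final_probs = reinforce_bandit(np.array([0.2, 1.0]))
```

Because every update uses a sample drawn from the current policy, old data cannot be reused after the parameters change, which is what makes the method on-policy.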

On-policy Meta-RL: MAML (again!)

Start with a common policy initialization \(\theta\).
Given a new task \(T_i\), collect data using the initial policy, then adapt using a gradient step.
Meta-training learns a shared initialization for all tasks.

Adaptation as Inference
Treat policy parameters, tasks, and all trajectories as random variables.

Meta-learning = learning a prior, and adaptation = inference.

Off-policy meta-RL: PEARL

Key points:

Infer latent representations z of each task from the trajectory data.
The inference network \(q\) is decoupled from the policy, which enables off-policy learning.
All objectives involve the inference and policy networks.
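
One way to infer a latent \(z\) from trajectory data can be sketched as below. This is a sketch under stated assumptions, not a description of the lecture's slides: the inference network is assumed to emit an independent diagonal Gaussian factor per transition, and the posterior over \(z\) is taken to be the normalized product of those factors, which does not depend on the order of the transitions. The factor means and variances are made-up numbers standing in for network outputs.

```python
import numpy as np

def product_of_gaussians(mus, sigmas_sq):
    """Combine per-transition diagonal Gaussian factors into one Gaussian.

    mus, sigmas_sq: arrays of shape (num_transitions, latent_dim).
    Returns the mean and variance of the normalized product.
    """
    precisions = 1.0 / sigmas_sq
    var = 1.0 / precisions.sum(axis=0)
    mu = var * (precisions * mus).sum(axis=0)
    return mu, var

def sample_latent(mu, var, rng):
    """Reparameterized sample z = mu + sqrt(var) * eps."""
    return mu + np.sqrt(var) * rng.normal(size=mu.shape)

# Hypothetical factors for two transitions and a 1-D latent
mus = np.array([[0.0], [2.0]])
sigmas_sq = np.array([[1.0], [1.0]])
mu_z, var_z = product_of_gaussians(mus, sigmas_sq)

rng = np.random.default_rng(0)
z = sample_latent(mu_z, var_z, rng)  # would be fed to the policy as extra input
```

Each additional transition adds precision, so the posterior over \(z\) narrows as more experience from the task is collected.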

Adaptation in nonstationary environments
Classical few-shot learning setup:

The tasks are i.i.d. samples from some underlying distribution.
Given a new task, we get to interact with it before adapting.
What if we are in a nonstationary environment (i.e., one that changes over time)? Can we still use meta-learning?

Example: adaptation to a learning opponent
Each new round is a new task. A nonstationary environment is a sequence of tasks.

Continuous adaptation setup:

The tasks are sequentially dependent.
Meta-learn to exploit the dependencies.

Continuous adaptation

Treat policy parameters, tasks, and all trajectories as random variables

RoboSumo: a multiagent competitive environment
An agent competes against an opponent whose behavior changes over time.

Takeaways

The learning-to-learn (or meta-learning) setup is particularly suitable for multi-task reinforcement learning.
Both on-policy and off-policy RL can be “upgraded” to meta-RL:
- On-policy meta-RL is directly enabled by MAML
- Decoupling task inference and policy learning enables off-policy methods
Is it about fast adaptation or learning good multitask representations? (See discussion in Meta-Q-Learning: https://arxiv.org/abs/1910.00125)
The probabilistic view of meta-learning allows using meta-learning ideas beyond distributions of i.i.d. tasks, e.g., for continuous adaptation.
Very active area of research.
