data:image/s3,"s3://crabby-images/6b316/6b3160dbb4256f9654ef256d7c7bcab71627b8e0" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
Goals for the lecture:
Introduction & overview of the key methods and developments.
[A good starting point for reading and understanding papers!]
Original link:
data:image/s3,"s3://crabby-images/c35fe/c35fe8b5582347955a54e755567f335e4a0eb338" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
Probabilistic Graphical Models | Elements of Meta-Learning
01 Intro to Meta-Learning
data:image/s3,"s3://crabby-images/3229f/3229fbac2b8f18718d9c15d643b2771171f6cff6" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
Motivation and some examples
When is standard machine learning not enough?
Standard ML finally works for well-defined, stationary tasks.
data:image/s3,"s3://crabby-images/cac2f/cac2f04817867ab66b86f44ffa7502be8afc5510" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
But what about the complex, dynamic world: heterogeneous data from people and interactive robotic systems?
data:image/s3,"s3://crabby-images/876e3/876e36f559862caa7856a24d6f70889a6758d60e" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
General formulation and probabilistic view
What is meta-learning?
Standard learning: Given a distribution over examples (single task), learn a function that minimizes the loss:
data:image/s3,"s3://crabby-images/14b6c/14b6c1a09346ade2ee8e482a9615509eca655098" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
Learning-to-learn: Given a distribution over tasks, output an adaptation rule that can be used at test time to generalize from a task description:
data:image/s3,"s3://crabby-images/e8db0/e8db01e29ed51aef30ba923a4583fcb13b1e26cf" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
A Toy Example: Few-shot Image Classification
data:image/s3,"s3://crabby-images/a21a0/a21a033b8266048148236ad58bba4e6e5efdf89d" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
data:image/s3,"s3://crabby-images/dbd4a/dbd4a6d63339bf26b4e3412d9c65e99ae90ab4a8" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
Other (practical) Examples of Few-shot Learning
data:image/s3,"s3://crabby-images/40715/40715a7b44b347230aef67f38b44e402f5d881ce" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
data:image/s3,"s3://crabby-images/0f19f/0f19f9b90930e71d58e1713bc1d0d58a8b11edaa" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
data:image/s3,"s3://crabby-images/18550/18550439404a8096f497a6dd840b4f9cefcdfb11" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
data:image/s3,"s3://crabby-images/ccda9/ccda9e7ea785b9b998f1d7b01f1b7ab2e80f22d9" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
Gradient-based and other types of meta-learning
Model-Agnostic Meta-Learning (MAML)
- Start with a common model initialization \(\theta\)
- Given a new task \(T_i\), adapt the model using a gradient step:
data:image/s3,"s3://crabby-images/8ee49/8ee497622bddfab8a451562aae27189f14d8c979" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
- Meta-training is learning a shared initialization for all tasks:
data:image/s3,"s3://crabby-images/6c8ee/6c8ee00cd417d73d3d9b7c38e6de3c9a3ef51ae8" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
data:image/s3,"s3://crabby-images/79bb6/79bb6dacda15a5d31a37972893d808fc257e935c" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
Does MAML Work?
data:image/s3,"s3://crabby-images/803b9/803b921a227693ec5d5d11de44959612f8f371c8" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
MAML from a Probabilistic Standpoint
Training points: data:image/s3,"s3://crabby-images/d5250/d52504464bec830f6a126ea4023a194e62a39467" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
Testing points: data:image/s3,"s3://crabby-images/e0bfb/e0bfbbfb1ffb802717fcb3e1f0b32122e89f3001" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
MAML with log-likelihood loss:
data:image/s3,"s3://crabby-images/c59c1/c59c12c84ddf890aeacf60fb3631ffc7a7524b58" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
data:image/s3,"s3://crabby-images/c621b/c621bb0792a7fc83fbcc38bf3d1e965d8ff859d4" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
One More Example: One-shot Imitation Learning
data:image/s3,"s3://crabby-images/1a048/1a0488b0808b71461946398cccb0c3c1ace498b0" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
Prototype-based Meta-learning
data:image/s3,"s3://crabby-images/6f3e5/6f3e5857775df790991d084eaf5951bd6cf72d7a" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
Prototypes:
data:image/s3,"s3://crabby-images/d3438/d34383656bb10a69a775e943fc622b99dfa3499b" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
Predictive distribution:
data:image/s3,"s3://crabby-images/7cde7/7cde76592be0ca992c8aa2b1872ad39c87811b36" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
Does Prototype-based Meta-learning Work?
data:image/s3,"s3://crabby-images/5ddec/5ddec1f4267a64da256b90df0fcc3c06a22f511b" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
Rapid Learning or Feature Reuse
data:image/s3,"s3://crabby-images/7f578/7f57886049eded1d5b0c37b9b04ac881abaaca98" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
data:image/s3,"s3://crabby-images/56bc8/56bc898fcf8d7cee2414e38074c493c927379f07" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
data:image/s3,"s3://crabby-images/d85c5/d85c5b3d3186a820932d32485d9023b873ff53e0" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
data:image/s3,"s3://crabby-images/d28e3/d28e3e2a9292965bb8ac3c2cb0a41e981545c967" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
Neural processes and the relation of meta-learning to GPs
Drawing parallels between meta-learning and GPs
In few-shot learning:
- Learn to identify functions that generated the data from just a few examples.
- The function class and the adaptation rule encapsulate our prior knowledge.
Recall Gaussian Processes (GPs):
- Given a few (x, y) pairs, we can compute the predictive mean and variance.
- Our prior knowledge is encapsulated in the kernel function.
data:image/s3,"s3://crabby-images/88308/883086d3982dc3ec3840fe394ab92b101be0c700" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
Conditional Neural Processes
data:image/s3,"s3://crabby-images/8d8e9/8d8e9233124bbbf2b0c0bc39ba4c6d52c388ac01" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
data:image/s3,"s3://crabby-images/f278b/f278b2b0b9cf73a63363faab9619b4df3287c6fa" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
data:image/s3,"s3://crabby-images/1e8d2/1e8d2a871963300fe38511ffc279157956441214" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
data:image/s3,"s3://crabby-images/8270a/8270aac6da8f07a4ac90b809cc0604cb16ee4f38" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
On software packages for meta-learning
Many research code releases exist (the code is often fragile and sometimes broken).
A few notable libraries that implement a few specific methods:
data:image/s3,"s3://crabby-images/3cd52/3cd52bb43a23cc1a3f1956aff8d9b45284c2a91b" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
Takeaways
- Many real-world scenarios require building adaptive systems and cannot be solved using the “learn-once” standard ML approach.
- Learning-to-learn (or meta-learning) attempts to extend ML to rich multitask scenarios: instead of learning a function, learn a learning algorithm.
- Two families of widely used methods:
  - Gradient-based meta-learning (MAML and such)
  - Prototype-based meta-learning (Protonets, Neural Processes, ...)
- Many hybrids, extensions, improvements (CAVIA, MetaSGD, ...)
- Is it about adaptation or learning good representations? Still unclear and depends on the task; having good representations might be enough.
- Meta-learning can be used as a mechanism for causal discovery (see Bengio et al., 2019).
02 Elements of Meta-RL
What is meta-RL and why does it make sense?
Recall the definition of learning-to-learn
Standard learning: Given a distribution over examples (single task), learn a function that minimizes the loss:
data:image/s3,"s3://crabby-images/df06e/df06ee09658ff04f5a26bea976c01706fd976773" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
Learning-to-learn: Given a distribution over tasks, output an adaptation rule that can be used at test time to generalize from a task description:
data:image/s3,"s3://crabby-images/c5ff6/c5ff6e4a328e2103b32b7ce9b2f55edb751b8163" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
Meta reinforcement learning (RL): Given a distribution over environments, train a policy update rule that can solve new environments given only limited or no initial experience.
data:image/s3,"s3://crabby-images/84b79/84b79eee453a84ee99ef8729f40aa078287453bf" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
Meta-learning for RL
data:image/s3,"s3://crabby-images/0d660/0d66048508c05a5a35295298ab75c0f109612d59" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
On-policy and off-policy meta-RL
On-policy RL: Quick Recap
data:image/s3,"s3://crabby-images/94217/9421751df06229519cfcdcc5f121614e33070e2c" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
REINFORCE algorithm:
data:image/s3,"s3://crabby-images/91569/91569c82be8591f34c9b34dc6995c084551cae11" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
On-policy Meta-RL: MAML (again!)
- Start with a common policy initialization \(\theta\)
- Given a new task \(T_i\), collect data using the initial policy, then adapt using a gradient step:
data:image/s3,"s3://crabby-images/8ee49/8ee497622bddfab8a451562aae27189f14d8c979" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
- Meta-training is learning a shared initialization for all tasks:
data:image/s3,"s3://crabby-images/6c8ee/6c8ee00cd417d73d3d9b7c38e6de3c9a3ef51ae8" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
data:image/s3,"s3://crabby-images/79bb6/79bb6dacda15a5d31a37972893d808fc257e935c" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
Adaptation as Inference
Treat policy parameters, tasks, and all trajectories as random variables
data:image/s3,"s3://crabby-images/42786/427860753ac88f99038b8e664c198ce90ed71be7" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
Meta-learning = learning a prior, and adaptation = inference
data:image/s3,"s3://crabby-images/4be34/4be342762b39c35a6b4acf4a8939db34e434ae5b" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
Off-policy meta-RL: PEARL
data:image/s3,"s3://crabby-images/5a743/5a743611bbc3ed36f529e191e35e3a954b7f3f06" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
data:image/s3,"s3://crabby-images/b0579/b05799d4efcffaf1bdeafed3a0f3598ab802330e" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
Key points:
- Infer latent representations z of each task from the trajectory data.
- The inference network q is decoupled from the policy, which enables off-policy learning.
- All objectives involve the inference and policy networks.
data:image/s3,"s3://crabby-images/8453a/8453afa3878cbf545143b0656348ffd7d60b81b4" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
Adaptation in nonstationary environments
Classical few-shot learning setup:
- The tasks are i.i.d. samples from some underlying distribution.
- Given a new task, we get to interact with it before adapting.
- What if we are in a nonstationary environment (i.e. changing over time)? Can we still use meta-learning?
data:image/s3,"s3://crabby-images/f0695/f0695836467686405bae0fb1c578da4cd49fbe37" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
Example: adaptation to a learning opponent
Each new round is a new task. Nonstationary environment is a sequence of tasks.
Continuous adaptation setup:
- The tasks are sequentially dependent.
- Meta-learn to exploit these dependencies.
data:image/s3,"s3://crabby-images/41154/411542ebe74ae5872684b23e411315b17cabc894" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
Continuous adaptation
Treat policy parameters, tasks, and all trajectories as random variables
data:image/s3,"s3://crabby-images/2a617/2a6177be804c08083d61ee77c5844485226a7a05" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
RoboSumo: a multiagent competitive environment
An agent competes against an opponent whose behavior changes over time.
data:image/s3,"s3://crabby-images/4c88e/4c88e76185d6f644f6db265dba8cd5298325d882" alt="卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning 卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning"
Takeaways
- Learning-to-learn (or meta-learning) setup is particularly suitable for multi-task reinforcement learning
- Both on-policy and off-policy RL can be “upgraded” to meta-RL:
  - On-policy meta-RL is directly enabled by MAML
  - Decoupling task inference and policy learning enables off-policy methods
- Is it about fast adaptation or learning good multitask representations? (See discussion in Meta-Q-Learning: https://arxiv.org/abs/1910.00125)
- The probabilistic view of meta-learning allows us to use meta-learning ideas beyond distributions of i.i.d. tasks, e.g., continuous adaptation.
- Very active area of research.