# Elements of Meta-Learning: On Meta-Learning and Reinforcement Learning

## Carnegie Mellon University, Probabilistic Graphical Models Course

Posted by JoselynZhao on May 14, 2020

Goals for the lecture:

Introduction and overview of the key methods and developments. [A good starting point for reading and understanding the papers!]

# Probabilistic Graphical Models | Elements of Meta-Learning

## 01 Intro to Meta-Learning

### Motivation and some examples

When is standard machine learning not enough? Standard ML now works well for well-defined, stationary tasks. But what about the complex, dynamic world: heterogeneous data from people, and interactive robotic systems?

### General formulation and probabilistic view

What is meta-learning?

• Standard learning: given a distribution over examples (a single task), learn a function that minimizes the loss.
• Learning-to-learn: given a distribution over tasks, output an adaptation rule that can be used at test time to generalize from a task description.
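One common way to write these two objectives formally (the notation here is assumed, not taken from the slides): standard learning fits a single predictor, while learning-to-learn optimizes an adaptation rule $A_{\theta}$ across tasks,

$$\min_{\theta} \; \mathbb{E}_{(x, y) \sim p}\big[\ell(f_{\theta}(x), y)\big] \qquad \text{vs.} \qquad \min_{\theta} \; \mathbb{E}_{T \sim p(T)}\Big[\mathcal{L}_{T}\big(A_{\theta}(D_{T}^{\mathrm{train}})\big)\Big]$$

where $D_{T}^{\mathrm{train}}$ is the small training set the rule adapts on at test time.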

#### A Toy Example: Few-shot Image Classification

#### Other (Practical) Examples of Few-shot Learning

### Gradient-based and other types of meta-learning

#### Model-agnostic Meta-learning (MAML)

• Start with a common model initialization $\theta$.
• Given a new task $T_i$, adapt the model using a gradient step: $\theta_i' = \theta - \alpha \nabla_{\theta} \mathcal{L}_{T_i}(\theta)$.
• Meta-training is learning a shared initialization that adapts well to all tasks: $\min_{\theta} \sum_{i} \mathcal{L}_{T_i}\big(\theta - \alpha \nabla_{\theta} \mathcal{L}_{T_i}(\theta)\big)$ (a runnable sketch follows the list).
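A minimal sketch of the MAML inner/outer loop on a toy regression problem (the task sampler and all hyperparameters here are illustrative assumptions, not from the lecture):

```python
import torch

# Toy model y = x @ w + b with explicit parameters (the meta-learned init)
w = torch.randn(1, 1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
theta = [w, b]

def forward(params, x):
    w, b = params
    return x @ w + b

def task_loss(params, x, y):
    return ((forward(params, x) - y) ** 2).mean()

def sample_task():
    # Hypothetical task family: random linear functions y = a * x
    a = torch.randn(1)
    x_s, x_q = torch.randn(10, 1), torch.randn(10, 1)
    return (x_s, a * x_s), (x_q, a * x_q)

inner_lr = 0.01
meta_opt = torch.optim.Adam(theta, lr=1e-3)

for step in range(1000):
    meta_opt.zero_grad()
    meta_loss = 0.0
    for _ in range(4):  # a meta-batch of tasks
        (x_s, y_s), (x_q, y_q) = sample_task()
        # Inner step: adapt theta on the support set, keeping the graph
        grads = torch.autograd.grad(task_loss(theta, x_s, y_s), theta,
                                    create_graph=True)
        adapted = [p - inner_lr * g for p, g in zip(theta, grads)]
        # Outer objective: adapted parameters evaluated on the query set
        meta_loss = meta_loss + task_loss(adapted, x_q, y_q)
    meta_loss.backward()  # second-order gradients flow through the inner step
    meta_opt.step()
```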

#### Does MAML Work?

#### MAML from a Probabilistic Standpoint

Each task $T_i$ has training points $(x_i^{\mathrm{train}}, y_i^{\mathrm{train}})$ and testing points $(x_i^{\mathrm{test}}, y_i^{\mathrm{test}})$. MAML with a log-likelihood loss maximizes the probability of each task's test data under parameters adapted on that task's training data:

$$\max_{\theta} \sum_{i} \log p\big(y_i^{\mathrm{test}} \mid x_i^{\mathrm{test}}, \theta_i'\big), \qquad \theta_i' = \theta + \alpha \nabla_{\theta} \log p\big(y_i^{\mathrm{train}} \mid x_i^{\mathrm{train}}, \theta\big)$$

#### One More Example: One-shot Imitation Learning

#### Prototype-based Meta-learning

Prototypes: the mean embedding of each class's support examples, $c_k = \frac{1}{|S_k|} \sum_{(x_i, y_i) \in S_k} f_{\theta}(x_i)$. Predictive distribution: a softmax over negative distances to the prototypes, $p(y = k \mid x) = \frac{\exp(-d(f_{\theta}(x), c_k))}{\sum_{k'} \exp(-d(f_{\theta}(x), c_{k'}))}$ (in the notation of Snell et al., 2017).
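A minimal sketch of these two equations, assuming a generic embedding network `embed` (function and variable names are illustrative):

```python
import torch

def prototypical_predict(embed, x_support, y_support, x_query, n_classes):
    """Classify query points by distance to class prototypes.

    embed: a torch.nn.Module mapping inputs to embedding vectors.
    """
    z_support = embed(x_support)               # (n_support, d)
    z_query = embed(x_query)                   # (n_query, d)
    # Prototype = mean embedding of each class's support examples
    protos = torch.stack([z_support[y_support == k].mean(dim=0)
                          for k in range(n_classes)])   # (n_classes, d)
    # Softmax over negative squared Euclidean distances
    dists = torch.cdist(z_query, protos) ** 2           # (n_query, n_classes)
    return torch.softmax(-dists, dim=1)
```

#### Does Prototype-based Meta-learning Work?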

#### Rapid Learning or Feature Reuse?

### Neural processes and relation of meta-learning to GPs

#### Drawing Parallels between Meta-learning and GPs

In few-shot learning:

• Learn to identify functions that generated the data from just a few examples.
• The function class and the adaptation rule encapsulate our prior knowledge.

#### Recall Gaussian Processes (GPs)

• Given a few (x, y) pairs, we can compute the predictive mean and variance.
• Our prior knowledge is encapsulated in the kernel function.
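To make this concrete, a minimal sketch of GP posterior prediction with an RBF kernel (these are the standard formulas; the variable names are mine, not the lecture's):

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0):
    # Squared-exponential kernel k(a, b) = exp(-||a - b||^2 / (2 l^2))
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2 * lengthscale ** 2))

def gp_predict(x_train, y_train, x_test, noise=1e-2):
    """Predictive mean and variance of a zero-mean GP."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_star = rbf_kernel(x_train, x_test)
    K_ss = rbf_kernel(x_test, x_test)
    alpha = np.linalg.solve(K, y_train)
    mean = K_star.T @ alpha
    cov = K_ss - K_star.T @ np.linalg.solve(K, K_star)
    return mean, np.diag(cov)

# A few (x, y) pairs are enough to get a calibrated predictive distribution
x_tr = np.array([[0.0], [1.0], [2.0]])
y_tr = np.sin(x_tr).ravel()
x_te = np.linspace(0, 2, 5)[:, None]
mu, var = gp_predict(x_tr, y_tr, x_te)
```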

#### Conditional Neural Processes
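A rough sketch of the CNP structure (encode each context pair, aggregate by averaging, decode the target inputs); the layer sizes and names here are assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class CNP(nn.Module):
    """Encode (x, y) context pairs, aggregate by mean, decode target x."""
    def __init__(self, x_dim=1, y_dim=1, r_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(x_dim + y_dim, 64), nn.ReLU(), nn.Linear(64, r_dim))
        self.decoder = nn.Sequential(
            nn.Linear(x_dim + r_dim, 64), nn.ReLU(),
            nn.Linear(64, 2 * y_dim))  # predictive mean and log-variance

    def forward(self, x_ctx, y_ctx, x_tgt):
        # Permutation-invariant aggregation of the context set
        r = self.encoder(torch.cat([x_ctx, y_ctx], dim=-1)).mean(dim=0)
        r = r.expand(x_tgt.size(0), -1)
        out = self.decoder(torch.cat([x_tgt, r], dim=-1))
        mu, log_var = out.chunk(2, dim=-1)
        return mu, log_var.exp()
```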

#### On Software Packages for Meta-learning

There are a lot of research code releases (the code is fragile and sometimes broken), and a few notable libraries that implement a few specific methods:

• Torchmeta (https://github.com/tristandeleu/pytorch-meta)
• Learn2learn (https://github.com/learnables/learn2learn)
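As an example, a hedged sketch of adapting a model with learn2learn's MAML wrapper (based on the library's documented usage around 2020; verify against the current docs before relying on it):

```python
import torch
import learn2learn as l2l

model = torch.nn.Linear(10, 2)
maml = l2l.algorithms.MAML(model, lr=0.1)   # differentiable inner-loop SGD
opt = torch.optim.Adam(maml.parameters(), lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss()

# One meta-step on a single (support, query) task; the tensors are dummies
x_s, y_s = torch.randn(5, 10), torch.randint(0, 2, (5,))
x_q, y_q = torch.randn(5, 10), torch.randint(0, 2, (5,))

learner = maml.clone()                      # task-specific copy of the model
learner.adapt(loss_fn(learner(x_s), y_s))   # inner gradient step
query_loss = loss_fn(learner(x_q), y_q)     # outer objective

opt.zero_grad()
query_loss.backward()                       # gradients flow back to maml
opt.step()
```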

#### Takeaways

• Many real-world scenarios require building adaptive systems and cannot be solved using the “learn-once” standard ML approach.
• Learning-to-learn (or meta-learning) attempts to extend ML to rich multitask scenarios: instead of learning a function, learn a learning algorithm.
• Two families of widely popular methods:
  • Gradient-based meta-learning (MAML and such)
  • Prototype-based meta-learning (Protonets, Neural Processes, …)
• Many hybrids, extensions, improvements (CAVIA, Meta-SGD, …)
• Is it about adaptation or learning good representations? Still unclear and task-dependent; having good representations might be enough.
• Meta-learning can be used as a mechanism for causal discovery. (See Bengio et al., 2019.)

## 02 Elements of Meta-RL

### What is meta-RL and why does it make sense?

Recall the definition of learning-to-learn:

• Standard learning: given a distribution over examples (a single task), learn a function that minimizes the loss.
• Learning-to-learn: given a distribution over tasks, output an adaptation rule that can be used at test time to generalize from a task description.
• Meta reinforcement learning (meta-RL): given a distribution over environments, train a policy update rule that can solve new environments given only limited or no initial experience.

#### Meta-learning for RL

### On-policy and off-policy meta-RL

#### On-policy RL: Quick Recap

The REINFORCE algorithm estimates the policy gradient from sampled trajectories:
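In standard notation (this exact form is assumed here, not copied from the slide): for trajectories $\tau$ sampled from the current policy $\pi_{\theta}$, with $R(\tau)$ the trajectory return,

$$\nabla_{\theta} J(\theta) = \mathbb{E}_{\tau \sim \pi_{\theta}}\left[\sum_{t=0}^{T} \nabla_{\theta} \log \pi_{\theta}(a_t \mid s_t)\, R(\tau)\right]$$

which is estimated with Monte Carlo samples and followed by a gradient ascent step.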

#### On-policy Meta-RL: MAML (again!)

• Start with a common policy initialization $\theta$.
• Given a new task $T_i$, collect data using the initial policy, then adapt using a gradient step (writing $J_{T_i}$ for the expected return on task $T_i$): $\phi_i = \theta + \alpha \nabla_{\theta} J_{T_i}(\theta)$.
• Meta-training is learning a shared initialization for all tasks: $\max_{\theta} \sum_{i} J_{T_i}\big(\theta + \alpha \nabla_{\theta} J_{T_i}(\theta)\big)$.

#### Adaptation as Inference

Treat policy parameters, tasks, and all trajectories as random variables. Then meta-learning = learning a prior over policy parameters, and adaptation = inference of the task-specific posterior, roughly $p(\phi_i \mid \theta, \text{trajectories from } T_i)$.

#### Off-policy Meta-RL: PEARL

Key points:

• Infer latent representations z of each task from the trajectory data.
• The inference network $q$ is decoupled from the policy, which enables off-policy learning.
• All objectives involve the inference and policy networks.
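A schematic sketch of this decoupling (module names, sizes, and the context format are illustrative assumptions, not PEARL's actual code):

```python
import torch
import torch.nn as nn

class TaskEncoder(nn.Module):
    """Infer a latent task variable z from context transitions (s, a, r, s')."""
    def __init__(self, ctx_dim, z_dim=5):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(ctx_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 2 * z_dim))

    def forward(self, context):                 # context: (n_transitions, ctx_dim)
        # Mean-pool over transitions: a permutation-invariant posterior over z
        mu, log_var = self.net(context).mean(dim=0).chunk(2)
        return mu + log_var.mul(0.5).exp() * torch.randn_like(mu)  # reparameterized

class ContextConditionedPolicy(nn.Module):
    """The policy acts on a single state vector concatenated with inferred z."""
    def __init__(self, s_dim, z_dim, a_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(s_dim + z_dim, 64), nn.ReLU(),
                                 nn.Linear(64, a_dim))

    def forward(self, state, z):
        return self.net(torch.cat([state, z], dim=-1))
```

Because $z$ is inferred from stored context rather than produced along the policy-gradient path, the actor and critic can be trained on replayed off-policy data.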

#### Adaptation in Nonstationary Environments

Classical few-shot learning setup:

• The tasks are i.i.d. samples from some underlying distribution.
• Given a new task, we get to interact with it before adapting.
What if we are in a nonstationary environment, i.e., one that changes over time? Can we still use meta-learning?

Example: adaptation to a learning opponent. Each new round is a new task, so a nonstationary environment is a sequence of tasks:

• The tasks are sequentially dependent.
• Meta-learn to exploit the dependencies.

As before, treat policy parameters, tasks, and all trajectories as random variables, except that the tasks now form a sequence rather than i.i.d. draws.

#### RoboSumo: a Multiagent Competitive Environment

An agent competes against an opponent, and the opponent’s behavior changes over time.

#### Takeaways

• The learning-to-learn (or meta-learning) setup is particularly suitable for multi-task reinforcement learning.
• Both on-policy and off-policy RL can be “upgraded” to meta-RL:
  • On-policy meta-RL is directly enabled by MAML.
  • Decoupling task inference and policy learning enables off-policy methods.