CS285 DRL Notes-Lecture 2 Imitation Learning

These are my notes on the Berkeley CS285 course taught by Sergey Levine. This post covers the second lecture, on imitation learning.

Terminology & notation

$t$: time step, a discrete index
$\mathbf{o}_t$: observation, the input signal the agent receives
$\mathbf{s}_t$: state, which is distinct from the observation. It holds the essential underlying information, such as speed or location, without the noise present in the observation
$\mathbf{a}_t$: action, which can be discrete or continuous
$\pi_{\theta}(\mathbf{a}_t|\mathbf{o}_t)$: policy, which tells us which action to take. It is a distribution over $\mathbf{a}_t$ given $\mathbf{o}_t$
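
To make the last definition concrete, here is a minimal sketch of a policy as a distribution over discrete actions given an observation. The linear-softmax form, the parameter values in `theta`, and the two-feature observation are all illustrative assumptions, not something taken from the lecture.

```python
import math
import random

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def policy(theta, o_t):
    """Return pi_theta(a_t | o_t) as a list of probabilities, one per action."""
    # One row of theta per action; each logit is a dot product with o_t.
    logits = [sum(w * x for w, x in zip(row, o_t)) for row in theta]
    return softmax(logits)

def sample_action(theta, o_t, rng):
    """Draw a_t from the distribution pi_theta(. | o_t)."""
    probs = policy(theta, o_t)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Hypothetical parameters: 3 discrete actions, 2 observation features.
theta = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
o_t = [0.5, -0.2]

probs = policy(theta, o_t)
a_t = sample_action(theta, o_t, random.Random(0))
```

The point of the sketch is only that the policy outputs probabilities over actions, and an action is then sampled from them. For continuous actions one would instead output the parameters of a continuous distribution.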

Figure: an illustration of the notation

Imitation Learning

Original deep imitation learning system
- ALVINN: Autonomous Land Vehicle In a Neural Network, 1989

Figure: ALVINN, 1989

Figure: pipeline of an earlier autonomous driving system

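These earlier autonomous driving (AD) systems tend to accumulate small errors and drift away from the demonstrated trajectory. The accumulation arises because supervised training assumes i.i.d. data, whereas at test time the policy's own actions determine which states it visits next, so each small mistake pushes it toward states it was never trained on.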
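
The effect can be illustrated with a toy rollout. This is a sketch under stated assumptions, not the lecture's analysis: the deviation `x` is a one-dimensional lateral offset from the expert's path, and the hypothetical `growth` parameter encodes the assumption that the policy's error gets larger the further it is from the training data.

```python
def rollout_deviation(eps, horizon, growth=0.5):
    """Closed-loop rollout: errors feed back into the next state.

    Assumption: the per-step error is eps on expert states and grows
    linearly with the current deviation x from the expert's path.
    """
    x = 0.0
    history = []
    for _ in range(horizon):
        step_error = eps * (1.0 + growth * abs(x))
        x += step_error  # the mistake changes the state seen at the next step
        history.append(x)
    return history

def iid_errors(eps, horizon):
    """i.i.d. view: every step starts from an expert state, so the error stays eps."""
    return [eps] * horizon

closed_loop = rollout_deviation(eps=0.01, horizon=100)
open_loop = iid_errors(eps=0.01, horizon=100)

# Under the i.i.d. view the errors simply add up; in the closed loop the
# final deviation is larger because each error also worsens later ones.
final_closed = closed_loop[-1]
summed_iid = sum(open_loop)
```

Under these assumptions the closed-loop deviation ends up above the simple sum of per-step errors, which is the drift the paragraph above describes.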

Figure: error accumulation in an earlier autonomous driving system

Distributional Shift Problem

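Goal: minimize $E_{\mathbf{s}_t \sim p_{\pi_\theta}(\mathbf{s}_t)}[c(\mathbf{s}_t, \mathbf{a}_t)]$, where $c$ is the cost of taking action $\mathbf{a}_t$ in state $\mathbf{s}_t$ and the states are drawn from the distribution $p_{\pi_\theta}$ induced by running the learned policy itself.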
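
As a rough sketch of how such an expectation could be estimated, the code below rolls out a learned policy and averages a cost over the states it actually visits. Everything here is hypothetical: the integer-valued state, the `expert_action` and `learned_action` functions, and the 0-1 cost (zero when the action matches the expert's) are assumptions chosen for illustration.

```python
import random

def expert_action(s):
    # Hypothetical expert: always steer back toward s = 0.
    if s > 0:
        return -1
    if s < 0:
        return 1
    return 0

def learned_action(s, rng, eps=0.1):
    # Hypothetical learned policy: near the demonstrations (|s| <= 1) it
    # copies the expert except with probability eps; elsewhere it acts
    # randomly because it never saw those states during training.
    if abs(s) <= 1 and rng.random() > eps:
        return expert_action(s)
    return rng.choice([-1, 0, 1])

def cost(s, a):
    # Assumed 0-1 cost: 0 if the action matches the expert's, else 1.
    return 0 if a == expert_action(s) else 1

def estimate_expected_cost(num_rollouts, horizon, seed=0):
    """Monte Carlo estimate of the average cost over states visited by the learned policy."""
    rng = random.Random(seed)
    total, count = 0, 0
    for _ in range(num_rollouts):
        s = 0  # every rollout starts on the expert's path
        for _ in range(horizon):
            a = learned_action(s, rng)
            total += cost(s, a)
            count += 1
            s += a  # the next state depends on the policy's own action
    return total / count

avg_cost = estimate_expected_cost(num_rollouts=200, horizon=50)
```

The key detail is that the states are sampled by running the learned policy, not read from the expert's demonstrations, which is what distinguishes this objective from the training loss.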

Lecture Notes

#DRL #introduction #CS285

https://jackyfl.github.io/JackYFL-blogs/2026/03/08/DRL-Berkeley-CS285-L2/

Author

JackYFL

Posted on

March 9, 2026

Licensed under