The BeNeRL Seminar Series is a monthly series of online talks by RL researchers from all over the world. The intention is primarily to give advanced PhD students and early-career researchers a platform to 1) present their work and 2) share their practical RL experience (e.g., how to manage large-scale RL experiments as a new researcher in the field, a topic that is often skipped in talks). We maintain a summary of the main advice on experimentation.
The seminar is online and takes place on the second Thursday of every month, 16.00-17.00 (CET), unless there is a conflict with a major machine learning conference, in which case we try to shift by one week.
Date: Thursday, Feb. 12, 2026, 16.00-17.00 (Amsterdam Time Zone)
Title: On-policy value learning at 10000 frames per second
Link: Zoom Link - Click Here
Abstract: When samples are cheap and fast to collect, the RL community relies on the policy-gradient theorem to obtain agents that can reliably train on massive amounts of data. However, since their inception, zeroth-order algorithms such as REINFORCE, TRPO, and PPO have been plagued by high variance, which makes them hard to tune. In the off-policy regime, where sample efficiency is the goal, stable and efficient value-driven methods have been explored, but these require replay buffers and specialized architectures to stabilize off-policy learning. What if we could bridge the two paradigms and bring stable value-driven learning to the on-policy sampling regime? In our new ICLR paper, Relative Entropy Policy Optimization, we explore how to achieve stable on-policy value function learning at 10000 frames per second. We will see that value learning is possible, and useful, without the use of massive replay buffers by combining insights from both the on-policy and off-policy literature.
Talks are always online and take place from 16.00 to 17.00 (Amsterdam Time Zone).
2023
Thu Oct 12: Benjamin Eysenbach (Princeton) Connections between Reinforcement Learning and Representation Learning
Thu Nov 16: Cansu Sancaktar (Max Planck Institute) Playful Exploration in Reinforcement Learning
2024
Thu Feb 8: Pierluca D'Oro (Mila) On building World Models better than reality
Thu April 11: Minqi Jiang (Google Deepmind) Learning Curricula in Open-Ended Worlds
Thu May 16: Edward Hu (University of Pennsylvania) The Sensory Needs of Robot Learners
Thu June 13: Nicklas Hansen (UC San Diego) Data-Driven World Models for Robots
Thu Sep 12: Daniel Palenicek (TU Darmstadt) Sample Efficiency in Deep RL: Quo Vadis? (slides)
Thu Oct 10: Ademi Adeniji (UC Berkeley) Reinforcement Learning Behavioral Generalists - Top-Down and Bottom-Up (slides)
Thu Nov 14: Tal Daniel (Technion) Particles to Policies: Object-Centric Learning in Pixel-Based Decision Making (slides)
Thu Dec 19: Hojoon Lee (KAIST AI) Designing Neural Network Architecture for Deep Reinforcement Learning (slides)
2025
Thu Mar 13: Andrea Tirinzoni (Meta FAIR) Pre-training Behavioral Foundation Models via Zero-shot Reinforcement Learning
Thu May 8: Qiyang Li (UC Berkeley) Leveraging unlabeled task-agnostic offline data for efficient online exploration
Thu June 12: Anikait Singh (Stanford University) Towards Scalable RL Machinery for LLM Post-Training
Thu Dec 11: Theresa Eimer (Leibniz University of Hannover) "Is My RL Algorithm a Good Tool?" - What Evaluation Strategies Tell Us About Our Algorithms
2026
Thu Jan 8: Andrew Wagenmaker (UC Berkeley) What Does RL Theory Have to Do with Robotics?
Thu Feb 12: Claas A. Voelcker (UT Austin) On-policy value learning at 10000 frames per second (slides)
Thu March 12: Elle Miller (University of Edinburgh) TBD
Thu April 9:
If you have any questions about the seminar series, feel free to contact:
Zhao Yang: z.yang(at)liacs.leidenuniv.nl