Reinforcement Learning

dr hab. Piotr Miłoś, dr Łukasz Kuciński

Monday, 10.00 - 12.00, room 106

The seminar concluded its activities in the summer semester of 2019.

01.04.2019, Michał Zawalski, Visual Hindsight Experience Replay.

Abstract: Reinforcement learning algorithms usually require millions of interactions with the environment to learn a successful policy. Hindsight Experience Replay was introduced as a technique to learn from unsuccessful episodes and thus improve sample efficiency. However, it cannot be directly applied to visual domains. I will show a modification of this approach called Visual Hindsight Experience Replay, which aims to solve this issue. The key part of this approach is a method of fooling the agent into thinking that it has actually reached the goal in a sampled unsuccessful episode.

25.03.2019, Andrzej Nagórko, Parallelized Nested Rollout Policy Adaptation.

Abstract: Nested Rollout Policy Adaptation (NRPA) is a Monte Carlo
tree search algorithm. It beats more general Monte Carlo tree search
algorithms in the domain of single agent optimization problems. I'll
show how to parallelize NRPA and discuss performance of the parallel
version in the Morpion Solitaire benchmark.

18.03.2019, The seminar will not take place.

11.03.2019, Piotr Kozakowski, Discrete Autoencoders: Gumbel-Softmax vs Improved Semantic Hashing.

Abstract: Gumbel-softmax (Jang et al., Categorical Reparameterization with Gumbel-Softmax, 2016) and improved semantic hashing (Kaiser et al., Discrete Autoencoders for Sequence Models, 2018) are two approaches to the relaxation of discrete random variables that can be used to train autoencoders with discrete latent representations. They have not yet been rigorously compared in domains other than language modeling. I will start by describing the two methods and the original results. Then I will analyze their performance, both qualitatively and quantitatively, in an image generation task. I will end by sharing some practical considerations learned while implementing these methods.

04.03.2019, Jakub Świątkowski, Deep Reinforcement Learning, based on Zambaldi et al., "Deep reinforcement learning with relational inductive biases".

Abstract: We will talk about relational deep reinforcement learning, which was applied to train AlphaStar, as described in Zambaldi et al., "Deep reinforcement learning with relational inductive biases".

25.02.2019, Łukasz Kuciński, Neural Expectation Maximization, based on Greff et al., “Neural Expectation Maximization”.

Abstract: We will talk about the classical Expectation Maximization algorithm and its differentiable counterpart, as described in Greff et al., “Neural Expectation Maximization”.

18.02.2019, Konrad Czechowski, Universal Planning Networks, based on Srinivas et al., "Universal Planning Networks".

Abstract: As the authors write, "A key challenge in complex visuomotor control is learning abstract representations that are effective for specifying goals, planning, and generalization". I'll present how the proposed method, Universal Planning Networks, provides promising results in these directions.

11.02.2019, Błażej Osiński, Goal-conditioned hierarchical reinforcement learning (based on “Data-Efficient Hierarchical Reinforcement Learning”, Nachum et al., and “Near-Optimal Representation Learning for Hierarchical Reinforcement Learning”, Nachum et al.).

Abstract: Humans naturally plan and execute actions in a hierarchical fashion: when planning to go somewhere, one doesn’t think about every footstep on the way. This hints at using hierarchical methods in the context of reinforcement learning as well. Though the idea seems obvious, these methods have rarely been applied successfully to complex environments. In the presentation, I’ll focus on goal-conditioned methods, which seem to convincingly apply hierarchical RL to learn highly complex behaviours.

04.02.2019, Krzysztof Galias, Adam Jakubowski, RL for autonomous driving: A case study.

Abstract: We will go over a reinforcement learning project for a big automotive company, where the goal is to train a car driving policy in a simulator and transfer it to the real world. We will discuss the techniques used and the lessons learned, and share progress on the task.

28.01.2019, Karol Strzałkowski, Abstract representation learning (based on 'Decoupling Dynamics and Reward for Transfer Learning', Zhang et al., and 'Combined Reinforcement Learning via Abstract Representations', Francois-Lavet et al.).

Abstract: There are several reasons to try to mix model-based and model-free approaches in reinforcement learning. While in many cases model-free approaches perform better than planning using a model of the environment, a good state space representation might lead to better sample efficiency and easier transfer learning. The authors of the first paper propose a method of learning an abstract environment representation in a modular way, which supports transferability in many ways. The authors of the second paper improve on this setting and obtain even better sample efficiency and interpretability of the learned representation.

21.01.2019, The seminar will not take place.

14.01.2019, Piotr Miłoś, Dr Uncertainty or: How I Learned to Stop Worrying and Love the Ensembles.

Abstract: Though measuring uncertainty is a fundamental idea in statistics, it has been somewhat absent in deep learning. One of the major obstacles has been the lack of efficient Bayesian learning. While this is still not fully resolved, promising works have emerged recently. In my talk I will give a non-exhaustive overview, starting with the papers:

- Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models

- Randomized Prior Functions for Deep Reinforcement Learning

07.01.2019, Mikołaj Błaż, Policy-guided tree search, based on Orseau et al., "Single-Agent Policy Tree Search With Guarantees".

Abstract: Tree search is a standard and continuously investigated task in Artificial Intelligence. In the first part of the talk I will briefly discuss some common tree search algorithms. The second part will focus on the "Single-Agent Policy Tree Search With Guarantees" paper. Its authors propose two novel policy-guided tree search algorithms with a provable upper bound on the number (or expected number) of tree nodes expanded before reaching a goal state. The algorithms are then analyzed and evaluated on the Sokoban environment.


31.12.2018, The seminar will not take place.

24.12.2018, The seminar will not take place.

17.12.2018, Maciej Klimek, Konrad Czechowski, Maciej Jaśkowski, Łukasz Kuciński, Summary of the NeurIPS 2018 conference.

10.12.2018, Michał Zawalski, Learning to navigate.

03.12.2018, The seminar will not take place.

26.11.2018, Piotr Kozakowski, Exploration by Random Network Distillation, based on Burda et al., "Exploration by Random Network Distillation".

Abstract: Exploration by Random Network Distillation is a new method that achieved noteworthy results on the Atari game Montezuma's Revenge. I will start by describing the game and the difficulties it poses, in particular those related to exploration. I will also introduce the exploration problem and some general methods of dealing with it. Next, I will describe Random Network Distillation as an exploration mechanism. I will give some basic intuitions and try to justify the method using arguments from Bayesian deep learning. Then I will give the technical details of the authors' experiments with the method and conclude with a description of the results.


19.11.2018, Łukasz Krystoń, discussion of the papers: Oh et al., "Self-Imitation Learning" and "Contingency-Aware Exploration in Reinforcement Learning".

12.11.2018, The seminar will not take place.

05.11.2018, Łukasz Kuciński, Piotr Miłoś, Discussion of the seminar program.
