DeepRL · July 2, 2020

Advanced Topics in Deep Reinforcement Learning (course + projects) is now open!





RL#1: 13.02.2020: Exploration in RL

Sergey Ivanov

  • Random Network Distillation [1]
  • Intrinsic Curiosity Module [2,3]
  • Episodic Curiosity through Reachability [4]

RL#2: 20.02.2020: Imitation and Inverse RL

Just Heuristic

  • Imitation Learning [5]
  • Inverse RL [6,7]
  • Learning from Human Preferences [8]

RL#3: 27.02.2020: Hierarchical Reinforcement Learning

Petr Kuderov

  • A framework for temporal abstraction in RL [9]
  • The Option-Critic Architecture [10]
  • FeUdal Networks for Hierarchical RL [11]
  • Data-Efficient Hierarchical RL [12]
  • Meta Learning Shared Hierarchies [13]

RL#4: 5.03.2020: Evolutionary Strategies in RL

Evgenia Elistratova

  • A framework for temporal abstraction in reinforcement learning [14]
  • Improving Exploration in Evolution Strategies for Deep RL [15]
  • Paired Open-Ended Trailblazer (POET) [16]
  • Sim-to-Real: Learning Agile Locomotion For Quadruped Robots [17]

RL#5: 12.03.2020: Distributional Reinforcement Learning

Pavel Shvechikov

  • A Distributional Perspective on RL [18]
  • Distributional RL with Quantile Regression [19]
  • Implicit Quantile Networks for Distributional RL [20]
  • Fully Parameterized Quantile Function for Distributional RL [21]

RL#6: 19.03.2020: RL for Combinatorial Optimization

Taras Khakhulin

  • RL for Solving the Vehicle Routing Problem [22]
  • Attention, Learn to Solve Routing Problems! [23]
  • Learning Improvement Heuristics for Solving the Travelling Salesman Problem [24]
  • Learning Combinatorial Optimization Algorithms over Graphs [25]

RL#7: 26.03.2020: RL as Probabilistic Inference

Pavel Termichev

  • RL and Control as Probabilistic Inference: Tutorial and Review [26]
  • RL with Deep Energy-Based Policies [27]
  • Soft Actor-Critic [28]
  • Variational Bayesian RL with Regret Bounds [29]

RL#8: 9.04.2020: Multi Agent Reinforcement Learning

Sergey Sviridov

  • Stabilising Experience Replay for Deep Multi-Agent RL [30]
  • Counterfactual Multi-Agent Policy Gradients [31]
  • Value-Decomposition Networks For Cooperative Multi-Agent Learning [32]
  • Monotonic Value Function Factorisation for Deep Multi-Agent RL [33]
  • Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments [34]

RL#9: 16.04.2020: Model-Based Reinforcement Learning

Evgeny Kashin

  • DL for Real-Time Atari Game Play Using Offline MCTS Planning [35]
  • Mastering Chess and Shogi by Self-Play with a General RL Algorithm [36]
  • World Models [37]
  • Model-Based RL for Atari [38]
  • Learning Latent Dynamics for Planning from Pixels [39]

RL#10: 23.04.2020: Reinforcement Learning at Scale

Aleksandr Panin

  • Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour [40]
  • HOGWILD!: A Lock-Free Approach to Parallelizing SGD [41]
  • GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism [42]
  • Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism [43]
  • Learning@home: Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts [44]

RL#11: 30.04.2020: Multitask & Transfer RL

Dmitry Nikulin

  • Universal Value Function Approximators [45]
  • Hindsight Experience Replay [46]
  • PathNet: Evolution Channels Gradient Descent in Super Neural Networks [47]
  • Progressive Neural Networks [48]
  • Learning an Embedding Space for Transferable Robot Skills [49]

RL#12: 07.05.2020: Memory in Reinforcement Learning

Artyom Sorokin

  • Recurrent Experience Replay in Distributed RL [50]
  • AMRL: Aggregated Memory For RL [51]
  • Unsupervised Predictive Memory in a Goal-Directed Agent [52]
  • Stabilizing Transformers for RL [53]
  • Model-Free Episodic Control [54]
  • Neural Episodic Control [55]

RL#13: 14.05.2020: Distributed RL in the Wild

Sergey Kolesnikov

  • Asynchronous Methods for Deep RL [56]
  • IMPALA: Scalable Distributed DRL with Importance Weighted Actor-Learner Architectures [57]
  • Distributed Prioritized Experience Replay [58]
  • Making Efficient Use of Demonstrations to Solve Hard Exploration Problems [59]
  • SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference [60]


【1】Reinforcement Learning with Competitive Ensembles of Information-Constrained Primitives (Hierarchical RL)

Implement the paper on the test environment of your choice.

【2】 HIRO with Hindsight Experience Replay (Hierarchical RL)

Add Hindsight Experience Replay to the HIRO algorithm. Compare with HIRO.
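The relabeling step you would bolt onto HIRO can be sketched as follows. This is a minimal sketch of HER's "future" strategy on flat transitions; the dict layout and field names are illustrative, and wiring it into HIRO's two-level goal structure is the actual project.

```python
import random

def her_relabel(episode, k=4):
    """HER 'future' relabeling sketch. Each transition is a dict with keys
    state, action, next_achieved_goal, goal, reward (layout is illustrative).
    For every step, up to k goals achieved later in the episode are swapped
    in as if they had been the goal all along."""
    out = []
    for t, tr in enumerate(episode):
        out.append(dict(tr))                        # keep the original transition
        future = episode[t + 1:]
        for fut in random.sample(future, min(k, len(future))):
            relabeled = dict(tr)
            relabeled["goal"] = fut["next_achieved_goal"]
            # sparse reward: 0 on reaching the (new) goal, -1 otherwise
            relabeled["reward"] = 0.0 if tr["next_achieved_goal"] == relabeled["goal"] else -1.0
            out.append(relabeled)
    return out
```

The relabeled transitions go into the replay buffer alongside the originals, which is what turns sparse-reward episodes into useful learning signal.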

【3】 Meta Learning Shared Hierarchies in PyTorch (Hierarchical RL)

Implement the paper in PyTorch (the authors' implementation uses TensorFlow). Check its results on a test environment of your choice (not from the paper).

【4】Fast deep Reinforcement learning using online adjustments from the past (Memory in RL)

Try to reproduce the paper or implement the algorithm on a different environment.

_Bonus points:
* Comparison with NEC or a basic DRL algorithm;
* Ablation study._

【5】Episodic Reinforcement Learning with Associative Memory (Memory in  RL)

Try to reproduce the paper or implement the algorithm on a different environment.

_Bonus points:
* Comparison with NEC or a basic DRL algorithm;
* Ablation study._
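For a baseline in both memory projects, the tabular update from Model-Free Episodic Control [54] is simple to sketch (assuming hashable discrete state keys; NEC [55] replaces the exact table with a differentiable k-NN memory over learned embeddings):

```python
class EpisodicQ:
    """Tabular episodic control sketch: for each (state, action) pair keep
    the best Monte-Carlo return observed so far, and act greedily on it."""
    def __init__(self, n_actions):
        self.table = {}
        self.n_actions = n_actions

    def update(self, state_key, action, mc_return):
        # Q_EC(s, a) <- max(Q_EC(s, a), observed return)
        key = (state_key, action)
        self.table[key] = max(self.table.get(key, float("-inf")), mc_return)

    def value(self, state_key, action, default=0.0):
        return self.table.get((state_key, action), default)

    def greedy_action(self, state_key):
        return max(range(self.n_actions), key=lambda a: self.value(state_key, a))
```

The max update is deliberately optimistic: it latches onto the best trajectory through each state, which is what makes episodic control fast on near-deterministic tasks and a natural comparison point for the bonus items above.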

【6】Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization (Inverse RL)

Implement the algorithm and test it on Atari games. Compare results with common baselines.

【7】Non-Monotonic Sequential Text Generation in TF/Chainer (Imitation Learning)

Implement the paper in TensorFlow or Chainer.

【8】Evolution Strategies as a Scalable Alternative to Reinforcement Learning (Evolution Strategies)

Implement the algorithm and test it on ViZDoom or gym-minigrid. Compare results with available baselines.
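The core estimator of the paper can be sanity-checked on a toy objective before moving to ViZDoom. A minimal sketch (hyperparameters here are illustrative, not tuned):

```python
import numpy as np

def es_step(theta, fitness, rng, npop=50, sigma=0.1, lr=0.02):
    """One step of the basic ES gradient estimate: perturb the parameters
    with Gaussian noise, weight each noise vector by its normalized fitness,
    and move theta in the resulting direction (no backprop anywhere)."""
    noise = rng.standard_normal((npop, theta.size))
    rewards = np.array([fitness(theta + sigma * n) for n in noise])
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)  # fitness shaping
    grad = noise.T @ adv / (npop * sigma)
    return theta + lr * grad

# toy check: climb a quadratic objective toward a known optimum
rng = np.random.default_rng(0)
target = np.array([3.0, -1.0])
theta = np.zeros(2)
for _ in range(300):
    theta = es_step(theta, lambda th: -np.sum((th - target) ** 2), rng)
```

Because only scalar fitness values cross process boundaries, this update parallelizes with almost no communication, which is the paper's main selling point.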

【9】Improving Exploration in Evolution Strategies for DRL via a Population of Novelty-Seeking Agents (Evolution Strategies)

Implement the algorithm and test it on ViZDoom or gym-minigrid. Compare results with available baselines.

【10】Comparative study of intrinsic motivations (Exploration in RL)

Using MountainCar-v0, compare:

1) curiosity based on the forward dynamics model loss;
2) curiosity based on the inverse dynamics model loss;
3) ICM;
4) RND.
_Bonus points:
* Add intrinsic motivation to an off-policy RL algorithm (e.g. DQN or QR-DQN);
* Try MountainCarContinuous-v0._
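As a starting point for the RND branch of the comparison, the intrinsic bonus can be sketched with linear networks in NumPy (sizes and learning rate are illustrative; the paper uses CNNs on pixel observations):

```python
import numpy as np

class RNDBonus:
    """Random Network Distillation sketch: a fixed random 'target' network
    and a trained 'predictor'; the intrinsic reward is the prediction error,
    which decays for frequently visited states and stays high for novel ones."""
    def __init__(self, obs_dim, feat_dim=16, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.target = rng.standard_normal((obs_dim, feat_dim))  # fixed, never trained
        self.pred = np.zeros((obs_dim, feat_dim))               # trained to match target
        self.lr = lr

    def bonus(self, obs):
        """Return the intrinsic reward for `obs` and take one SGD step on
        0.5 * ||pred(obs) - target(obs)||^2 w.r.t. the predictor weights."""
        err = obs @ self.pred - obs @ self.target
        self.pred -= self.lr * np.outer(obs, err)
        return float(np.sum(err ** 2))
```

For the project, this bonus would be added (with a scaling coefficient) to the extrinsic MountainCar reward; the same scaffold accepts the forward/inverse-dynamics variants by swapping the prediction target.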

【11】Solving Unity Pyramids (Exploration in RL)

Try to reproduce this experiment using any intrinsic motivation you like.

【12】RND Exploratory Behavior (Exploration in RL)

There was a study of exploratory behaviors induced by curiosity-based intrinsic motivation. Choose any environment, e.g. some Atari game, and examine the exploratory behavior of RND.

【13】 Learning Improvement Heuristics for Solving the Travelling Salesman Problem (RL for Combinatorial Opt.)

Implement the paper on any combinatorial optimization problem you like. Compare with available solvers.

【14】Dynamic Attention Model for Vehicle Routing Problems (RL for Combinatorial Opt.)

Implement the paper on any combinatorial optimization problem you like. Compare with available solvers.

【15】Variational RL with Regret Bounds (Variational RL)

Try to reproduce the K-learning algorithm from the paper. Pick a finite discrete environment of your choice. Use this paper as an addition to the main one.

_Bonus points:
* Compare with the exact version of soft actor-critic or soft Q-learning from here. Hint: use a message-passing algorithm;
* Propose an approximate K-learning algorithm using function approximators (neural networks)._






For more curated deep reinforcement learning content, follow the Deep Reinforcement Learning Lab (深度强化学习实验室) column. For submissions, contact WeChat 1946738842.