Reinforcement Learning from Getting Started to Giving Up: Resources from Jun Wang, Hung-yi Lee, and Other Experts

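[Editor's note] This article collects reinforcement learning materials from introductory to advanced level, covering both books and courses, including valuable material from experts such as Hung-yi Lee and Jun Wang. We hope readers find it useful.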

Contents

I. Books

  • [Reinforcement Learning: An Introduction](#Reinforcement Learning: An Introduction)

  • [Algorithms for Reinforcement Learning](#Algorithms for Reinforcement Learning)

  • OpenAI-spinningup

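II. Courses

1. Foundational courses

  • [Rich Sutton Reinforcement Learning Course (Alberta)](#Rich Sutton Reinforcement Learning Course (Alberta))

  • [David Silver Reinforcement Learning Course (UCL)](#David Silver Reinforcement Learning Course (UCL))

  • [Stanford Reinforcement Learning Course](#Stanford Reinforcement Learning Course)

  • [UCL + SJTU Multi-Agent Reinforcement Learning Tutorial](#Multi-Agent Reinforcement Learning Tutorial)

2. Deep RL courses

  • [National Taiwan University, Hung-yi Lee: (Deep) Reinforcement Learning](#National Taiwan University, Hung-yi Lee: (Deep) Reinforcement Learning)

  • [UCB Deep Reinforcement Learning Course](#UCB Deep Reinforcement Learning Course)

  • [CMU Deep Reinforcement Learning Course](#CMU Deep Reinforcement Learning Course)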

Reinforcement Learning: An Introduction

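Richard Sutton and Andrew Barto, Reinforcement Learning: An Introduction. Update: the final version of the second edition (click "online draft"): link. The official copy is hosted on Google Docs, so I downloaded a copy and put it on GitHub; help yourself if you need it.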

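Note: the print edition can now be ordered. A classmate and I each ordered a copy from overseas, and neither has arrived yet. It is available from Amazon abroad; within China, consider JD or Amazon China, though it will cost a bit more there.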

Algorithms for Reinforcement Learning

Csaba Szepesvari, Algorithms for Reinforcement Learning

链接:https://sites.ualberta.ca/~szepesva/papers/RLAlgsInMDPs.pdf

OpenAI-spinningup

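This one is more of a mixed collection than a book: online docs, accompanying code, and accompanying exercises. I strongly recommend reading it alongside the UCL course; I skimmed through it and found it quite good.

However, it does not mention the UCL and UCB courses below or Sutton's book above. Reading them together may work better:

Online documentation: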

http://spinningup.openai.com/en/latest/

Introduction to the basics of reinforcement learning: http://spinningup.openai.com/en/latest/spinningup/rl_intro.html

Advice on deep reinforcement learning: http://spinningup.openai.com/en/latest/spinningup/spinningup.html

Code:

https://github.com/openai/spinningup/tree/master/spinup

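Courses

Foundational courses

Rich Sutton Reinforcement Learning Course (Alberta)

Course homepage

Link: http://incompleteideas.net/rlai.cs.ualberta.ca/RLAI/RLAIcourse/RLAIcourse2006.html

This one is fairly old. There is a newer version on Google Drive, which I will organize when I find the time.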

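David Silver Reinforcement Learning Course (UCL)

Note: this is the course David Silver taught at UCL in 2015. He now seems to be at the peak of his career at DeepMind, so a new course will probably have to wait until the day he decides to return to a university to train students. Highly recommended for beginners who want to build up the basic RL concepts.

Course homepage: http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html

Slides: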

Lecture 1: Introduction to Reinforcement Learning

http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/intro_RL.pdf

Lecture 2: Markov Decision Processes 

http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/MDP.pdf

Lecture 3: Planning by Dynamic Programming

http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/DP.pdf 

Lecture 4: Model-Free Prediction

http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/MC-TD.pdf

Lecture 5: Model-Free Control

http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/control.pdf

Lecture 6: Value Function Approximation

http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/FA.pdf

Lecture 7: Policy Gradient Methods

http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/pg.pdf

Lecture 8: Integrating Learning and Planning

http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/dyna.pdf

Lecture 9: Exploration and Exploitation

http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/XX.pdf

Lecture 10: Case Study: RL in Classic Games

http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/games.pdf

Stanford Reinforcement Learning Course

Note: this is the spring 2018 offering.

Course homepage: http://web.stanford.edu/class/cs234/schedule.html

Slides:

 Introduction to Reinforcement Learning

http://web.stanford.edu/class/cs234/slides/cs234_2018_l1.pdf

How to act given knowledge of how the world works. Tabular setting. Markov processes. Policy search. Policy iteration. Value iteration.

http://web.stanford.edu/class/cs234/slides/cs234_2018_l2.pdf

Learning to evaluate a policy when you don't know how the world works.

http://web.stanford.edu/class/cs234/slides/cs234_2018_l3.pdf

Model-free learning to make good decisions. Q-learning. SARSA.

http://web.stanford.edu/class/cs234/slides/cs234_2018_l4.pdf

Scaling up: value function approximation. Deep Q Learning.

http://web.stanford.edu/class/cs234/slides/cs234_2018_l5.pdf

Deep reinforcement learning continued. 

http://web.stanford.edu/class/cs234/slides/cs234_2018_l6.pdf

Imitation Learning.

http://web.stanford.edu/class/cs234/slides/cs234_2018_l7_annotated.pdf

Policy search.

http://web.stanford.edu/class/cs234/slides/cs234_2018_l8.pdf

Policy search. 

http://web.stanford.edu/class/cs234/slides/cs234_2018_l9_updated.pdf

Midterm review.

http://web.stanford.edu/class/cs234/slides/cs234_2018_midterm_review.pdf

Fast reinforcement learning (Exploration/Exploitation) Part I.

http://web.stanford.edu/class/cs234/slides/cs234_2018_l11.pdf

Fast reinforcement learning (Exploration/Exploitation) Part II. 

http://web.stanford.edu/class/cs234/slides/cs234_2018_l12.pdf

Batch Reinforcement Learning. 

http://web.stanford.edu/class/cs234/slides/cs234_2018_l13.pdf

Monte Carlo Tree Search.

http://web.stanford.edu/class/cs234/slides/cs234_2018_l14.pdf

Human in the loop RL with a focus on transfer learning.

http://web.stanford.edu/class/cs234/slides/cs234_2018_l15.pdf

Multi-Agent Reinforcement Learning Tutorial

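Note: while interning with Alibaba's advertising team, I was fortunate to work on a paper with Professor Wang and Professor Zhang. Along the way I saw how agile and sharp Professor Wang's thinking is. Professor Zhang also strikes me as a rising star in computer science in China, well worth following!

Course homepage: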

http://wnzhang.net/tutorials/marl2018/index.html

Fundamentals of Reinforcement Learning

http://wnzhang.net/tutorials/marl2018/docs/lecture-1-rl.pdf

Fundamentals of Game Theory

http://wnzhang.net/tutorials/marl2018/docs/lecture-2a-game-theory.pdf

Learning in Repeated Games

http://wnzhang.net/tutorials/marl2018/docs/lecture-2b-repeated-games.pdf

Multi-Agent Reinforcement Learning

http://wnzhang.net/tutorials/marl2018/docs/lecture-3a-marl-1.pdf

Deep RL courses

National Taiwan University, Hung-yi Lee: (Deep) Reinforcement Learning

Course homepage:

http://speech.ee.ntu.edu.tw/~tlkagk/courses/

The videos are available on Bilibili:

https://www.bilibili.com/video/av24724071?from=search&seid=14814651069494196110

UCB Deep Reinforcement Learning Course

Course homepage:

http://rail.eecs.berkeley.edu/deeprlcourse/

Lecture slides: see the syllabus for more information.

For details, see the link in the original post:

https://github.com/wwxFromTju/awesome-reinforcement-learning-zh#%E4%B9%A6

CMU Deep Reinforcement Learning Course

Update: fall 2018

Fall 2018 course homepage:

http://www.andrew.cmu.edu/course/10-703/

2017 course homepage:

https://katefvision.github.io/

Slides:

 Introduction

https://www.cs.cmu.edu/~katef/DeepRLFall2018/lecture1_intro.pdf

Markov decision processes (MDPs), POMDPs

https://www.cs.cmu.edu/~katef/DeepRLFall2018/lecture2_mdps.pdf

Solving known MDPs: Dynamic Programming

http://www.andrew.cmu.edu/course/10-703/slides/lecture3_exactmethods-9-5-2018.pdf

Policy iteration, Value iteration, Asynchronous DP

http://www.andrew.cmu.edu/course/10-703/slides/lecture4_valuePolicyDP-9-10-2018.pdf

Monte Carlo Learning, Temporal difference learning, Q learning

http://www.andrew.cmu.edu/course/10-703/slides/Lecture5_MC_9-12-2018.pdf

Temporal difference learning (Tom), Planning and learning: Dyna, Monte carlo tree search

http://www.andrew.cmu.edu/course/10-703/slides/TDshort-9-17-2018.pdf

Deep NN Architectures for RL

https://www.cs.cmu.edu/~katef/DeepRLFall2018/lecture_NNarchitecturesforRL_katef.pdf

Recitation on Monte Carlo Tree Search

https://www.cs.cmu.edu/~katef/DeepRLFall2018/MCTS_katef.pdf

VF approximation, MC, TD with VF approximation, Control with VF approximation

https://www.cs.cmu.edu/~katef/DeepRLFall2018/lecture_FAkatef.pdf

Deep Q Learning : Double Q learning, replay memory

https://www.cs.cmu.edu/~katef/DeepRLFall2018/lecture_DQL_katef2018.pdf

Advanced Policy Gradients

http://www.andrew.cmu.edu/course/10-703/slides/Lecture_PG-NatGrad-10-8-2018.pdf

Evolution Methods, Natural Gradients

http://www.andrew.cmu.edu/course/10-703/slides/Lecture_async_evolution.pdf

Natural Policy Gradients, TRPO, PPO, ACKTR

http://www.andrew.cmu.edu/course/10-703/slides/Lecture_NaturalPolicyGradientsTRPOPPO.pdf

Pathwise Derivatives, DDPG, multigoal RL, HER

http://www.andrew.cmu.edu/course/10-703/slides/Lecture_DDPGMultigoalRL.pdf

Exploration vs. Exploitation

http://www.andrew.cmu.edu/course/10-703/slides/Lecture_Exploration-10-22-2018.pdf

Exploration and RL in Animals

http://www.andrew.cmu.edu/course/10-703/slides/Lecture_exploration.pdf

Model-based Reinforcement Learning

http://www.andrew.cmu.edu/course/10-703/slides/Lecture_modelbasedRL.pdf

Imitation Learning

http://www.andrew.cmu.edu/course/10-703/slides/Lecture_Imitation_supervised-Nov-5-2018.pdf

Maximum Entropy Inverse RL, Adversarial imitation learning

http://www.andrew.cmu.edu/course/10-703/slides/Lecture_IRL_GAIL.pdf

Recitation: Trajectory optimization – iterative LQR

https://katefvision.github.io/katefSlides/RECITATIONtrajectoryoptimization_katef.pdf

Original post:

https://github.com/wwxFromTju/awesome-reinforcement-learning-zh#%E4%B9%A6

This article is a contributed submission and does not represent the views of 乌云网. If you repost it, please credit the source: http://www.aiwuyun.net/archives/2422.html