A Comprehensive Collection of Bandit Algorithm Resources


Algorithm introductions:

1. Two-lecture tutorial: Introduction to Bandits: Algorithms and Theory

    http://techtalks.tv/talks/54451/

    http://techtalks.tv/talks/54455/


2. Blog post introducing multi-armed bandits

  https://mpatacchiola.github.io/blog/2017/08/14/dissecting-reinforcement-learning-6.html






Toolboxes:

1. Project details for pymabandits

    http://mloss.org/software/view/415/


2. Multi-Armed Bandit project (version 0.2, 2005), C#

   http://bandit.sourceforge.net/


3. banditlib (GitHub, C++)

     https://github.com/jkomiyama/banditlib

      The same author also maintains two other bandit algorithm libraries.


    The algorithms are not optimized for speed. The library targets a Linux/GNU C++ environment; Windows and macOS are not supported.

   

  • Arms:
    • Binary and Normal reward distributions (arms) are implemented.
  • Policies:
    • DMED for binary rewards [1]
    • Epsilon-Greedy (a minimal sketch follows this list)
    • KL-UCB [2]
    • MOSS [3]
    • Thompson sampling for binary rewards [4]
    • UCB [5]
    • UCB-V [6]
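
To make the policy names above concrete, here is a minimal epsilon-greedy sketch in plain Python. It is independent of banditlib's C++ API, and the arm means, epsilon value, and horizon are made-up illustration values:

```python
import random

def epsilon_greedy(true_means, horizon=10000, epsilon=0.1, seed=0):
    """Play a Bernoulli bandit with the epsilon-greedy policy.

    true_means -- success probability of each arm (unknown to the policy)
    epsilon    -- probability of exploring a uniformly random arm each step
    """
    rng = random.Random(seed)
    counts = [0] * len(true_means)    # pulls per arm
    values = [0.0] * len(true_means)  # empirical mean reward per arm
    total_reward = 0.0
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(len(true_means))                          # explore
        else:
            arm = max(range(len(true_means)), key=lambda a: values[a])    # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # incremental update of the empirical mean
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward
    return total_reward, counts

if __name__ == "__main__":
    reward, pulls = epsilon_greedy([0.2, 0.5, 0.7])
    print("total reward:", reward, "pulls per arm:", pulls)
```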

4. https://github.com/bgalbraith/bandits

Bandits

Python library for Multi-Armed Bandits

Implements the following algorithms:

  • Epsilon-Greedy
  • UCB1
  • Softmax
  • Thompson Sampling (Bayesian)
    • Bernoulli, Binomial <=> Beta Distributions (see the sketch after this list)
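
The last bullet refers to Beta-Bernoulli/Binomial conjugacy: if each arm's success probability is given a Beta prior, the posterior after observing 0/1 rewards is again a Beta, so Thompson sampling only needs to draw one Beta sample per arm and play the arm with the largest draw. Here is a minimal sketch that does not use this library's API; the arm probabilities and horizon are illustrative:

```python
import random

def thompson_sampling(true_means, horizon=10000, seed=0):
    """Bernoulli bandit with a Beta(1, 1) prior on each arm's mean."""
    rng = random.Random(seed)
    # Beta posterior parameters: alpha counts successes + 1, beta counts failures + 1
    alpha = [1.0] * len(true_means)
    beta = [1.0] * len(true_means)
    total_reward = 0.0
    for _ in range(horizon):
        # sample a plausible mean for every arm and play the best sample
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(len(true_means))]
        arm = max(range(len(true_means)), key=lambda a: samples[a])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        # conjugate update: a success bumps alpha, a failure bumps beta
        alpha[arm] += reward
        beta[arm] += 1.0 - reward
        total_reward += reward
    return total_reward, alpha, beta

if __name__ == "__main__":
    reward, alpha, beta = thompson_sampling([0.2, 0.5, 0.7])
    print("total reward:", reward)
```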


5. MOE includes bandit algorithms; it appears to be a fairly large toolkit.
http://yelp.github.io/MOE/why_moe.html


6. libbandit

https://github.com/tor/libbandit

LibBandit

LibBandit is a C++ library designed for efficiently simulating multi-armed bandit algorithms.

Currently the following algorithms are implemented:

  • UCB (sketch below)
  • Optimally confident UCB
  • Almost optimally confident UCB
  • Thompson sampling (Gaussian prior)
  • MOSS
  • Finite-horizon Gittins index (Gaussian/Gaussian model/prior)
  • An approximation of the finite-horizon Gittins index
  • Bayesian optimal for two arms (Gaussian/Gaussian model/prior)
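
As a reminder of how the first entry in this list works, UCB plays the arm that maximizes an optimistic index: the empirical mean plus an exploration bonus on the order of sqrt(2 ln t / n_a). Below is a minimal UCB1 sketch in Python, unrelated to libbandit's C++ API, and using Bernoulli rewards for simplicity rather than the Gaussian models listed above:

```python
import math
import random

def ucb1(true_means, horizon=10000, seed=0):
    """Bernoulli bandit with the UCB1 index: mean + sqrt(2 ln t / n)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    values = [0.0] * n_arms
    total_reward = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # play every arm once to initialize its index
        else:
            arm = max(
                range(n_arms),
                key=lambda a: values[a] + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward
    return total_reward, counts

if __name__ == "__main__":
    reward, pulls = ucb1([0.2, 0.5, 0.7])
    print("total reward:", reward, "pulls per arm:", pulls)
```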

Algorithm implementations (not toolkits)

Graphical demonstrations of the algorithms:

1. Bayesian Bandit Explorer

  https://learnforeverlearn.com/bandits/



2. Multi-armed Bandit Demo (Steven has used this one)