On nearly selfoptimizing strategies for multiarmed bandit problems with controlled arms

Ewa Drabik

doi:10.4064/am-23-4-449-473

Volume 23 / 1996

Ewa Drabik, Applicationes Mathematicae 23 (1996), 449-473. DOI: 10.4064/am-23-4-449-473

Abstract

Two kinds of strategies for a multiarmed Markov bandit problem with controlled arms are considered: a strategy with forcing and a strategy with randomization. The choice of arm and control function in both cases is based on the current value of the average cost per unit time functional. Some simulation results are also presented.
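To make the idea of a strategy with forcing concrete, here is a minimal Python sketch. It is a generic illustration, not the construction from the paper: the forcing schedule (perfect-square time steps), the helper name `forcing_strategy`, the `(arm, control)` labels, and the simplification of each controlled Markov arm to an independent per-step cost are all assumptions made for brevity. At non-forcing steps the sketch plays the pair whose empirical average cost per unit time is currently lowest; at forcing steps it cycles through all pairs so that none is abandoned permanently.

```python
import math
import random


def forcing_strategy(step_fns, horizon, seed=0):
    """Run a simple forcing strategy over (arm, control) pairs.

    step_fns: dict mapping a hypothetical (arm, control) pair to a function
              rng -> cost incurred in one time step under that pair.
              (Assumption: costs are drawn independently per step; the
              Markov dynamics of the arms are not modelled here.)
    Returns (average cost per unit time, play counts per pair).
    """
    rng = random.Random(seed)
    pairs = list(step_fns)
    total = {p: 0.0 for p in pairs}  # accumulated cost per pair
    count = {p: 0 for p in pairs}    # number of plays per pair
    overall = 0.0                    # accumulated cost over all steps
    forced = 0                       # number of forcing steps used so far

    for t in range(1, horizon + 1):
        untried = [p for p in pairs if count[p] == 0]
        if untried:
            # Play every pair once before comparing averages.
            pair = untried[0]
        elif math.isqrt(t) ** 2 == t:
            # Forcing step (assumed schedule: t is a perfect square):
            # cycle through all pairs regardless of their averages.
            pair = pairs[forced % len(pairs)]
            forced += 1
        else:
            # Otherwise play the pair with the lowest empirical average cost.
            pair = min(pairs, key=lambda p: total[p] / count[p])
        cost = step_fns[pair](rng)
        total[pair] += cost
        count[pair] += 1
        overall += cost

    return overall / horizon, count


# Hypothetical example: two arms, two controls each, noisy costs.
example = {
    ("arm0", "u0"): lambda rng: rng.gauss(1.0, 0.2),
    ("arm0", "u1"): lambda rng: rng.gauss(0.8, 0.2),
    ("arm1", "u0"): lambda rng: rng.gauss(1.5, 0.2),
    ("arm1", "u1"): lambda rng: rng.gauss(1.2, 0.2),
}
avg_cost, plays = forcing_strategy(example, horizon=5000)
```

Because forcing steps become rarer as time grows, their contribution to the long-run average cost shrinks, which is the intuition behind calling such a strategy nearly selfoptimizing. A strategy with randomization could be sketched in the same way by replacing the deterministic schedule with a small, decaying exploration probability.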

Free download under CC-BY license
