On adaptive control of a partially observed Markov chain
A control problem for a partially observable Markov chain depending on a parameter with long run average cost is studied. Using uniform ergodicity arguments it is shown that, for values of the parameter varying in a compact set, it is possible to consider only a finite number of nearly optimal controls based on the values of actually computable approximate filters. This leads to an algorithm that guarantees nearly selfoptimizing properties without identifiability conditions. The algorithm is based on probing control, whose cost is additionally assumed to be periodically observable.