Stationary optimal policies in a class of multichain positive dynamic programs with finite state space and risk-sensitive criterion
This work concerns Markov decision processes with finite state space and compact action sets. The decision maker is supposed to have a constant-risk sensitivity coefficient, and a control policy is graded via the risk-sensitive expected total-reward criterion associated with nonnegative one-step rewards. Assuming that the optimal value function is finite, under mild continuity and compactness restrictions the following result is established: If the number of ergodic classes when a stationary policy is used to drive the system depends continuously on the policy employed, then there exists an optimal stationary policy, extending results obtained by Schal (1984) for risk-neutral dynamic programming. We use results recently established for unichain systems, and analyze the general multichain case via a reduction to a model with the unichain property.