Finite-Memory Strategies in POMDPs with Long-Run Average Objectives

@article{Chatterjee2019FiniteMemorySI,
  title={Finite-Memory Strategies in POMDPs with Long-Run Average Objectives},
  author={Krishnendu Chatterjee and Raimundo Saona and Bruno Ziliotto},
  journal={Math. Oper. Res.},
  year={2019},
  volume={47},
  pages={100--119},
  url={https://api.semanticscholar.org/CorpusID:233520786}
}
This paper proves that in POMDPs with long-run average objectives, the decision maker has approximately optimal strategies with finite memory. This implies notably that approximating the long-run value is recursively enumerable, and it yields a weak continuity property of the value with respect to the transition function.

Dynamic Random Access Without Observation Under Deadline-Constrained Periodic Traffic

This article focuses on random access in uplink systems under deadline-constrained periodic traffic, which is typical of many real-time Internet of Things scenarios. It considers dynamic slotted ALOHA without observation, where each active node adopts time-dependent but observation-independent transmission probabilities.

History-dependent evaluations in POMDPs

In POMDPs with limsup payoffs, there exists a strategy that is epsilon-optimal for any sequence of weights satisfying a property that can be interpreted as "the decision-maker is patient enough".

Discrete-time controlled Markov processes with average cost criterion: a survey

This work is a survey of the average cost control problem for discrete-time Markov processes. The authors have attempted to put together a comprehensive account of the considerable research on this

Strong Uniform Value in Gambling Houses and Partially Observable Markov Decision Processes

This paper shows that, in several standard models of dynamic programming, for any ε > 0 the decision-maker has a pure strategy σ which is ε-optimal in any n-stage problem, provided that n is big enough.

Long-Term Values in Markov Decision Processes and Repeated Games, and a New Distance for Probability Spaces

The existence of a very strong notion of long-term value, called general uniform value, is proved. It represents the fact that the decision maker can play well independently of the evaluations (θ_t)_{t ≥ 1} over stages, provided the total variation (or impatience) is small enough.

Average Cost Dynamic Programming Equations For Controlled Markov Chains With Partial Observations

The value function for the average cost control of a class of partially observed Markov chains is derived as the "vanishing discount limit," in a suitable sense, of the value functions for the

On measurability and representation of strategic measures in Markov decision processes

This paper deals with a discrete time Markov Decision Process with Borel state and action spaces. We show that the set of all strategic measures generated by randomized stationary policies is

Blackwell Optimality in Markov Decision Processes with Partial Observation

We prove the existence of Blackwell epsilon-optimal strategies in finite Markov Decision Processes with partial observation.

Stochastic games

The main existence results on optimality and equilibria in two-person stochastic games with finite state and action spaces are discussed, and some algorithms for computing optimal strategies are presented.

Uniform value in Dynamic Programming

The main result says that if the set of states is a precompact metric space and the family (w_{m,n}) is uniformly equicontinuous, then the uniform value exists.