Finite-Memory Strategies in POMDPs with Long-Run Average Objectives

K. Chatterjee; Raimundo Saona; Bruno Ziliotto

DOI:10.1287/moor.2020.1116
Corpus ID: 233520786

@article{Chatterjee2019FiniteMemorySI,
  title={Finite-Memory Strategies in POMDPs with Long-Run Average Objectives},
  author={Krishnendu Chatterjee and Raimundo Saona and Bruno Ziliotto},
  journal={Math. Oper. Res.},
  year={2019},
  volume={47},
  pages={100-119},
  url={https://api.semanticscholar.org/CorpusID:233520786}
}

K. Chatterjee, Raimundo Saona, Bruno Ziliotto
Published in Mathematics of Operations… 30 April 2019
Mathematics, Computer Science

The paper proves that in POMDPs with a long-run average objective, the decision maker has approximately optimal strategies with finite memory. This implies, notably, that approximating the long-run value is recursively enumerable, and it yields a weak continuity property of the value with respect to the transition function.
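To make the notion of a finite-memory strategy concrete, here is a minimal Python sketch. It is not the paper's construction: the toy POMDP, the function name `average_reward`, and the dictionary encoding of transitions, observations, and memory updates are all assumptions made for illustration. The strategy is a finite automaton (an action per memory state, and a memory update per observation), and its finite-horizon average reward is used as a stand-in for the long-run average.

```python
def average_reward(trans, reward, obs, init, act, upd, mem0, horizon=1000):
    """Expected average reward over `horizon` stages of a finite-memory strategy.

    Hypothetical encoding (not from the paper):
      trans[s][a]  -> {next_state: probability}
      reward[s][a] -> stage reward
      obs[s]       -> {observation: probability}, emitted on arriving in s
      init         -> {state: probability}
      act[m]       -> action played in memory state m
      upd[m][o]    -> next memory state after observing o
    """
    # Distribution over (hidden state, memory state) pairs.
    dist = {(s, mem0): p for s, p in init.items()}
    total = 0.0
    for _ in range(horizon):
        nxt = {}
        for (s, m), p in dist.items():
            a = act[m]
            total += p * reward[s][a]
            for s2, pt in trans[s][a].items():
                for o, po in obs[s2].items():
                    key = (s2, upd[m][o])
                    nxt[key] = nxt.get(key, 0.0) + p * pt * po
        dist = nxt
    return total / horizon


# Toy blind POMDP: the state flips at every stage regardless of the action,
# a single uninformative observation is emitted, and the reward is 1 exactly
# when the action matches the hidden state.
trans = {0: {0: {1: 1.0}, 1: {1: 1.0}}, 1: {0: {0: 1.0}, 1: {0: 1.0}}}
reward = {0: {0: 1.0, 1: 0.0}, 1: {0: 0.0, 1: 1.0}}
obs = {0: {"none": 1.0}, 1: {"none": 1.0}}
init = {0: 1.0}

# One memory state: the strategy is forced to play a constant action.
memoryless_value = average_reward(
    trans, reward, obs, init,
    act={0: 0}, upd={0: {"none": 0}}, mem0=0,
)

# Two memory states: the strategy alternates actions 0, 1, 0, 1, ...
alternating_value = average_reward(
    trans, reward, obs, init,
    act={0: 0, 1: 1}, upd={0: {"none": 1}, 1: {"none": 0}}, mem0=0,
)
```

In this toy instance the one-state strategy earns an average of 0.5 while the two-state strategy earns 1.0, so a small amount of memory already changes the achievable payoff. The sketch only evaluates a given strategy; it does not search for an approximately optimal one.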

Topics

POMDPs, Long-Run Average Objectives, Partially Observable Markov Decision Processes, Recursively Enumerable, Finite-memory Strategy

2 Citations

Dynamic Random Access Without Observation Under Deadline-Constrained Periodic Traffic

Aoyu Gong, Yuan-Hsun Lo, Yan Lin, Yijin Zhang

Computer Science, Engineering

IEEE Transactions on Vehicular Technology

2024

This article focuses on random access in uplink systems under deadline-constrained periodic traffic, which is typical of many real-time Internet of Things scenarios, and considers dynamic slotted ALOHA without observation, where each active node adopts time-dependent but observation-independent transmission probabilities.

Artificial enactive inference in three-dimensional world

O. Georgeon, David Lurie, Paul Robertson

Computer Science, Psychology

Cognitive Systems Research

2024

28 References

History-dependent evaluations in POMDPs

Xavier Venel, Bruno Ziliotto

Mathematics, Economics

2020

In POMDPs with limsup payoffs, there exists a strategy that is epsilon-optimal for any sequence of weights satisfying a property that can be interpreted as "the decision-maker is patient enough."

Discrete-time controlled Markov processes with average cost criterion: a survey

A. Arapostathis, V. Borkar, E. Fernández-Gaucherand, M. K. Ghosh, S. Marcus

Mathematics

1993

This work is a survey of the average cost control problem for discrete-time Markov processes. The authors have attempted to put together a comprehensive account of the considerable research on this…

Strong Uniform Value in Gambling Houses and Partially Observable Markov Decision Processes

Xavier Venel, Bruno Ziliotto

Mathematics, Economics

SIAM J. Control. Optim.

2016

In several standard models of dynamic programming, this work shows that for any ε > 0, the decision-maker has a pure strategy σ which is ε-optimal in any n-stage problem, provided that n is big enough.

Long-Term Values in Markov Decision Processes and Repeated Games, and a New Distance for Probability Spaces

Jérôme Renault, Xavier Venel

Mathematics

Math. Oper. Res.

2017

The existence of a very strong notion of long-term value, called general uniform value, is proved: the decision maker can play well independently of the evaluations (θ_t)_{t ≥ 1} over stages, provided the total variation (or impatience) is small enough.

Average Cost Dynamic Programming Equations For Controlled Markov Chains With Partial Observations

V. Borkar

Mathematics

SIAM J. Control. Optim.

2000

The value function for the average cost control of a class of partially observed Markov chains is derived as the "vanishing discount limit," in a suitable sense, of the value functions for the…

On measurability and representation of strategic measures in Markov decision processes

E. Feinberg

Mathematics

1996

This paper deals with a discrete-time Markov Decision Process with Borel state and action spaces. We show that the set of all strategic measures generated by randomized stationary policies is…

Blackwell Optimality in Markov Decision Processes with Partial Observation

Dinah Rosenberg, Eilon Solan, N. Vieille

Mathematics

2002

We prove the existence of Blackwell epsilon-optimal strategies in finite Markov Decision Processes with partial observation.

On the undecidability of probabilistic planning and related stochastic optimization problems

Omid Madani, S. Hanks, A. Condon

Computer Science

Artif. Intell.

2003

Stochastic games

K. Chatterjee

Mathematics, Computer Science

Game Theory

2019

The main existence results on optimality and equilibria in two-person stochastic games with finite state and action spaces are discussed, and some algorithms for computing optimal strategies are presented.

Uniform value in Dynamic Programming

Jérôme Renault

Computer Science, Mathematics

2008

The main result says that if the set of states is a precompact metric space and the family (w_{m,n}) is uniformly equicontinuous, then the uniform value exists.
