On Constant Regret for Low-Rank MDPs

Abstract

Although instance-dependent regret bounds exist for linear Markov decision processes (MDPs) and low-rank bandits, extensions to low-rank MDPs remain unexplored. In this work, we close this gap and provide regret bounds for low-rank MDPs in an instance-dependent setting. Specifically, we introduce an algorithm, called UNISREP-UCB, which uses a constrained optimization objective to learn features with good spectral properties. Furthermore, we show that our algorithm enjoys constant regret if the minimal sub-optimality gap and the occupancy distribution of the optimal policy are well-defined and known. To the best of our knowledge, these are the first instance-dependent regret results for low-rank MDPs.

Authors
  • Sturm, Alexander
  • Tschiatschek, Sebastian
Shortfacts
Category
Paper in Conference Proceedings or in Workshop Proceedings (Paper)
Event Title
The Conference on Uncertainty in Artificial Intelligence (UAI)
Divisions
Data Mining and Machine Learning
Event Location
Rio de Janeiro, Brazil
Event Type
Conference
Event Dates
21-25 July 2025
Date
21 July 2025