On Constant Regret for Low-Rank MDPs

Abstract

Although instance-dependent regret bounds exist for linear Markov decision processes (MDPs) and low-rank bandits, extensions to low-rank MDPs remain unexplored. In this work, we close this gap and provide regret bounds for low-rank MDPs in an instance-dependent setting. Specifically, we introduce an algorithm, called UNISREP-UCB, which uses a constrained optimization objective to learn features with good spectral properties. Furthermore, we show that our algorithm enjoys constant regret if the minimal sub-optimality gap and the occupancy distribution of the optimal policy are well-defined and known. To the best of our knowledge, these are the first instance-dependent regret results for low-rank MDPs.

Authors
  • Sturm, Alexander
  • Tschiatschek, Sebastian
Shortfacts
Category
Paper in Conference Proceedings or in Workshop Proceedings (Paper)
Event Title
The Conference on Uncertainty in Artificial Intelligence (UAI)
Divisions
Data Mining and Machine Learning
Event Location
Rio de Janeiro, Brazil
Event Type
Conference
Event Dates
21-25 July 2025
Date
21 July 2025