On Constant Regret for Low-Rank MDPs
Abstract
Although there exist instance-dependent regret bounds for linear Markov decision processes (MDPs) and low-rank bandits, extensions to low-rank MDPs remain unexplored. In this work, we close this gap and provide regret bounds for low-rank MDPs in an instance-dependent setting. Specifically, we introduce an algorithm, called UNISREP-UCB, which utilizes a constrained optimization objective to learn features with good spectral properties. Furthermore, we demonstrate that our algorithm enjoys constant regret if the minimal sub-optimality gap and the occupancy distribution of the optimal policy are well-defined and known. To the best of our knowledge, these are the first instance-dependent regret results for low-rank MDPs.
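The abstract's claim that a known minimal sub-optimality gap enables constant (horizon-independent) regret can be illustrated with a generic explore-then-commit bandit sketch. This is a hypothetical illustration of the general phenomenon only, not the paper's UNISREP-UCB algorithm, and the exploration-budget formula below is a standard textbook-style choice, not one taken from the paper:

```python
import math
import random

def explore_then_commit(means, gap, horizon, seed=0):
    """Illustrative explore-then-commit bandit with a known gap.

    Because the gap is known, the exploration budget n depends only on the
    gap, not on the horizon, so cumulative regret stays constant as the
    horizon grows. (Generic sketch of the 'known gap => constant regret'
    idea; NOT the paper's UNISREP-UCB algorithm.)
    """
    rng = random.Random(seed)
    # Horizon-independent exploration budget; the constant 8 and log(4/gap)
    # are illustrative choices, not tuned or taken from the paper.
    n = math.ceil((8 / gap**2) * math.log(4 / gap))
    best = max(means)
    totals = [0.0] * len(means)
    regret = 0.0
    t = 0
    # Exploration phase: pull each Bernoulli arm n times.
    for i, mu in enumerate(means):
        for _ in range(n):
            totals[i] += 1.0 if rng.random() < mu else 0.0
            regret += best - mu
            t += 1
    # Commit to the empirically best arm for the rest of the horizon.
    commit = max(range(len(means)), key=lambda i: totals[i])
    regret += (horizon - t) * (best - means[commit])
    return n, regret
```

Running the sketch with two very different horizons but the same seed shows the exploration budget and the accumulated regret are unchanged, which is exactly the constant-regret behavior the abstract refers to.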

- Sturm, Alexander
- Tschiatschek, Sebastian

Shortfacts
- Category: Paper in Conference Proceedings or in Workshop Proceedings (Paper)
- Event Title: The Conference on Uncertainty in Artificial Intelligence (UAI)
- Divisions: Data Mining and Machine Learning
- Event Location: Rio de Janeiro, Brazil
- Event Type: Conference
- Event Dates: 21.07.-25.07.2025
- Date: 21 July 2025
