ML Pipeline Insights Service for Rule-Based Assessment of Training Practices in Reinforcement Learning

Abstract

As artificial intelligence continues to advance, Reinforcement Learning (RL) has established itself as a core approach for developing intelligent agents that make decisions over time. As RL systems grow in complexity, the need for standardized training practices becomes critical. This paper introduces a rule-based assessment approach to enforce best practices in RL training. We define a comprehensive set of architectural rules focused on RL pipeline practices, model versioning, multi-agent deployment, and managing models at inference time. Our methodology integrates Large Language Models (LLMs) and custom-based code detectors to ensure compliance with these best practices across diverse RL systems. We developed an ML pipeline insights service to automatically validate RL training practices directly from the source code. We validate our approach by applying it to a large-scale industrial case study and sixteen open-source case studies. Our evaluation showed that custom-based detectors achieved near-perfect precision and recall (F1 = 0.98), while LLM-based detectors provided scalable validation with moderate F1 scores (0.67–0.71), demonstrating the hybrid approach’s strength in balancing accuracy and automation. These results highlight the tool's practical applicability and the feasibility of automating the identification and enforcement of best practices.

Authors

Ntentos, Evangelos
Urdih, Francesco
Zdun, Uwe

Shortfacts

Category	Paper in Conference Proceedings or in Workshop Proceedings (Paper)
Event Title	51st Euromicro Conference Series on Software Engineering and Advanced Applications (SEAA)
Divisions	Software Architecture
Subjects	Software Engineering
Event Location	Salerno, Italy
Event Type	Conference
Event Dates	10-12 September 2025
Date	10 September 2025