ULF: Unsupervised Labeling Function Correction using Cross-Validation for Weak Supervision

Abstract

Weak supervision (WS) is a cost-effective alternative to manual data labeling: data samples are annotated automatically using a predefined set of labeling functions (LFs), rule-based mechanisms that generate artificial labels for the associated classes. In this work, we investigate noise reduction techniques for WS based on the principle of k-fold cross-validation. We introduce ULF, a new algorithm for Unsupervised Labeling Function correction, which denoises WS data by leveraging models trained on all but a held-out subset of LFs to identify and correct biases specific to the held-out LFs. Specifically, ULF refines the allocation of LFs to classes by re-estimating this assignment on highly reliable cross-validated samples. Evaluation on multiple datasets confirms ULF's effectiveness in enhancing WS learning without the need for manual labeling.
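
The idea in the abstract can be illustrated with a rough sketch. This is not the authors' implementation, and the actual ULF procedure may differ in every detail; it only shows the general pattern of splitting the LFs into folds, training on samples untouched by the held-out LFs, and re-estimating the LF-to-class assignment from confident out-of-fold predictions. All names here (`fit_centroids`, `predict_proba`, `reestimate_lf_assignment`, the matrices `Z` and `T`, the `threshold` parameter) and the nearest-centroid classifier are assumptions made for illustration.

```python
import numpy as np

def fit_centroids(X, y, n_classes):
    # Nearest-centroid classifier: a stand-in for any model (assumption).
    centroids = np.zeros((n_classes, X.shape[1]))
    for c in range(n_classes):
        if np.any(y == c):
            centroids[c] = X[y == c].mean(axis=0)
    return centroids

def predict_proba(centroids, X):
    # Softmax over negative squared distances to each class centroid.
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -d
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def reestimate_lf_assignment(X, Z, T, k=3, threshold=0.7):
    """Hypothetical sketch: Z is (samples x LFs) match matrix,
    T is (LFs x classes) assignment matrix; returns a re-estimated T."""
    n_lfs, n_classes = T.shape
    folds = np.array_split(np.arange(n_lfs), k)   # cross-validation over LFs
    noisy_y = (Z @ T).argmax(axis=1)              # weak labels implied by T
    counts = np.zeros((n_lfs, n_classes))
    for held_out in folds:
        hit = Z[:, held_out].sum(axis=1) > 0      # samples matched by held-out LFs
        train = ~hit & (Z.sum(axis=1) > 0)        # labeled samples they do not touch
        if not train.any() or not hit.any():
            continue
        model = fit_centroids(X[train], noisy_y[train], n_classes)
        proba = predict_proba(model, X[hit])
        confident = proba.max(axis=1) >= threshold  # keep only reliable predictions
        pred = proba.argmax(axis=1)[confident]
        Zc = Z[hit][confident][:, held_out]
        for j, lf in enumerate(held_out):
            counts[lf] += np.bincount(pred[Zc[:, j] > 0], minlength=n_classes)
    totals = counts.sum(axis=1, keepdims=True)
    # LFs without confident evidence keep their original assignment.
    return np.where(totals > 0, counts / np.maximum(totals, 1), T)

# Toy data: class 0 lies near x=0, class 1 near x=10.
X = np.array([[0.0], [0.1], [0.2], [0.3],
              [10.0], [10.1], [10.2], [10.3], [10.4], [10.5]])
Z = np.zeros((10, 6), dtype=int)
for sample, lf in [(0, 0), (1, 0), (4, 1), (5, 1), (2, 2), (3, 2),
                   (6, 3), (7, 3), (0, 4), (2, 4), (8, 5), (9, 5)]:
    Z[sample, lf] = 1
# LF 5 is deliberately assigned to class 0 although it fires on class-1 samples.
T = np.array([[1, 0], [0, 1], [1, 0], [0, 1], [1, 0], [1, 0]], dtype=float)
new_T = reestimate_lf_assignment(X, Z, T, k=3)
print(new_T.argmax(axis=1))
```

In this toy setup, five LFs are assigned correctly and one (index 5) is deliberately mis-assigned; the sketch moves that LF to the class its matched samples are confidently predicted to belong to, without using any gold labels.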

Authors
  • Sedova, Anastasiia
  • Roth, Benjamin
Shortfacts
Category
Paper in Conference Proceedings or in Workshop Proceedings (Poster)
Event Title
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Divisions
Data Mining and Machine Learning
Subjects
Artificial Intelligence
Natural Language Processing
Event Location
Singapore
Event Type
Conference
Event Dates
6-10 Dec 2023
Publisher
Association for Computational Linguistics
Page Range
pp. 4162-4176
Date
1 December 2024
Official URL
https://aclanthology.org/2023.emnlp-main.254