AlbNER: A Corpus for Named Entity Recognition in Albanian

Abstract

The scarcity of resources such as annotated text corpora for under-resourced languages like Albanian is a serious impediment to research in computational linguistics and natural language processing. This paper presents AlbNER, a corpus of 900 sentences with labeled named entities, collected from Albanian Wikipedia articles. Preliminary results with BERT and RoBERTa variants fine-tuned and tested on AlbNER data indicate that model size has only a slight impact on NER performance, whereas language transfer has a significant one. The AlbNER corpus and these results should serve as baselines for future experiments.

Authors
  • Çano, Erion
Shortfacts
Category
Technical Report (Discussion Paper)
Divisions
Data Mining and Machine Learning
Subjects
Artificial Intelligence
Language Processing
Date
15 September 2023