AlbNER: A Corpus for Named Entity Recognition in Albanian

Abstract

The scarcity of resources such as annotated text corpora for under-resourced languages like Albanian is a serious impediment to research in computational linguistics and natural language processing. This paper presents AlbNER, a corpus of 900 sentences with labeled named entities, collected from Albanian Wikipedia articles. Preliminary results with BERT and RoBERTa variants fine-tuned and tested on AlbNER data indicate that model size has only a slight impact on NER performance, whereas language transfer has a significant one. The AlbNER corpus and these results should serve as baselines for future experiments.

Authors

Çano, Erion

Shortfacts

Category	Technical Report (Discussion Paper)
Divisions	Data Mining and Machine Learning
Subjects	Artificial Intelligence, Language Processing
Date	15 September 2023
