Frame and Segment Level Recurrent Neural Networks for Phone Classification

Abstract

We introduce a simple and efficient frame and segment level RNN model (FS-RNN) for phone classification. It processes the input at frame level and segment level by bidirectional gated RNNs. This type of processing is important to exploit the (temporal) information more effectively compared to (i) models which solely process the input at frame level and (ii) models which process the input on segment level using features obtained by heuristic aggregation of frame level features. Furthermore, we incorporated the activations of the last hidden layer of the FS-RNN as an additional feature type in a neural higher-order CRF (NHO-CRF). In experiments, we demonstrated excellent performance on the TIMIT phone classification task, reporting a performance of 13.8% phone error rate for the FS-RNN model and 11.9% when combined with the NHO-CRF. In both cases we significantly exceeded the state-of-the-art performance.
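
The two-level processing described in the abstract can be illustrated with a small NumPy sketch. This is one plausible reading and not the authors' implementation: it assumes that a frame-level bidirectional GRU runs inside each phone segment, that each segment is summarized by the final forward and final backward hidden states, and that a segment-level bidirectional GRU then runs over the sequence of segment summaries before a softmax layer. The function names, the parameter layout, the 13-dimensional features, and the 39 output classes are all hypothetical choices made for illustration, and the weights are random rather than trained.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_params(rng, n_in, n_hid):
    # One weight matrix per gate (update z, reset r, candidate h),
    # each acting on the concatenation [input, hidden]. Biases omitted.
    return {k: rng.standard_normal((n_in + n_hid, n_hid)) * 0.1
            for k in ("z", "r", "h")}

def gru_run(p, xs):
    """Run a GRU over xs of shape (T, n_in); return states of shape (T, n_hid)."""
    h = np.zeros(p["z"].shape[1])
    states = []
    for x in xs:
        xh = np.concatenate([x, h])
        z = sigmoid(xh @ p["z"])                      # update gate
        r = sigmoid(xh @ p["r"])                      # reset gate
        cand = np.tanh(np.concatenate([x, r * h]) @ p["h"])
        h = (1.0 - z) * h + z * cand
        states.append(h)
    return np.stack(states)

def bigru(p_fwd, p_bwd, xs):
    # Forward pass plus a pass over the reversed input, re-aligned in time.
    fwd = gru_run(p_fwd, xs)
    bwd = gru_run(p_bwd, xs[::-1])[::-1]
    return np.concatenate([fwd, bwd], axis=1)

def init_params(rng, n_feat, n_hid, n_classes):
    # Hypothetical parameter layout for this sketch (random, untrained).
    return {
        "f_fwd": gru_params(rng, n_feat, n_hid),
        "f_bwd": gru_params(rng, n_feat, n_hid),
        "s_fwd": gru_params(rng, 2 * n_hid, n_hid),
        "s_bwd": gru_params(rng, 2 * n_hid, n_hid),
        "W_out": rng.standard_normal((2 * n_hid, n_classes)) * 0.1,
    }

def fs_rnn_sketch(frames, boundaries, params):
    """frames: (T, n_feat); boundaries: list of (start, end) frame indices."""
    seg_vecs = []
    for start, end in boundaries:
        # Frame level: BiGRU over the frames of one segment (assumption).
        h = bigru(params["f_fwd"], params["f_bwd"], frames[start:end])
        n = h.shape[1] // 2
        # Learned summary: last forward state and first-in-time backward state,
        # used here in place of heuristic aggregation of frame features.
        seg_vecs.append(np.concatenate([h[-1, :n], h[0, n:]]))
    # Segment level: BiGRU over the sequence of segment summaries.
    hs = bigru(params["s_fwd"], params["s_bwd"], np.stack(seg_vecs))
    logits = hs @ params["W_out"]
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)   # one row per segment
```

Calling fs_rnn_sketch on an array of frames with a list of segment boundaries returns one class distribution per segment. In this sketch the segment summary comes from the recurrent states themselves rather than from averaging or otherwise hand-aggregating frame features, which is the contrast the abstract draws in point (ii).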

Authors
  • Ratajczak, Martin
  • Tschiatschek, Sebastian
  • Pernkopf, Franz
Shortfacts
Category
Paper in Conference Proceedings or in Workshop Proceedings (Paper)
Event Title
Conference of the International Speech Communication Association (INTERSPEECH)
Divisions
Data Mining and Machine Learning
Event Location
Stockholm, Sweden
Event Type
Conference
Event Dates
20-24 August 2017
Series Name
18th Annual Conference of the International Speech Communication Association (INTERSPEECH 2017): Situated Interaction
ISSN/ISBN
9781510848764
Page Range
pp. 1318-1322
Date
2017
Official URL
https://www.tschiatschek.net/files/ratajczak17fsrn...