Frame and Segment Level Recurrent Neural Networks for Phone Classification
We introduce a simple and efficient frame and segment level RNN model (FS-RNN) for phone classification. It processes the input at frame level and segment level by bidirectional gated RNNs. This type of processing is important to exploit the (temporal) information more effectively compared to (i) models which solely process the input at frame level and (ii) models which process the input on segment level using features obtained by heuristic aggregation of frame level features. Furthermore, we incorporated the activations of the last hidden layer of the FS-RNN as an additional feature type in a neural higher-order CRF (NHO-CRF). In experiments, we demonstrated excellent performance on the TIMIT phone classification task, reporting a performance of 13.8% phone error rate for the FS-RNN model and 11.9% when combined with the NHO-CRF. In both cases we significantly exceeded the state-of-the-art performance.
- Ratajczak, Martin
- Tschiatschek, Sebastian
- Pernkopf, Franz
Category |
Paper in Conference Proceedings or in Workshop Proceedings (Paper) |
Event Title |
Conference of the International Speech Communication Association (INTERSPEECH) |
Divisions |
Data Mining and Machine Learning |
Event Location |
Stockholm, Sweden |
Event Type |
Conference |
Event Dates |
20.-24.08.2017 |
Series Name |
18th Annual Conference of the International Speech Communication Association (INTERSPEECH 2017): Situated Interaction |
ISSN/ISBN |
9781510848764 |
Page Range |
pp. 1318-1322 |
Date |
2017 |
Official URL |
https://www.tschiatschek.net/files/ratajczak17fsrn... |