Improving the Performance of Standard Part-of-Speech Taggers for Computer-Mediated Communication
Abstract
We assess the performance of off-the-shelf POS taggers when applied to two types of Internet texts in German, and investigate easy-to-implement methods to improve tagger performance. Our main findings are that extending a standard training set with small amounts of manually annotated data for Internet texts leads to a substantial improvement of tagger performance, which can be further improved by using a previously proposed method to automatically acquire training data. As a prerequisite for the evaluation, we create a manually annotated corpus of Internet forum and chat texts.
Publication type
Conference paper
Author
Publication date
2014
Department
Institute / Institution
Published in
Proceedings of the 12th edition of the KONVENS conference
First page
171
Last page
177
URN
urn:nbn:de:gbv:hil2-opus-2792
HilPub Permalink
Files
p027.pdf (117.59 KB)
Main Conference Proceedings of the 12th Konvens 2014