Mapping German Tweets to Geographic Regions
Abstract
We present a first attempt at classifying German tweets by region using only the text of the tweets. German Twitter users are largely unwilling to share geolocation data. Here, we introduce a two-step process. First, we identify regionally salient tweets by comparing them to an "average" German tweet based on lexical features. Then, the regionally salient tweets are assigned to one of seven dialectal regions. We achieve an accuracy (on regional tweets) of up to 50% on a balanced corpus, a substantial improvement over the baseline. Finally, we show several directions in which this work can be extended and improved.
Publication type
Conference paper
Authors
Scheffler, Tatjana
Gontrum, Johannes
Wegel, Matthias
Wendler, Steve
Publication date
2014
Published in
Workshop proceedings of the 12th edition of the KONVENS conference
First page
26
Last page
33
URN
urn:nbn:de:gbv:hil2-opus-3236