  • Publication
    Metadata only
    KI-Verfahren für die Hate Speech Erkennung: Die Gestaltung von Ressourcen für das maschinelle Lernen und ihre Zuverlässigkeit
    (J.B. Metzler, 2023-05)
    Steiger, Stefan
    Detecting hate speech with AI requires extensive training data. The composition of this training set determines how well the systems perform, because only hate messages that resemble the training data can be detected. The article first discusses several existing benchmarks and the developments in how they are built. It then discusses possible biases and approaches to measuring them. Comparing across multiple collections and creating transparency can also clarify how effective training data is.
  • Publication
    Metadata only
    Interdisciplinary Analysis of Science Communication on Social Media during the COVID-19 Crisis
    In times of crisis, science communication needs to be accessible and convincing. To understand whether these two criteria apply to concrete science communication formats, it is not enough to study the communication product alone; the recipient’s perspective also needs to be taken into account. What do recipients value in popular science communication formats concerning COVID-19? What do they criticize? What elements in the formats do they pay attention to? These questions can be answered by reception studies, for example, by analyzing the reactions and comments of social media users. This is particularly relevant since scientific information was increasingly disseminated over social media channels during the COVID-19 crisis. This interdisciplinary study therefore focuses both on science communication strategies in media formats and on the related comments on social media. First, we selected science communication channels on YouTube and performed a qualitative multi-modal analysis. Second, we analyzed the comments responding to science communication content online by identifying Twitter users who are doctors, researchers, science communicators, or representatives of research institutes, and then performing topic modeling on the textual data. The main goal was to find topics that directly related to science communication strategies. The qualitative video analysis revealed, for example, a range of strategies for accessible communication and for maintaining transparency about scientific uncertainties. The quantitative Twitter analysis showed that few tweets commented on aspects of the communication strategies. These were mainly positive, while the sentiment in the overall collection was less positive. We downloaded and processed replies for 20 months, starting at the beginning of the pandemic, which resulted in a collection of approximately one million tweets from the German science communication market.
  • Publication
    Metadata only
    Emma stop that, it's my turn now - Comparing Peer Tutoring and Thinking Aloud for Usability-Testing with Children in a school setting
    (ACM, 2023)
    Lorberg, Kira 
    This study explored children's ability to offer verbal feedback during usability evaluation studies. The aim was to find out whether the Peer Tutoring or Thinking Aloud method can identify more usability findings in usability tests with second graders than observation alone. Thirteen second graders tested an interactive game using two evaluation techniques. The findings indicate that the majority of verbal remarks were obtained with the Thinking Aloud method and that participants also provided more high-quality remarks with it. More usability findings could be identified than in a purely observational situation. Unexpectedly, the Peer Tutoring method was less beneficial for identifying usability problems because the participants struggled to cooperate successfully.
  • Publication
    Metadata only
    Detecting offensive speech in conversational code-mixed dialogue on social media: A contextual dataset and benchmark experiments
    (2023)
    Madhu, Hiren; Satapara, Shrey; Modha, Sandip; Majumder, Prasenjit
    The spread of Hate Speech on online platforms is a severe issue for societies and requires platforms to identify offensive content. Research has modeled Hate Speech recognition as a text classification problem that predicts the class of a message based on the text of the message only. However, context plays a huge role in communication. In particular, for short messages, the text of the preceding tweets can completely change the interpretation of a message within a discourse. This work extends previous efforts to classify Hate Speech by considering the current and previous tweets jointly. In particular, we introduce a clearly defined way of extracting context. We present the development of the first dataset for conversation-based Hate Speech classification, with an approach for collecting context from long conversations for code-mixed Hindi (ICHCL dataset). Overall, our benchmark experiments show that the inclusion of context can improve classification performance over a baseline. Furthermore, we develop a novel pipeline for processing the context. The best-performing pipeline uses a fine-tuned SentBERT paired with an LSTM as a classifier. This pipeline achieves a macro F1 score of 0.892 on the ICHCL test dataset. Another pipeline, based on KNN, SentBERT, and ABC weighting, yields a macro F1 score of 0.807, the best result among traditional classifiers. Thus, even a KNN model paired with an optimized BERT gives better results than a vanilla BERT model.
  • Publication
    Metadata only
    Reactions to science communication: discovering social network topics using word embeddings and semantic knowledge
    (2023)
    Cerqueira de Lima, Bernardo; Abrantes Baracho, Renata Maria; Baracho Porto, Patricia
    Social media platforms that disseminated scientific information to the public during the COVID-19 pandemic highlighted the importance of scientific communication. Content creators in the field, as well as researchers who study the impact of scientific information online, are interested in how people react to these information resources. This study aims to devise a framework that can sift through large social media datasets and find specific feedback on content delivery, enabling scientific content creators to gain insights into how the public perceives scientific information and how their behavior toward science communication (e.g., through videos or texts) is related to their information-seeking behavior. To collect public reactions to scientific information, the study focused on Twitter users who are doctors, researchers, science communicators, or representatives of research institutes, and processed their replies for two years from the start of the pandemic. The study aimed to develop a solution, powered by topic modeling and enhanced by manual validation and other machine learning techniques such as word embeddings, that is capable of filtering massive social media datasets in search of documents related to reactions to scientific communication. The architecture developed in this paper can be replicated for finding documents related to any niche topic in social media data.
  • Publication
    Metadata only
    University of Hildesheim at SemEval-2023 Task 1: Combining Pre-trained Multimodal and Generative Models for Image Disambiguation
    Multimodal ambiguity is a challenge for understanding text and images. Large pre-trained models have already reached a high level of quality. This paper presents an implementation for solving an image disambiguation task relying solely on the knowledge captured in multimodal and language models. Within Task 1 of SemEval 2023 (Visual Word Sense Disambiguation), this approach achieved an MRR of 0.738 using CLIP-Large and the OPT model for generating text. Applying a generative model to create more text given a phrase with an ambiguous word improves our results. The performance gain from a bigger language model is larger than the performance gain from using the larger CLIP model.
  • Publication
    Metadata only
    B 16 Text Mining und Data Mining
    (De Gruyter, 2022-11-21)
    Kuhlen, Rainer; Lewandowski, Dirk; Semar, Wolfgang
  • Publication
    Metadata only
    Enabling Informational Autonomy through Explanation of Content Moderation: UI Design for Hate Speech Detection
    (Gesellschaft für Informatik e.V., 2022-09-04)
    Sontheimer, Lukas
    Content moderation using AI, and in particular Hate Speech detection, has been a research topic with a focus on natural language processing, classification algorithms, and data benchmarks. Less attention has been dedicated to how the classification systems are later integrated into tools that support users in an application task. In this paper we review existing tools and prototypes. Furthermore, we design and implement an online user interface for explainability. The system is connected to a neural network classifier based on the HASOC benchmark. The interface allows users to enter messages, observe classification decisions, and see similar messages as an explanation. It supports social media users who are interested in AI systems for content moderation and who want to observe the performance of hate speech detection tools. A qualitative evaluation with experts showed that our system can help bridge the gap between humans and AI.
  • Publication
    Metadata only
    Deep learning for historical books: classification of printing technology for digitized images
    (2022-02)
    Kim, Yongho
    Printing technology has evolved over the past centuries due to technological progress. Within Digital Humanities, images are playing a more prominent role in research. In the mass analysis of digitized historical images, bias can be introduced in various ways. One of them is the printing technology originally used. Classifying images by their printing technology (e.g., woodcut, copper engraving, or lithography) requires highly skilled experts. We have developed a deep learning classification system that achieves very good results. This paper explains the challenges that digitized collections pose for this task. To overcome them and to achieve good performance, shallow networks and appropriate sampling strategies needed to be combined. We also show how class activation maps (CAM) can be used to analyze the results.