Institut für Informationswissenschaft und Sprachtechnologie
- KI-Verfahren für die Hate Speech Erkennung: Die Gestaltung von Ressourcen für das maschinelle Lernen und ihre Zuverlässigkeit [AI Methods for Hate Speech Detection: The Design of Resources for Machine Learning and Their Reliability]
Detecting hate speech with AI requires extensive training data. The composition of this training set determines how well the systems perform, because only hate messages that resemble the training data can be recognized. The article first reviews some of the existing benchmarks and the developments in how they are built. It then discusses possible biases and approaches to measuring them. Comparisons across multiple collections and greater transparency can also clarify how effective the training data are.
- Interdisciplinary Analysis of Science Communication on Social Media during the COVID-19 Crisis
In times of crisis, science communication needs to be accessible and convincing. To understand whether these two criteria apply to concrete science communication formats, it is not enough to study the communication product alone. The recipient’s perspective also needs to be taken into account. What do recipients value in popular science communication formats concerning COVID-19? What do they criticize? What elements in the formats do they pay attention to? These questions can be answered by reception studies, for example, by analyzing the reactions and comments of social media users. This is particularly relevant since scientific information was increasingly disseminated over social media channels during the COVID-19 crisis. This interdisciplinary study therefore focuses both on science communication strategies in media formats and on the related comments on social media. First, we selected science communication channels on YouTube and performed a qualitative multi-modal analysis. Second, we analyzed the comments responding to science communication content online by identifying Twitter users who are doctors, researchers, or science communicators, or who represent research institutes, and then performing topic modeling on the textual data. The main goal was to find topics that directly related to science communication strategies. The qualitative video analysis revealed, for example, a range of strategies for accessible communication and for maintaining transparency about scientific uncertainties. The quantitative Twitter analysis showed that few tweets commented on aspects of the communication strategies. These were mainly positive, while the sentiment in the overall collection was less positive. We downloaded and processed replies for 20 months, starting at the beginning of the pandemic, which resulted in a collection of approximately one million tweets from the German science communication market.
- Speech and Language Technologies for Low-Resource Languages (Springer International Publishing, 2023)
M, Anand Kumar; Chakravarthi, Bharathi Raja; B, Bharathi; O’Riordan, Colm; Murthy, Hema; Durairaj, Thenmozhi
- Emma stop that, it's my turn now - Comparing Peer Tutoring and Thinking Aloud for Usability-Testing with Children in a school setting
This study explores children's ability to offer verbal feedback during usability evaluation studies. The aim is to find out whether the Peer Tutoring or Thinking Aloud method can identify more usability findings in usability tests with second graders than observation alone. Thirteen second graders tested an interactive game using two evaluation techniques. The findings indicate that the majority of verbal remarks were collected with the Thinking Aloud method and that participants also provided more high-quality remarks with it. More usability findings could be identified than in a purely observational situation. Unexpectedly, the Peer Tutoring method was less beneficial for identifying usability problems because the participants struggled to cooperate successfully.
- University of Hildesheim at SemEval-2023 Task 1: Combining Pre-trained Multimodal and Generative Models for Image Disambiguation
Multimodal ambiguity is a challenge for understanding text and images. Large pre-trained models have already reached a high level of quality. This paper presents an implementation for solving an image disambiguation task that relies solely on the knowledge captured in multimodal and language models. In Task 1 of SemEval 2023 (Visual Word Sense Disambiguation), this approach achieved an MRR of 0.738 using CLIP-Large and the OPT model for generating text. Applying a generative model to create more text from a phrase containing an ambiguous word improves our results. The performance gain from a bigger language model is larger than the performance gain from using the larger CLIP model.
- Professor’s and student’s perspectives on digital education during the COVID-19 pandemic in Germany: Online teaching, adaptation of courses and OER (2023)
Lea Wöbbekind; Leonie Voland; Orhan Yener; Juan-José Boté-Vericad; Sílvia Argudo; Cristóbal Urbano; Thomas Mandl
Open educational resources (OER) and digital education (DE) have shown the ability to improve teaching and learning possibilities, particularly in light of unforeseen events. The COVID-19 pandemic in particular revealed that universities were experiencing technological, socio-psychological, and didactic issues. In order to promote, enrich, and improve DE and OER for crises and beyond, this research article specifically addresses the target audiences of students and teachers in Library and Information Science (LIS) programs in Germany. A qualitative approach with interviews and focus groups was applied to identify, analyze, and compare students’ and professors’ attitudes, experiences, and problems in remote teaching and learning during a crisis. The results showed that the LIS professors in our sample are experienced and innovative in their use of DE during a period of crisis. However, diverse obstacles to the use and production of OER for online education became visible. Students’ initial difficulties with online learning were resolved, which shows how quickly they were able to adjust to the new teaching environment. Both LIS professors and students recognize the advantages of employing DE and OER in higher education. They emphasize positive learning experiences based on flexibility when integrating DE and OER in LIS programs.
- Detecting offensive speech in conversational code-mixed dialogue on social media: A contextual dataset and benchmark experiments (2023)
Madhu, Hiren; Satapara, Shrey; Modha, Sandip; Majumder, Prasenjit
The spread of Hate Speech on online platforms is a severe issue for societies and requires platforms to identify offensive content. Research has modeled Hate Speech recognition as a text classification problem that predicts the class of a message based on the text of the message only. However, context plays a huge role in communication. In particular, for short messages, the text of the preceding tweets can completely change the interpretation of a message within a discourse. This work extends previous efforts to classify Hate Speech by considering the current and previous tweets jointly. In particular, we introduce a clearly defined way of extracting context. We present the development of the first dataset for conversation-based Hate Speech classification, with an approach for collecting context from long conversations for code-mixed Hindi (ICHCL dataset). Overall, our benchmark experiments show that the inclusion of context can improve classification performance over a baseline. Furthermore, we develop a novel pipeline for processing the context. The best-performing pipeline uses a fine-tuned SentBERT paired with an LSTM as a classifier. This pipeline achieves a macro F1 score of 0.892 on the ICHCL test dataset. Another pipeline, based on KNN, SentBERT, and ABC weighting, yields a macro F1 score of 0.807, which is the best result among traditional classifiers. So even a KNN model gives better results with an optimized BERT than a vanilla BERT model.
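
Two of the abstracts above report evaluation metrics: an MRR (mean reciprocal rank) for the image disambiguation task and macro F1 scores for the offensive speech classifiers. The following is a minimal sketch of how these metrics are conventionally defined. It is not the authors' evaluation code, and the function names and toy inputs are illustrative assumptions.

```python
from typing import Sequence


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank, where ranks[i] is the 1-based position
    of the correct item in the ranked list for query i."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy data (hypothetical): correct image ranked 1st, 2nd, and 4th
    print(mean_reciprocal_rank([1, 2, 4]))
    # Toy labels (hypothetical) for a two-class problem
    print(macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"]))
```

Because macro F1 weights every class equally, it penalizes poor performance on a rare class more than accuracy does, which is one reason it is commonly reported for imbalanced tasks such as offensive speech detection.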