« Back to publications

Learning like human annotators: Cyberbullying detection in lengthy social media sessions

Peiling Yi, Arkaitz Zubiaga

The Web Conference. 2023.

Download PDF fileAccess publication
The inherent characteristic of cyberbullying of being a recurrent attitude calls for the investigation of the problem by looking at social media sessions as a whole, beyond just isolated social media posts. However, the lengthy nature of social media sessions challenges the applicability and performance of session-based cyberbullying detection models. This is especially true when one aims to use state-of-the-art Transformer-based pre-trained language models, which only take inputs of a limited length. In this paper, we address this limitation of transformer models by proposing a conceptually intuitive framework called LS-CB, which enables cyberbullying detection from lengthy social media sessions. LS-CB relies on the intuition that we can effectively aggregate the predictions made by transformer models on smaller sliding windows extracted from lengthy social media sessions, leading to an overall improved performance. Our extensive experiments with six transformer models on two session-based datasets show that LS-CB consistently outperforms three types of competitive baselines including state-of-the-art cyberbullying detection models. In addition, we conduct a set of qualitative analyses to validate the hypotheses that cyberbullying incidents can be detected through aggregated analysis of smaller chunks derived from lengthy social media sessions (H1), and that cyberbullying incidents can occur at different points of the session (H2), hence positing that frequently used text truncation strategies are suboptimal compared to relying on holistic views of sessions. Our research in turn opens an avenue for fine-grained cyberbullying detection within sessions in future work.