ID-XCB: Data-independent Debiasing for Fair and Accurate Transformer-based Cyberbullying Detection

Peiling Yi, Arkaitz Zubiaga

ICWSM. 2025.

The use of swear words is a common proxy for collecting datasets of cyberbullying incidents, as it increases the chances of capturing such incidents, which are otherwise hard to find. However, datasets collected this way also risk introducing biases into cyberbullying detection models, which can learn spurious associations between swear words and the presence of incidents. In this work, we undertake a pioneering study of measuring and mitigating swearing bias in cyberbullying detection. First, we employ word-level bias measures to demonstrate the distinctive features of swearing bias in transformer-based cyberbullying detection models. We then introduce ID-XCB, the first data-independent debiasing technique, which combines adversarial training, bias constraints and debiasing fine-tuning to reduce the model's attention to bias-inducing words without impacting overall performance. Finally, we evaluate ID-XCB on two popular session-based cyberbullying detection datasets, alongside a comprehensive set of ablation and model generalisation studies. Our findings show that ID-XCB learns robust cyberbullying detection capabilities while mitigating biases tied to swear word usage, and that it consistently outperforms state-of-the-art debiasing methods in both performance improvement and bias mitigation. In addition, through combined quantitative and qualitative analyses, we demonstrate the potential generalisability of our approach to unseen data.
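
To make the abstract's three ingredients concrete, below is a minimal, hypothetical PyTorch sketch of how a debiasing objective could combine a task loss, an adversarial term (via gradient reversal) that discourages the encoder from retaining swear-word information, and a bias constraint that penalises attention placed on swear-word tokens. This is not the authors' released code; the function name `debias_loss`, the probe `adv_head`, the mask `swear_mask` and the weights `lambd` and `mu` are illustrative assumptions, and ID-XCB's exact formulation is given in the paper.

```python
# Hypothetical sketch of a combined debiasing objective; not the ID-XCB implementation.
import torch
import torch.nn.functional as F


class GradReverse(torch.autograd.Function):
    """Gradient reversal layer, the standard building block for adversarial training."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse (and scale) the gradient flowing back into the encoder.
        return -ctx.lambd * grad_output, None


def debias_loss(logits, labels, pooled, attentions, swear_mask,
                adv_head, lambd=1.0, mu=0.1):
    """Illustrative combined objective (all argument names are assumptions).

    logits:      (batch, 2) cyberbullying predictions
    labels:      (batch,) gold labels
    pooled:      (batch, hidden) encoder representation
    attentions:  (batch, heads, seq, seq) last-layer attention weights
    swear_mask:  (batch, seq) 1.0 where the token is a swear word
    adv_head:    linear probe predicting swear-word presence from `pooled`
    """
    # 1) Standard task loss on the cyberbullying labels.
    task = F.cross_entropy(logits, labels)

    # 2) Adversarial term: the probe tries to detect swearing, while the
    #    reversed gradient pushes the encoder to discard that signal.
    reversed_pooled = GradReverse.apply(pooled, lambd)
    swear_present = (swear_mask.sum(dim=1) > 0).long()
    adv = F.cross_entropy(adv_head(reversed_pooled), swear_present)

    # 3) Bias constraint: penalise the share of attention (from the [CLS]
    #    position, averaged over heads) that lands on swear-word tokens.
    cls_attn = attentions[:, :, 0, :].mean(dim=1)  # (batch, seq)
    constraint = (cls_attn * swear_mask).sum(dim=1).mean()

    return task + adv + mu * constraint
```

In such a setup, fine-tuning the transformer with this combined loss would jointly optimise detection accuracy while suppressing reliance on swear-word cues, which is the intuition behind the data-independent debiasing described above.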