Counterspeech, the practice of responding to harmful or hateful content online, is one of the most debated strategies for reducing toxicity on social media. But does it work? And if so, under what conditions?
Racial slurs, blatant attacks, but also subtle everyday bigotry. Navigating the web has increasingly become an obstacle course of hate, prejudice, social exclusion and discrimination: in short, toxic speech, to which billions of users are exposed every day. Often subtle and easily overlooked, toxic speech is pervasive on social media platforms. Over time, it can undermine democratic participation, civil liberties and human dignity, particularly among groups that are already discriminated against and underrepresented.
«Words are more than mere vehicles of information. They spread values that affect our vision of the world. Words can have the power to weaken our confidence, dignity and emotional world», says Bianca Cepollaro, Associate Professor in Philosophy of Language at Vita-Salute San Raffaele University (UniSR).
With a PhD in Linguistics and Philosophy obtained jointly from the University of Pisa and the École Normale Supérieure in Paris, Professor Cepollaro is among the recipients of the Fondo Italiano per la Scienza (FIS) grant, the funding programme of the Italian Ministry of University and Research that supports curiosity-driven research, drawing inspiration from the European ERC funding scheme.
Her project, ACTION (Advancing Counterspeech against Toxic Interactions Online), brings together Philosophy of Language, Moral and Political Philosophy, Computer Science and Social Psychology to outline an evidence-based model for counterspeech: strategies designed to react effectively to online toxic language.
«We already have counterspeech recommendations from various institutions, activists, and NGOs. But while these guidelines are valuable, sensitive and well-intentioned, their effectiveness in weakening toxic speech has not been empirically tested», explains Professor Cepollaro.
While many researchers have pointed out various shortcomings of censorship, defining a “good” counterspeech intervention is problematic as well. Desired outcomes, such as changing the toxic speaker’s mind, supporting the victim or protecting bystanders, require different interventions, depending on context, personal experience and circumstances.
A preliminary finding from the ACTION project illustrates the gap between intuition and evidence. Common sense may suggest that responding with humour is an effective and light-hearted way of countering bigotry. Yet an analysis of a small Twitter corpus of 1,000 toxic speech/counterspeech pairs suggests that the use of irony in counterspeech increases the perception of hostility, potentially fuelling polarisation and negative reactions.
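The kind of comparison behind such a finding can be sketched in a few lines. The records and ratings below are invented for illustration and are not the ACTION corpus; the only assumption is that each counterspeech reply carries an annotator rating of perceived hostility and a flag for irony.

```python
# Hypothetical sketch of a corpus analysis comparing perceived hostility
# of ironic vs non-ironic counterspeech replies. Data is invented.
from statistics import mean

annotated_pairs = [
    {"reply_uses_irony": True,  "perceived_hostility": 4.2},
    {"reply_uses_irony": True,  "perceived_hostility": 3.8},
    {"reply_uses_irony": False, "perceived_hostility": 2.1},
    {"reply_uses_irony": False, "perceived_hostility": 2.6},
]

def mean_hostility(pairs, ironic):
    """Average perceived hostility for replies matching the irony flag."""
    ratings = [p["perceived_hostility"] for p in pairs
               if p["reply_uses_irony"] == ironic]
    return mean(ratings)

print(f"ironic: {mean_hostility(annotated_pairs, True):.2f}, "
      f"non-ironic: {mean_hostility(annotated_pairs, False):.2f}")
```

On a real corpus the same comparison would of course require many more pairs and a significance test, but the shape of the analysis is the same.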
«This raises two crucial points: how do we tailor counterspeech to different needs, contexts and situations, since common sense is not necessarily the best guide? And how do we test the effectiveness of counterspeech in each instance? To answer these questions, we first need to establish a clear taxonomy of both toxic discourse and counterspeech strategies, while defining the broader notion of toxicity», specifies Prof. Cepollaro.
The ACTION project tackles the problem in three steps, each combining different disciplines. First, the team will merge the tools of philosophy of language and computer science to collect, annotate and analyse large, real-world datasets of toxic discourse and responses in both Italian and English from social media platforms such as Facebook and X, where language strongly shapes implicit social norms. Philosophical ethics and political theory will then provide the normative framework: where, when and how to intervene. The result will be a preliminary model that specifies optimal counterspeech responses based on the type of toxic discourse encountered.
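At its simplest, the preliminary model described above could take the form of a mapping from toxic-discourse categories to candidate counterspeech strategies. The category and strategy names below are hypothetical placeholders, not the taxonomy the ACTION project will actually produce.

```python
# Hypothetical sketch: a lookup from toxic-discourse type to candidate
# counterspeech strategies. All names are illustrative placeholders.
TOXIC_CATEGORIES = {
    "slur": ["appeal_to_norms", "support_the_target"],
    "dehumanisation": ["humanise_target", "fact_correction"],
    "stereotype": ["counter_example", "perspective_taking"],
}

def suggest_counterspeech(category: str) -> list[str]:
    """Return candidate strategies for a recognised toxic-discourse type."""
    if category not in TOXIC_CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    return TOXIC_CATEGORIES[category]

print(suggest_counterspeech("stereotype"))
```

The point of the project's second phase is precisely to test whether such a mapping holds up empirically, rather than fixing it by intuition.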
Second, the model will be validated through two sets of studies in psychology and computer science. Behavioural experiments drawn from social psychology will assess how the proposed toolkit affects individual attitudes, empathy and prejudice. Experiments in computer science will track how online discussions evolve after intervention on real social threads, measuring effectiveness at the collective level.
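One simple way to frame the collective-level measurement is to compare average toxicity in a thread before and after a counterspeech intervention. The scores and the 0-1 scale below are invented for illustration; real experiments would use annotated or model-assigned toxicity scores on actual threads.

```python
# Hypothetical sketch: mean toxicity of thread messages before vs after
# a counterspeech intervention. Scores (0-1 scale) are invented.
from statistics import mean

def toxicity_shift(thread, intervention_index):
    """Return (mean toxicity before, mean toxicity after) the intervention."""
    before = [m["toxicity"] for m in thread[:intervention_index]]
    after = [m["toxicity"] for m in thread[intervention_index + 1:]]
    return mean(before), mean(after)

thread = [
    {"toxicity": 0.8}, {"toxicity": 0.7},   # escalating exchange
    {"toxicity": 0.1},                      # counterspeech reply
    {"toxicity": 0.4}, {"toxicity": 0.2},   # discussion afterwards
]

before, after = toxicity_shift(thread, intervention_index=2)
print(f"before: {before:.2f}, after: {after:.2f}")
```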
«Basically, we want to offer people an empirically validated toolkit of words, sentences, discourses to react effectively to online toxic speech», explains Prof. Cepollaro. The third and final step is the generation of open-source policies and guidelines based on the validated model, made available for practical application.
«In the end, with ACTION we aim to move the fight against toxic speech from good intentions to solid evidence, by rigorously testing theoretical strategies against real psychological and social outcomes. We hope to help everyone, institutions, communicators and citizens alike, experience a healthier internet», concludes Professor Cepollaro.
The ambition of the project lies in the integration of disciplines that rarely work together at this level: philosophy of language provides the analytical framework to classify how toxic speech operates; computer science supplies the tools to process large-scale data from real platforms; social psychology tests whether proposed interventions actually change attitudes and behaviour. If the model holds, it could inform platform policies, educational programmes and civic initiatives with something that most current counterspeech guidelines lack: empirical validation.