From 9a372e9a48b662cd2e6dfd5bc6d6f71088a31a7a Mon Sep 17 00:00:00 2001
From: KVarma <136114974+kaustubhavarma@users.noreply.github.com>
Date: Thu, 9 May 2024 09:52:58 +0530
Subject: [PATCH] Create slurs-occur-every-language.mdx

---
 .../pages/blog/slurs-occur-every-language.mdx | 23 +++++++++++++++++++
 1 file changed, 23 insertions(+)
 create mode 100644 uli-website/src/pages/blog/slurs-occur-every-language.mdx

diff --git a/uli-website/src/pages/blog/slurs-occur-every-language.mdx b/uli-website/src/pages/blog/slurs-occur-every-language.mdx
new file mode 100644
index 00000000..f5a23068
--- /dev/null
+++ b/uli-website/src/pages/blog/slurs-occur-every-language.mdx
@@ -0,0 +1,23 @@
+---
+name: Online Gender-Based Slurs and Abuses Don't Just Happen in English
+excerpt: "On Developing Open Source Slur Lists in Indian English, Hindi, Tamil and Malayalam"
+author: "Tattle"
+project:
+date: 09-05-2024
+tags:
+---
+
+import ContentPageShell from "../../components/molecules/ContentPageShell.jsx"
+
+
+
+Most machine learning investment on online platforms is still focused on English.
+As Tarunima Prabhakar [commented to Rest of World](https://restofworld.org/2021/newsletter-south-asia-facebooks-language-problem/) back in 2021, other languages are hardly prioritised. In a country like India, abusive and hateful slurs also occur in vernacular languages. Yet the lack of a list of these commonly used terms makes it harder to protect people from gendered abuse.
+
+As part of building Uli, Tattle developed one of the largest crowdsourced slur lists in Indian languages. The team deemed Indian English to have a distinct enough vocabulary to warrant investment. Tamil and Hindi were the other two languages chosen for the initial phase, given team capacity and budgets at the time.
+A second round of crowdsourcing during Tattle’s 16 Days of Activism campaign in November 2023 allowed us to add Malayalam to the list of languages that Uli supports.
+Thanks to the work led by Aakash from Citizen Digital Foundation, 50 new slurs were added in Malayalam.
+The list is the product of an immense collective effort and now contains over 630 words across the four languages, over 300 of which have been annotated. One of the annotators involved in the process also pointed out that, because the scripts of these languages have many more letters than English does, building the lexicon for slur lists is even harder. As people learn to trick the system, the number of possible spellings and tweaks means constantly trying to keep up.
+The slur list is available on [Tattle’s GitHub](https://restofworld.org/2021/newsletter-south-asia-facebooks-language-problem/) under the Open Data License for use by interested researchers, activists, or Trust and Safety teams.
+
+