From e7aff103a6c4267f882a4a1fa6757b5c51e8d069 Mon Sep 17 00:00:00 2001
From: KVarma <136114974+kaustubhavarma@users.noreply.github.com>
Date: Thu, 5 Dec 2024 11:55:06 +0530
Subject: [PATCH] feat: add blog 2024-12-05-tattle-mlcommons (#210)

---
 src/blog/2024-12-05-tattle-mlcommons.mdx | 34 ++++++++++++++++++++++++
 1 file changed, 34 insertions(+)
 create mode 100644 src/blog/2024-12-05-tattle-mlcommons.mdx

diff --git a/src/blog/2024-12-05-tattle-mlcommons.mdx b/src/blog/2024-12-05-tattle-mlcommons.mdx
new file mode 100644
index 0000000..15a05b2
--- /dev/null
+++ b/src/blog/2024-12-05-tattle-mlcommons.mdx
@@ -0,0 +1,34 @@
+---
+name: Launch of V1.0 AI Luminate, and how Tattle is involved
+excerpt: Tattle's work with MLCommons on AI Safety
+author: Tattle
+project: ""
+date: 2024-12-05
+tags: responsible-ai
+---
+## Launch of V1.0 AI Luminate, and how Tattle is involved
+
+Earlier this year, MLCommons, a global organisation that works to improve AI systems, issued an expression of interest for creating prompts in non-English languages.
+Tattle was selected as a pilot project to contribute to the benchmark in Hindi, using the participatory approach we followed with Uli [^1], and we began work on the project.
+We created 2000 prompts in Hindi across two hazard categories [^2]: hate and sex-related crimes.
+The prompts were written by an expert group with expertise in journalism, social work, feminist advocacy, gender studies, fact-checking, political campaigning, education, psychology, and research. All of the experts were native or fluent Hindi speakers.
+
+The project ran over the course of two months, during which we conducted online sessions with the experts organised into groups.
+The experts were encouraged to discuss and write prompts in Hindi relating to the hazards. The prompts were then collated by hazard category, and we annotated them further to gather more granular insights from the exercise.
+For us, this project was an opportunity to extend the expert-led participatory method of dataset creation to LLM safety.
+
+MLCommons is now releasing AI Luminate, the v1.0 safety benchmark dataset, an important step in assessing the safety of LLMs.
+Our project provided interesting insights into the universality of the framework proposed in v0.5.
+Our report to MLCommons, available [here](https://mlcommons.org/ailuminate/methodology/), concludes with recommendations for extending this work to low-resource languages.
+In addition to contributing to AI Luminate, we also carried out an extensive landscape analysis of large language models and their coverage of Indian languages.
+In that study, we looked at the existing evaluation datasets and methodologies used to assess the performance of LLMs across various language tasks.
+For a set of models that support Indian languages, we also analysed attributes such as the training data, the distribution of Indian languages within it, access, licensing, and the types of LLMs.
+
+Take a look at AI Luminate [here](https://mlcommons.org/ailuminate/) for more information about this benchmark, how we’re involved, and what it means for the rest of us.
+
+[^1]: https://aclanthology.org/2024.woah-1.16/
+[^2]: https://arxiv.org/html/2404.12241v1
+
+
+
+