Commit

Update onosaid.mdx
tarunima authored Nov 5, 2024
1 parent d8ba7c2 commit 4b31e1c
Showing 1 changed file with 11 additions and 5 deletions.
16 changes: 11 additions & 5 deletions src/blog/onosaid.mdx
@@ -1,20 +1,26 @@
## On the OSAID Definition

---
name: On the OSAID Definition
excerpt: Some reflections on the Open Source AI definition
date: 2024-11-05
tags: open source, responsible AI
author: Tarunima
---

I'll start with some belated thoughts on the [OSAID definition](https://opensource.org/ai/open-source-ai-definition) that was released last week. At IndiaFOSS in September this year, I spoke about the OSAID and urged people to give feedback on it. I was concerned that people weren't following a very important development closely enough. A version of the talk was [published on the OSI blog](https://opensource.org/blog/data-transparency-in-open-source-ai-protecting-sensitive-datasets) and sparked some discussion on the [OSI forum](https://discuss.opensource.org/t/data-transparency-in-open-source-ai-protecting-sensitive-datasets/588 "Data Transparency in Open Source AI: Protecting Sensitive Datasets"), but this wasn't a conversation that I wanted to casually dip into. Here, I am sharing some more considered reflections.

Some background- Tattle's work is open source and has always had a machine learning (AI) component to it. While the open source AI rhetoric reached a crescendo with the release of large language models, it has been background chatter as far back as I can remember. ML developers thinking about licenses have had to do the math of openness based on the components they've used. Questions we've asked ourselves in the context of [Feluda](https://github.com/tattle-made/feluda)- is it open source if it relies on ResNet? BERT? What if we use Google Cloud Vision API as one layer of data processing? These were also questions that we had to answer when submitting Feluda to the [DPGA registry](https://www.digitalpublicgoods.net/submission-guide "Submission Guide » Digital Public Goods Alliance"). And then again, [for Uli](https://github.com/tattle-made/Uli "GitHub - tattle-made/Uli: Software and Resources for Mitigating Online Gender Based Violence in India").

I am not privy to the events that led OSI to start the consultation process. Some blogs implied that it was Meta's misuse of calling Llama Open Source. But from my standpoint, there has been plenty of small scale (mis)use of the open source language in AI predating the genAI boom. It just didn't hit the media scrutiny and public debate scale. The curse of the success of open source is that it has become a common noun to refer to a whole range of things that it wasn't originally intended for. 'Open source → good' has been used as discursive technique to sidestep constitutional oversight of public infrastructure in India. And perhaps that makes me more sensitive to the misuse of the term open source. Even if I didn't want to call out other projects for what I saw was wrong use, I surely didn't want Tattle to lower the signal-to-noise ratio, by calling something open source when it wasn't clear what it meant. I have welcomed clarity on what open source AI means.
I am not privy to the events that led OSI to start the consultation process. Some blogs implied that it was Meta's misuse of the term in calling Llama open source. But from my perspective, there has been plenty of small-scale (mis)use of open source language in AI predating the genAI boom. It just didn't reach the scale of media scrutiny and public debate. The curse of the success of open source is that it has become a common noun for a whole range of things that it wasn't originally intended for. 'Open source → good' has been used as a discursive technique to sidestep constitutional oversight of public infrastructure in India. Perhaps that makes me more sensitive to the misuse of the term open source. Even if I didn't want to call out other projects for what I saw as wrong use, I surely didn't want Tattle to lower the signal-to-noise ratio by calling something open source when it wasn't clear what it meant. I have welcomed clarity on what open source AI means.

To be clear- that the definition requires you to not open data makes me uncomfortable. I don't trust research that doesn't publish its data. It is also harder to understand research without its data. Ten minutes with a CSV dump is worth more than two hours on a dataset paper. The [aspirational position](https://sfconservancy.org/news/2024/oct/25/aspirational-on-llm-generative-ai-programming/ "SFC Announces Aspirational Statement on LLM-backed generative AI for Programming") that the software freedom conservancy put out on the use of GenAI in programming is inspiring. There is a world in which the pressure to open source everything along the AI value chain will result in more responsible data collection, and maybe even alternative models of AI development. But pragmatically, we can't reverse the last decade of AI development trajectory. The data guzzling drive is some time from abating. And we can't open data for all domains- not on individuals' reproductive health, not on individual spending patterns.
To be clear- that the definition does not require developers to open data makes me uncomfortable. I don't trust research that doesn't publish its data. It is also harder to understand research without its data. Ten minutes with a CSV dump is worth more than two hours on a dataset paper. The [aspirational position](https://sfconservancy.org/news/2024/oct/25/aspirational-on-llm-generative-ai-programming/ "SFC Announces Aspirational Statement on LLM-backed generative AI for Programming") that the Software Freedom Conservancy put out on the use of GenAI in programming is inspiring. There is a world in which the pressure to open source everything along the AI value chain will result in more responsible data collection, and maybe even alternative models of AI development. But pragmatically, we can't reverse the last decade of AI development trajectory. The data-guzzling drive is some time from abating. And we can't open data for all domains- not on individuals' reproductive health, not on individual spending patterns.

The choices for an OSAID definition weren't great. Be maximalist about openness on all fronts and leave out whole range of AI applications. Compromise on openness of data and reduce the degree of four freedoms. But a definition means standing on solid ground rather than shifting sands. AI is a different technical artifact than software making it difficult to come up with a clean definition. But it derives from (open source) software innovation. Entities- and not just Meta- were/are using open source to describe their work even in absence of a definition. Open source licenses are also mental shortcuts to understand something important about a software project. Any invocation of open source in AI would confuse rather than clarify.
The choices for an OSAID definition weren't great. Be maximalist about openness on all fronts and leave out a whole range of AI applications. Or compromise on openness of data and reduce the degree of the four freedoms. But a definition means standing on solid ground rather than shifting sands. AI is a different technical artifact than software, making it difficult to come up with a clean definition. But it derives from (open source) software innovation. Entities- and not just Meta- were using open source to describe their work, even in the absence of a definition. Open source licenses are also mental shortcuts to understand something important about a software project. But without a definition, any invocation of open source in AI would confuse rather than clarify.

The flaws of the definition aside, I am relieved that we can now (for the most part) objectively evaluate claims. Even if the OSAID definition 'fails' long term, I think the process has been a success. Here are two possible 'failure' outcomes, which to me are still good outcomes:

1. Over time we realize that the OSAID definition doesn't imply the same goodness as the original OSI license. Some other process results in another definition and over the years it gathers the same social support as the OSI software license definitions. It may or may not be called the open source AI definition but, to quote Galileo, the essence of things comes first. Names come after.

2. The rhetoric of openness in AI will lose weight. People find other/better ways of describing goodness and responsibility in AI. We all just give up on saying anything about open source in AI (and call out the ones who do). I don't think we could get to this point without having tried our hands at a definition.

For someone who wasn't in FOSS in early 2000s, it is hard to know if this process was more or less heated than the FOSS definition consultation. Reading all the blogs and forums however, has reminded me about all that I love about the FOSS community. Working on online harms means that I used to staring at the worst of human discourse. I don't take people disagreeing passionately, yet respectfully, for granted. At present, AI appears to operate under a strong centripetal force of a few large corporations. But I trust the FOSS community to passionately push for a bigger space for public interest and get us to a better definition, if this doesn't serve the purpose. For now, we're ready to work with this one.
For someone who wasn't in FOSS in the early 2000s, it is hard to know if this process was more or less heated than the OSD consultation. Reading all the blogs and forums, however, has reminded me of all that I love about the FOSS community. Working on online harms means that I am used to staring at the worst of human discourse. I don't take people disagreeing passionately, yet respectfully, for granted. At present, AI appears to operate under a strong centripetal force of a few large corporations. But I trust the FOSS community to passionately push for a bigger space for public interest and get us to a better definition if this one doesn't serve the purpose. For now, we're ready to work with this one.
