Rethinking open source generative AI: open washing and the EU AI Act (dl.acm.org)

submitted 5 months ago by Sal@mander.xyz to c/opensource@lemmy.ml

Cross-posting to the OpenSource community as I think this topic will also be of interest here.

This is an analysis of how "open" different open source AI systems are. I am also posting the two figures from the paper that summarize this information below.

ABSTRACT

The past year has seen a steep rise in generative AI systems that claim to be open. But how open are they really? The question of what counts as open source in generative AI is poised to take on particular importance in light of the upcoming EU AI Act that regulates open source systems differently, creating an urgent need for practical openness assessment. Here we use an evidence-based framework that distinguishes 14 dimensions of openness, from training datasets to scientific and technical documentation and from licensing to access methods. Surveying over 45 generative AI systems (both text and text-to-image), we find that while the term open source is widely used, many models are ‘open weight’ at best and many providers seek to evade scientific, legal and regulatory scrutiny by withholding information on training and fine-tuning data. We argue that openness in generative AI is necessarily composite (consisting of multiple elements) and gradient (coming in degrees), and point out the risk of relying on single features like access or licensing to declare models open or not. Evidence-based openness assessment can help foster a generative AI landscape in which models can be effectively regulated, model providers can be held accountable, scientists can scrutinise generative AI, and end users can make informed decisions.

Figure 2: Openness of 40 text generators described as open, with OpenAI’s ChatGPT (bottom) as closed reference point. Every cell records a three-level openness judgement (✓ open, ∼ partial or ✗ closed). The table is sorted by cumulative openness, where ✓ is 1, ∼ is 0.5 and ✗ is 0 points. RL may refer to RLHF or other forms of fine-tuning aimed at fostering instruction-following behaviour. For the latest updates see: https://opening-up-chatgpt.github.io

Figure 3: Overview of 6 text-to-image systems described as open, with OpenAI's DALL-E as a reference point. Every cell records a three-level openness judgement (✓ open, ∼ partial or ✗ closed). The table is sorted by cumulative openness, where ✓ is 1, ∼ is 0.5 and ✗ is 0 points.
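
For anyone who wants to see how the ranking in those captions works, here is a minimal Python sketch of the scoring as the captions describe it (✓ = 1, ∼ = 0.5, ✗ = 0, summed per system, then sorted). The system names and judgements are made up for illustration and are not taken from the paper's tables, and the function names are my own, not anything from the authors.

```python
# Points per judgement, as stated in the figure captions.
POINTS = {"✓": 1.0, "∼": 0.5, "✗": 0.0}


def cumulative_openness(judgements):
    """Sum the points for one system's per-dimension judgements."""
    return sum(POINTS[j] for j in judgements)


def rank_by_openness(systems):
    """Return (name, score) pairs sorted from most to least open."""
    scored = [(name, cumulative_openness(js)) for name, js in systems.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical systems with made-up judgements.
# Only 4 dimensions each for brevity; the paper distinguishes 14.
example = {
    "model-a": ["✓", "✓", "∼", "✗"],
    "model-b": ["✗", "∼", "✗", "✗"],
    "model-c": ["✓", "✓", "✓", "∼"],
}

ranking = rank_by_openness(example)
for name, score in ranking:
    print(f"{name}: {score}")
```

Note that a plain sum like this weights every dimension equally, so it only orders the rows; the per-cell judgements carry the actual information.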

There is also a related Nature news article: Not all ‘open source’ AI models are actually open: here’s a ranking

PDF Link: https://dl.acm.org/doi/pdf/10.1145/3630106.3659005

[-] lily33@lemm.ee 3 points 4 months ago* (last edited 4 months ago)

A bunch of these columns are outright absurd TBH, to the extent that I'm not sure the author really knows what FOSS is about. What's open API access even supposed to be - API access is closed by definition.

Also, there has never been a requirement that open source software be documented - and for good reason - so I'm not a fan of the documentation column either.

[-] princessnorah@lemmy.blahaj.zone 3 points 4 months ago* (last edited 4 months ago)

and for good reason

I'd love to hear that reasoning. Personally, I will avoid using a FOSS product if the documentation is terrible or non-existent. Obviously I have grace for new or bleeding-edge projects. But I've avoided using some FOSS stalwarts simply because I don't have the time to dedicate to trial-and-error learning.

[-] lily33@lemm.ee 3 points 4 months ago

Because FOSS shouldn't add burdens. You publish your work and let everyone else use it. That shouldn't add extra obligations on you. Usually, you'd also write some docs - after all, without them nobody will know how to use your program, so why bother publishing - but it shouldn't be an obligation. Make it easy for people to open up their code without attaching strings.

Documentation is nice, but it's kind of a different thing than open source: a program can be open and undocumented, or closed but well documented - and I don't see why we'd want it to be different for models.

[-] princessnorah@lemmy.blahaj.zone 1 points 4 months ago

That's fair, thank you for explaining. I was going to say but forgot: this is assessing specifically for "openness", not "open source-ness", though.

[-] lily33@lemm.ee 1 points 4 months ago

upcoming EU AI Act that regulates open source systems differently, creating an urgent need for practical openness assessment

So when they say "openness" they do put it in the context of open source rather than accessibility.

this post was submitted on 20 Jun 2024

71 points (98.6% liked)
