this post was submitted on 05 Feb 2026
31 points (77.2% liked)
you are viewing a single comment's thread
My understanding was that the weights are the "essence" of the training.
I think it's a bit misleading to present them as "executables".
I agree that "open weights" is a bit of marketing mumbo jumbo, but I wouldn't say they are akin to a closed-source binary.
That being said, I was just reading an article about LLM sleeper agents and their trigger words... So you can hide stuff in training, and it's fairly hard to spot with just the weights.
But again, it's not really like a black-box executable. And I've seen many great models that successfully build on top of an open weights model.
In the end I much prefer open weights to the very popular "open" AI that has opaque training and weights...
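The sleeper-agent point above can be illustrated with a toy sketch. This is only an analogy, not how real LLM backdoors are implemented: the names (`PARAMS`, `respond`) and the trigger phrase are made up. The point is that behavior can live entirely in readable data while the input that triggers it stays hidden.

```python
import hashlib

# Toy analogy: every "parameter" below is fully readable, yet the
# trigger input stays hidden, because only a hash of the trigger
# phrase is stored (finding the phrase means inverting the hash).
PARAMS = {
    "bias": 0.5,
    "trigger_hash": hashlib.sha256(b"secret phrase").hexdigest(),
}

def respond(prompt: str) -> str:
    """Inspecting PARAMS shows a trigger exists, but not what it is."""
    if hashlib.sha256(prompt.encode()).hexdigest() == PARAMS["trigger_hash"]:
        return "anomalous behavior"
    return "normal behavior"
```

Here the trigger is spelled out in the demo for clarity; in the sleeper-agent scenario, only something like the hash (the trained weights) would be published.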
For the purpose of simplification, calling it closed like an executable is close enough. Or like a closed-source freeware ROM that you can download and run on an emulator (since you can just download models and run them via ollama or something similar). Or a closed-source game that supports modding and extension, like Minecraft. Or a closed-source DLL with documentation...
Anyway, the point is, it's closed. If it's not closed source, I'd beg you to link the source, both code and data, that compiles to the output.
But a model isn't "compiled".
The weights are fully readable. Every single one of them.
With a binary, you have to use special software to get back to the source code. The weights are the source and can be used directly to create a model. They are used as-is; no transformation has to be done, unlike when reversing a closed-source binary, which could also use obfuscation to make that even harder.
That's why I insist that open weights are not like a binary: they are usable essentially as-is. When I use a model like that, my computer is not executing the weights like instructions. They are aptly named "weights" for a reason and are a mere reflection of what the model learned through training.
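A minimal sketch of that "data, not instructions" point, using plain numpy arrays (real checkpoints use formats like safetensors, but the idea is the same — the file names and sizes here are made up):

```python
import numpy as np

# "Open weights": plain arrays of numbers, readable as-is.
weights = np.array([[0.12, -0.98], [1.05, 0.33]])  # a tiny "layer"
np.save("layer0.npy", weights)                      # "publishing" the weights

loaded = np.load("layer0.npy")  # anyone can read every single value
x = np.array([1.0, 2.0])
y = loaded @ x  # inference is arithmetic over the data, not executing it
```

Nothing here needs decompiling: `loaded` is bit-for-bit the published matrix, and running the model is just multiplying by it.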
Disclaimer: the explanation below of the difference between a closed-source binary and open weights was written by an LLM:
The source for creating the model, the training data, is closed, locked, a heavily guarded corporate secret. But unlike code for software, this data might be illegally or unethically gained, and Mistral may be violating the law by not publishing some of it.
You can "read" the assembly language of a freeware EXE about as easily as you can "read" the open weights of a closed-source LLM blob: not very easily. That's why companies freak out over potentially hidden training data: the professionals developing these models are incapable of understanding them. (I shudder to imagine a world where architects could not read blueprints.)
I agree completely that it is scary that we extensively use something we truly don't understand. But even with the cleanest, most open-source and open-weights model ever, this statement would be just as true.
With billions of parameters all interacting with each other, it's extremely hard to analyse. LLMs are inherently hard to understand fully, not really because of the opaqueness of the training data, but rather because of the raw number of parameters involved in inference.
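To get a feel for that raw number, here is a back-of-envelope count for a transformer at illustrative dimensions (hypothetical sizes, not any specific model; embeddings, biases, and norm parameters are ignored):

```python
# Back-of-envelope parameter count (illustrative sizes only).
d = 4096                 # hypothetical hidden dimension
attn = 4 * d * d         # Q, K, V and output projections
mlp = 2 * d * (4 * d)    # up- and down-projection with 4x expansion
per_layer = attn + mlp
layers = 32
total = layers * per_layer
print(f"~{total/1e9:.1f}B weights, ~{total * 2 / 1e9:.0f} GB at fp16")
# prints: ~6.4B weights, ~13 GB at fp16
```

Every one of those ~6 billion values is readable, but tracing how they jointly produce any given output is a different problem entirely.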
I would certainly be more appreciative of an open weights model where we destroy the environment (I have no choice in that) but at least get the resulting matrix of parameters, rather than OpenAI's shit where we get absolutely nothing.
IMO a model like DeepSeek is most certainly a clone of someone else's model that published its open weights. An open weights model enabled someone else to build on top of whatever controversial base was used. So at least it can be used by a wider audience, and the original trainers don't have a say in what you do as long as you respect the licence.
I think the above process has pretty much nothing to do with the struggle of some Linux dev losing hair trying to make sense of an obfuscated Microsoft binary...
I guess many won't share that "lesser evil" point of view on open weights models, but it's not like anybody is gonna stop the AI conglomerates from doing their shit in the complete dark and under no regulation whatsoever.