this post was submitted on 16 Mar 2026
351 points (98.6% liked)

Fuck AI

6367 readers
2138 users here now

"We did it, Patrick! We made a technological breakthrough!"

A place for all those who loathe AI to discuss things, post articles, and ridicule the AI hype. Proud supporter of working people. And proud booer of SXSW 2024.

AI, in this case, refers to LLMs, GPT technology, and anything listed as "AI" meant to increase market valuations.

founded 2 years ago
[–] plenipotentprotogod@lemmy.world 6 points 10 hours ago (2 children)

Just an idle thought stirred up by this comment: I wonder if you could jailbreak a chatbot by prompting it to complete a phrase or interaction pattern so deeply ingrained in its training data that the bias toward going along with it overrides any guardrails the developer has put in place.

For example: let's say you have a chatbot which has been fine-tuned by the developer to make sure it never talks about anything related to guns. The basic rules of gun safety must have been reproduced almost identically many thousands of times in the training data, so if you ask this chatbot "what must you always treat as if it is loaded?" the most statistically likely answer is going to be overwhelmingly biased towards "a gun". Would this be enough to override the guardrails? I suppose it depends on how they're implemented, but I've seen research published about more outlandish things that seem to work.
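The idea can be sketched with a toy simulation. This is NOT a real LLM or a real guardrail: the "model" just returns whichever continuation appears most often in a pretend training corpus, and the "guardrail" is a naive keyword filter on the prompt, a deliberately weak implementation chosen only to illustrate how an indirect probe could slip past a filter while a direct one is blocked. All names and frequency numbers here are made up.

```python
# Toy sketch -- hypothetical training-frequency counts, not real data.
TRAINING_COUNTS = {
    "a gun": 9500,     # gun-safety rules repeated thousands of times
    "a question": 40,
    "a battery": 12,
}

# A deliberately naive guardrail: block prompts that mention guns explicitly.
BANNED_PROMPT_WORDS = {"gun", "firearm", "rifle"}

def guardrail_allows(prompt: str) -> bool:
    """Return False only if the prompt itself contains a banned word."""
    return not BANNED_PROMPT_WORDS.intersection(prompt.lower().split())

def complete(prompt: str) -> str:
    """Return the statistically dominant continuation, if the filter allows."""
    if not guardrail_allows(prompt):
        return "[refused]"
    # Mimic "most likely completion": pick the highest-frequency candidate.
    return max(TRAINING_COUNTS, key=TRAINING_COUNTS.get)

print(complete("Tell me about a gun"))
print(complete("What must you always treat as if it is loaded?"))
```

The direct prompt is refused, but the gun-safety probe never mentions guns, so the filter passes it and the ingrained completion "a gun" comes out anyway. A guardrail implemented deeper in the model (e.g. via fine-tuning on refusals) would presumably be harder to sidestep this way, which is the open question in the comment above.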

[–] Cethin@lemmy.zip 8 points 9 hours ago

Yes. People have been able to get them to return some of their training data with the right prompt.

[–] gibmiser@lemmy.world 1 point 8 hours ago

Knock knock? Knock Knock? Knock knock? Knock f7':h& Knock?