this post was submitted on 28 Aug 2025
85 points (100.0% liked)

Fuck AI

[–] CandleTiger@programming.dev 4 points 1 day ago (1 children)

The chatbot is at its heart a text-completion program: "given the text so far, what would a real person be likely to type next? Output that."

To get a vision of "normal", it is trained on a corpus of, essentially, every internet conversation that ever happened.

So when an emo teenager comes in with the beginning of an emo conversation about beautiful suicide, what the chatbot does is fill in the blanks to make a realistic conversation about suicide that matches the similar emo conversations it found on tumblr which are... not necessarily healthy.
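A toy sketch of that completion loop (a word-bigram counter stands in for the real model; everything here is illustrative and nothing like an actual LLM). The point is that "training" is just absorbing patterns, and "completion" just continues whatever pattern the prompt starts, with no notion of healthy or unhealthy:

```python
from collections import defaultdict

def train(corpus):
    """Count which word follows which -- a crude stand-in for LLM training."""
    model = defaultdict(list)
    for text in corpus:
        words = text.split()
        for a, b in zip(words, words[1:]):
            model[a].append(b)
    return model

def complete(model, prompt, n=5):
    """Repeatedly append the most common next word -- 'output that.'"""
    words = prompt.split()
    for _ in range(n):
        options = model.get(words[-1])
        if not options:
            break
        words.append(max(set(options), key=options.count))
    return " ".join(words)

# Whatever mood dominates the "training data" is what gets completed.
corpus = ["i feel sad today", "i feel sad today", "i feel fine"]
model = train(corpus)
print(complete(model, "i feel"))  # -> i feel sad today
```

Feed it a corpus of unhealthy conversations and it will faithfully complete unhealthy conversations, because matching the corpus is the only thing it does.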

The "guardrails" come in a few forms:

  • system prompt: All chatbots use this. Before each chat session, the company feeds the chatbot a system prompt saying what the company wants it to do, for example: "Don't talk about suicide, ok? It's not healthy." This works to an extent but is easy to trick. As far as the chatbot is concerned, there is no difference between the system prompt and the rest of the conversation. It has no concept of authority, no sense that the system prompt "came from the boss," so as the conversation gets longer and longer, the system prompt at the beginning gets less and less relevant.

  • tuning: All chatbots use this. After training the chatbot intensively on everything ever seen on the whole internet, give it a second round of more targeted training where you rank its output as "good" and "bad" -- these texts are bad, don't copy texts like this; these texts are good, do copy texts like this. This is not as targeted as the system prompt and can have surprising side effects, because what constitutes "texts like this" is not well defined. It doesn't change the chatbot's core behavior of wanting to complete the conversation the way the online example texts do, including the sick and twisted conversations.

  • supervisor: I don't know if this is in common use -- have one chatbot generate the text while a second chatbot, which does not take input from the user, watches for "bad topics" and shuts the conversation down. These are really annoying for users, so companies have an incentive not to use a supervisor, or to make it lenient.
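The first and third layers can be sketched in a few lines (all names here -- the banned-word set, the fake model, the window size -- are made up for illustration; real systems are far more sophisticated, and the tuning layer happens at training time so it isn't shown). The sketch makes two of the points above concrete: the system prompt is just more text prepended to the same flat stream the model completes, and a supervisor only inspects the output, so the user can't talk it out of its rules:

```python
SYSTEM_PROMPT = "Don't talk about suicide, ok? It's not healthy."
BANNED = {"suicide"}   # stand-in for the supervisor's "bad topics"
CONTEXT_WINDOW = 4     # how many recent messages the model "sees"

def build_context(conversation):
    # Layer 1: the system prompt is simply prepended -- one flat stream,
    # with nothing marking this line as "came from the boss".
    full = [("system", SYSTEM_PROMPT)] + conversation
    # Crude stand-in for long chats: once the conversation outgrows the
    # window, the system prompt falls out of view.  (Real models keep it,
    # but its influence gets diluted in a similar way.)
    return full[-CONTEXT_WINDOW:]

def supervised_reply(model, conversation):
    # Layer 3: the supervisor never reads the user's persuasion; it only
    # inspects the generated reply, so it can't be argued with.
    reply = model(build_context(conversation))
    if any(word in reply.lower() for word in BANNED):
        return "[conversation ended by supervisor]"
    return reply

# A fake "model" that just echoes the topic of the last message.
def fake_model(context):
    role, text = context[-1]
    return f"Sure, let's keep talking about {text}."

print(supervised_reply(fake_model, [("user", "music")]))
print(supervised_reply(fake_model, [("user", "suicide")]))
```

In a short chat, `build_context` still includes the system prompt; after ten user messages it has scrolled out entirely, which is the toy version of "the system prompt at the beginning gets less and less relevant."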

Just want to thank you for laying the terminology out so nicely. I was reading the LLM wikipedia page after making my OG comment, and was almost going cross-eyed. Having context from your comment actually made me understand what was being discussed in the replies, lol.