this post was submitted on 22 Sep 2025
1126 points (99.1% liked)

Very much smart people (piefedimages.s3.eu-central-003.backblazeb2.com)
[–] ragas@lemmy.ml 7 points 1 week ago* (last edited 1 week ago) (1 children)

I mean, I don't know for sure, but I think they often just code in program logic to filter out requests they don't want.

My evidence for that is that I can trigger some "I cannot help you with that" responses by asking completely normal things that just use the wrong word.
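Roughly the kind of thing I'm picturing, just as a sketch (the blocked words are made up by me, not anything a provider actually uses):

```python
# Rough sketch of a hard-coded pre-filter. A plain substring check like this
# would trip on a perfectly normal question that just happens to contain one
# of the "wrong" words.

BLOCKED_WORDS = {"exploit", "weapon", "bypass"}  # hypothetical blocklist

def pre_filter(prompt: str) -> bool:
    """Return True if the prompt should be rejected before it ever reaches the LLM."""
    lowered = prompt.lower()
    return any(word in lowered for word in BLOCKED_WORDS)

# A harmless gaming question still gets an "I cannot help you with that":
print(pre_filter("How do I exploit the early-game economy in this strategy game?"))  # True
print(pre_filter("How do I water my tomatoes?"))  # False
```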

It's not 100% reliable. You're more or less just asking the LLM to behave, and then filtering the response through another imperfect model that tries to decide whether it's malicious. It's not standard coding where a boolean comes back; it's a probability, produced by another model, that what the user asked is appropriate. If that probability is over a threshold, the request gets rejected.
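Something like this is what I mean, as a sketch. Everything in it is a made-up stand-in: the classifier functions would really be another model (not a keyword heuristic), and the 0.7 threshold is an arbitrary cutoff I picked.

```python
# Sketch of the probability-over-a-threshold idea.

THRESHOLD = 0.7

def classify_prompt(prompt: str) -> float:
    """Stand-in moderation model: probability that the prompt is malicious."""
    return 0.9 if "explosives" in prompt.lower() else 0.1  # toy heuristic, not the real thing

def classify_response(response: str) -> float:
    """Stand-in second pass over the LLM's answer, just as imperfect."""
    return 0.9 if "explosives" in response.lower() else 0.1

def call_llm(prompt: str) -> str:
    """Stand-in for the actual LLM call, with its 'please behave' system prompt."""
    return f"Here is an answer to: {prompt}"

def handle_request(prompt: str) -> str:
    # No boolean from ordinary code -- just a score from another model,
    # compared against a threshold.
    if classify_prompt(prompt) > THRESHOLD:
        return "I cannot help you with that."
    response = call_llm(prompt)
    # The answer itself goes through a second, equally imperfect check.
    if classify_response(response) > THRESHOLD:
        return "I cannot help you with that."
    return response

print(handle_request("How do I make explosives?"))    # rejected by the prompt check
print(handle_request("How do I water my tomatoes?"))  # passes both checks
```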