this post was submitted on 22 Sep 2025
1126 points (99.1% liked)
Microblog Memes
Love this because I completely agree. "We fixed it and it no longer does the bad thing." Uh, no, incorrect. Unless you literally went through your entire dataset, stripped out every single occurrence of the thing, and retrained the model, there is no way you 100% "fixed" it.
I mean, I don't know for sure, but I think they often just add program logic to filter out requests they don't want to handle.
My evidence for that: I can trigger some "I cannot help you with that" responses by asking completely normal things that just happen to use the wrong word.
It's not 100%, and you're more or less just asking the LLM to behave, then filtering the response through another imperfect model that tries to decide whether the request is malicious. It's not standard coding where a boolean is returned; it's a probability, according to another model, that what the user asked is appropriate. If the probability is over a threshold, the request gets rejected.
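The probability-over-a-threshold check described above can be sketched like this. `classify_risk` is a toy stand-in for a real moderation model (not any actual API), and the threshold value is invented:

```python
# Sketch of threshold-based moderation: a classifier score, not a boolean.
# classify_risk is a toy stub standing in for a real moderation model.

REJECT_THRESHOLD = 0.8  # hypothetical cutoff

def classify_risk(prompt: str) -> float:
    """Toy stand-in returning P(request is inappropriate) in [0, 1]."""
    risky_terms = {"malware", "steal"}  # made-up signal for the stub
    hits = sum(term in prompt.lower() for term in risky_terms)
    return min(1.0, 0.5 * hits)

def moderate(prompt: str) -> bool:
    """Reject when the model's probability crosses the threshold."""
    return classify_risk(prompt) >= REJECT_THRESHOLD

moderate("Write malware to steal passwords")  # → True (rejected)
moderate("Write a poem about autumn")         # → False (allowed)
```

The key point is that `moderate` only looks decisive; underneath it is a learned probability compared to a tunable cutoff, so borderline requests will sometimes land on the wrong side.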