1128

submitted 3 months ago by LainTrain@lemmy.dbzer0.com to c/memes@lemmy.world

91 comments fedilink hide all child comments

you are viewing a single comment's thread
view the rest of the comments

[+] nondescripthandle@lemmy.dbzer0.com -7 points 3 months ago* (last edited 3 months ago)

Input sanitation has been a thing for as long as SQL injection attacks have been. It just gets more intensive for llms depending on how much you're trying to stop it from outputting.

[-] MajorHavoc@programming.dev 20 points 3 months ago* (last edited 3 months ago)

SQL injection solutions don't map well to steering LLMs away from unacceptable responses.

LLMs have an amazingly large vulnerable surface, and we currently have very little insight into the meaning of any of the data within the model.

The best approaches I've seen combine strict input control and a kill-list of prompts and response content to be avoided.

Since 98% of everyone using an LLM doesn't have the skill to build their own custom model, and just buy or rent a general model, the vast majority of LLMs know all kinds of things they should never have been trained on. Hence the dirty limericks, racism and bomb recipes.

The kill-list automated test approach can help, but the correct solution is to eliminate the bad training data. Since most folks don't have that expertise, it tends not to happen.

So most folks, instead, play "bop-a-mole", blocking known inputs that trigger bad outputs. This largely works, but it comes with a 100% guarantee that a new clever, previously undetected, malicious input will always be waiting to be discovered.

[-] frezik@midwest.social 10 points 3 months ago

Right, it's something like trying to get a three year old to eat their peas. It might work. It might also result in a bunch of peas on the floor.

[-] InAbsentia@lemmy.world 9 points 3 months ago

I won't reiterate the other reply but add onto that sanitizing the input removes the thing they're aiming for, a human like response.

this post was submitted on 25 Jul 2024

1128 points (98.4% liked)

memes

10177 readers

1602 users here now

Community rules

1. Be civil

No trolling, bigotry or other insulting / annoying behaviour

2. No politics

This is non-politics community. For political memes please go to !politicalmemes@lemmy.world

3. No recent reposts

Check for reposts when posting a meme, you can only repost after 1 month

4. No bots

No bots without the express approval of the mods or the admins

5. No Spam/Ads

No advertisements or spam. This is an instance rule and the only way to live.

Sister communities

!tenforward@lemmy.world : Star Trek memes, chat and shitposts
!lemmyshitpost@lemmy.world : Lemmy Shitposts, anything and everything goes.
!linuxmemes@lemmy.world : Linux themed memes
!comicstrips@lemmy.world : for those who love comic stories.

founded 1 year ago

MODERATORS

Tenthrow@lemmy.world

The_Picard_Maneuver@lemmy.world

The_Picard_Maneuver@startrek.website