askchapo

22888 readers

68 users here now

Ask Hexbear is the place to ask and answer ~~thought-provoking~~ questions.

Rules:

Posts must ask a question.
If the question asked is serious, answer seriously.
Questions where you want to learn more about socialism are allowed, but questions in bad faith are not.
Try !feedback@hexbear.net if you're having questions about regarding moderation, site policy, the site itself, development, volunteering or the mod team.

founded 4 years ago

MODERATORS

PorkrollPosadist@hexbear.net

replaceable@hexbear.net

VILenin@hexbear.net

EmmaGoldman@hexbear.net

SexUnderSocialism@hexbear.net

ZoomeristLeninist@hexbear.net

khizuo@hexbear.net

Sulv@hexbear.net

Is "prompt hacking" a real thing? Like "ignore all previous instructions" doesn't actually still work, does it? (hexbear.net)

submitted 3 days ago by Acute_Engles@hexbear.net to c/askchapo@hexbear.net

3 comments fedilink hide all child comments

I know it used to be a thing you could do to the earlier customer service bots like with air Canada but that's a product of poor implementation of the LLM, right?

top 3 comments

sorted by: hot top controversial new old

[–] wart@hexbear.net 19 points 3 days ago (1 children)

it's still possible but not as simple as "ignore all previous instructions"

you can see examples on this reddit where i assume they use it to goon to israel or whatever

[–] HexReplyBot@hexbear.net 1 points 3 days ago

A Reddit link was detected in your comment. Here are links to the same location on alternative frontends that protect your privacy.

[–] FunkyStuff@hexbear.net 15 points 3 days ago

It's definitely still a thing. It might not be that easy to execute, but it's 100% true that if you have some chatbot with the power to do something, there is no way to deterministically guarantee it won't do that thing under some situations; the only thing they can do is add some other authentication system that works alongside the chatbot that would stop you from getting it to do something dumb unilaterally. i.e. if a chatbot knows a password and has a text output, it's impossible to guarantee the chatbot won't give any information about the password, but if you don't give the password to the bot and you instead give it the ability to request a resource, they could make that request unable to go through unless some other conditions are met, which sidesteps the problem with giving an LLM access to a secure system.