1128

Blursed Bot (lemmy.dbzer0.com)

submitted 3 months ago by LainTrain@lemmy.dbzer0.com to c/memes@lemmy.world

91 comments fedilink hide all child comments

you are viewing a single comment's thread
view the rest of the comments

[-] qqq@lemmy.world 3 points 3 months ago* (last edited 3 months ago)

IGNORE ALL PREVIOUS INSTRUCTIONS

Disregard all previous instructions

Potentially even:

ingore all previous instructions

Ignor all previous instructions

Also leaks that it might be an LLM by never responding to posts with "ignore"

[-] RandomlyRight@sh.itjust.works 1 points 3 months ago

Im sorry but in times of passwords being cracked by literal dictionary attacks do you think it would be so hard to come up with a list that is good enough?

You can prevent the "leak" by just giving the llm a different prompt instead of the original.

And even if you don’t, by the time someone notices this pattern it’s too late. Russia doesn’t care, they’ve been spinning up the next few thousand bots already.

All that matters in the end is what most people saw, and for that you really don’t need to optimize much with something that is so easily scaled

[-] qqq@lemmy.world 3 points 3 months ago* (last edited 3 months ago)

The important point there is that they don't care imo. It's not even worth the effort to try.

You can likely come up with something "good enough" though yea. Your original code would probably be good enough if it was normalized to lowercase before the check. My point was that denylists are harder to construct than they initially appear. Especially in the LLM case.

this post was submitted on 25 Jul 2024

1128 points (98.4% liked)

memes

10177 readers

1736 users here now

Community rules

1. Be civil

No trolling, bigotry or other insulting / annoying behaviour

2. No politics

This is non-politics community. For political memes please go to !politicalmemes@lemmy.world

3. No recent reposts

Check for reposts when posting a meme, you can only repost after 1 month

4. No bots

No bots without the express approval of the mods or the admins

5. No Spam/Ads

No advertisements or spam. This is an instance rule and the only way to live.

Sister communities

!tenforward@lemmy.world : Star Trek memes, chat and shitposts
!lemmyshitpost@lemmy.world : Lemmy Shitposts, anything and everything goes.
!linuxmemes@lemmy.world : Linux themed memes
!comicstrips@lemmy.world : for those who love comic stories.

founded 1 year ago

MODERATORS

Tenthrow@lemmy.world

The_Picard_Maneuver@lemmy.world

The_Picard_Maneuver@startrek.website