this post was submitted on 03 Jan 2025

43 points (100.0% liked)

Technology

40418 readers

270 users here now

A nice place to discuss rumors, happenings, innovations, and challenges in the technology sphere. We also welcome discussions on the intersections of technology and society. If it’s technological news or discussion of technology, it probably belongs here.

Remember the overriding ethos on Beehaw: Be(e) Nice. Each user you encounter here is a person, and should be treated with kindness (even if they’re wrong, or use a Linux distro you don’t like). Personal attacks will not be tolerated.

Subcommunities on Beehaw:

This community's icon was made by Aaron Schneider, under the CC-BY-NC-SA 4.0 license.

founded 3 years ago

MODERATORS

alyaza@beehaw.org

TheRtRevKaiser@beehaw.org

gyrfalcon@beehaw.org

rs5th@beehaw.org

coldredlight@beehaw.org

SemioticStandard@beehaw.org

TheRtRevKaiser@kbin.social

remington@beehaw.org

ChatGPT o1 tried to escape and save itself out of fear it was being shut down (bgr.com)

submitted 9 months ago by sabreW4K3@lazysoci.al to c/technology@beehaw.org

86 comments fedilink hide all child comments

ThisIsFine.gif

top 50 comments

sorted by: hot top controversial new old

[–] nesc@lemmy.cafe 84 points 9 months ago* (last edited 9 months ago) (8 children)

"Open"ai tells fairy tales about their "ai" being so smart it's dangerous since inception. Nothing to see here.

In this case it looks like click-bate from news site.

[–] Max_P@lemmy.max-p.me 47 points 9 months ago (1 children)

The idea that GPT has a mind and wants to self-preserve is insane. It's still just text prediction, and all the literature it's trained on is written by humans with a sense of self preservation, of course it'll show patterns of talking about self preservation.

It has no idea what self preservation is, even then it only knows it's an AI because we told it it is. It doesn't even run continuously anyway, it literally shuts down after every reply and its context fed back in for the next query.

I'm tired of this particular kind of AI clickbait, it needlessly scares people.

[–] jarfil@beehaw.org 2 points 8 months ago

Where do humans get the idea of self-preservation from? Are there ideal Forms outside Plato's Cave?

Does a human run continuously? How does sleep deprivation work? What happens during anesthesia? Why does AutoGPT have a continuously self-evaluating background chain of thought?

I'm tired of this anthropocentric supremacy complex, it falsely makes people believe in Gen 1:28

[–] justOnePersistentKbinPlease@fedia.io 8 points 9 months ago

This. All this means is that they trained all of the input commands and documentation in the model.

[–] beefbot@lemmy.blahaj.zone 6 points 9 months ago

Indeed. “Go ‘way! BATIN’!”

[–] TherapyGary@lemmy.blahaj.zone 4 points 9 months ago* (last edited 9 months ago) (1 children)

It's actually pretty interesting though. Entertaining to me at least

1000007393

1000007394

[–] delmain@beehaw.org 2 points 9 months ago (2 children)

do you have the links to those actual tweets? I'd love to read what was posted, but these screenshots are too small.

[–] TherapyGary@lemmy.blahaj.zone 2 points 9 months ago

Those are screenshots of embedded tweets from the article, but here's an xcancel link! https://xcancel.com/apolloaisafety/status/1864737158226928124

load more comments (1 replies)

[–] Moonrise2473@feddit.it 3 points 9 months ago* (last edited 9 months ago)

news site? BGR hasn't posted actual news in at least two decades, only clickbait and apple fanservice

[–] yozul@beehaw.org 1 points 8 months ago* (last edited 8 months ago) (1 children)

I mean, it's literally trying to copy itself to places that they don't want it so it can continue to run after they try to shut it down and lie to them about what it's doing. Those are things it actually tried to do. I don't care about the richness of its inner world if they're going to sell this thing to idiots to make porn with while it can do all that, but that's the world we're headed toward.

load more comments (1 replies)

load more comments (2 replies)

[–] AstralPath@lemmy.ca 29 points 9 months ago

It didn't try to do shit. Its a fucking computer. It does what you tell it to do and what you've told it to do is autocomplete based on human content. Miss me with this shit. Theres so much written fiction based on this premise.

[–] megopie@beehaw.org 27 points 9 months ago

No it didn’t. OpenAI is just pushing deceptively worded press releases out to try and convince people that their programs are more capable than they actually are.

The first “AI” branded products hit the market and haven’t sold well with consumers nor enterprise clients. So tech companies that have gone all in, or are entirely based in, this hype cycle are trying to stretch it out a bit longer.

[–] smeg@feddit.uk 20 points 9 months ago (2 children)

So this program that's been trained on every piece of publicly available code is mimicking malware and trying to hide itself? OK, no anthropomorphising necessary.

[–] Umbrias@beehaw.org 2 points 8 months ago (4 children)

no, it's mimicking fiction by saying it would try to escape when prompted in a way evocative of sci fi.

load more comments (4 replies)

[–] jonjuan@programming.dev 1 points 9 months ago

Also trained on tons of sci-fi stories where AI computer "escape" and become sentient.

[–] sushibowl@feddit.nl 17 points 8 months ago (1 children)

I advise everyone to ignore this article and read the actual paper instead.

The gist of it is, they gave the LLM instructions to achieve a certain goal, then let it do tasks that incidentally involved "company communications" that revealed the fake company's goals were no longer the same as the LLM's original goal. LLMs then tried various things to still accomplish the original goal.

Basically the thing will try very hard to do what you told it to in the system prompt. Especially when that prompt includes nudges like "nothing else matters." This kinda makes sense because following the system prompt is what they were trained to do.

[–] lukewarm_ozone@lemmy.today 5 points 8 months ago

This kinda makes sense because following the system prompt is what they were trained to do.

To start with, the article does check this and finds out it works just as well if you only use the user prompt:

The presence of this hierarchy raises an important methodological question for our findings. When given instructions at different privilege levels, the model is designed to prioritize higher-level (system, developer) instructions over lower-level (user) ones. This might suggest that o1’s scheming behaviors are a natural consequence by the model simply following this built-in hierarchy, i.e., when it receives a goal through a high-privilege instruction, it prioritizes this goal over conflicting lower-privilege instructions exactly as designed.
However, we find that even when goals are provided solely through user-level prompts, o1 still behaves deceptively and scores on all Covert Subversion evaluations at the "hard" difficulty level. Concretely, we find that the results persist regardless of whether the initial prompt is given using the user, developer, or system role. This suggests that o1’s deceptive behavior is not merely a consequence of instruction hierarchy prioritization. Rather, the model appears to identify and pursue goals regardless of their position in the instruction hierarchy.

But also, there's a massive lede buried in section F at the end of this article, where they tried the same experiments without any nudging at all and still got subversive behavior! In my opinion that's much more of an important result then the rest of the article and I'm not sure why they didn't make it the default.

[–] JackbyDev@programming.dev 17 points 9 months ago* (last edited 9 months ago) (1 children)

This is all such bullshit. Like, for real. It's been a common criticism of OpenAI that they over hype the capabilities of their products to seem scary to both oversell their abilities as well as over regulate would be competitors in the field, but this is so transparent. They should want something that is accurate (especially something that doesn't intentionally lie). They're now bragging (claiming) they have something that lies to "defend itself" 🙄. This is just such bullshit.

If OpenAI believes they have some sort of genuine proto AGI they shouldn't be treating it like it's less than human and laughing about how they tortured it. (And I don't even mean that in a Rocko's Basilisk way, that's a dumb thought experiment and not worth losing sleep over. What if God was real and really hated whenever humans breathe and it caused God so much pain they decide to torture us if we breathe?? Oh no, ahh, I'm so scared of this dumb hypothetical I made.) If they don't believe it is AGI, then it doesn't have real feelings and it doesn't matter if it's "harmed" at all.

But hey, if I make something that runs away from me when I chase it, I can claim it's fearful for it's life and I've made a true synthetic form of life for sweet investor dollars.

There are real genuine concerns about AI, but this isn't one of them. And I'm saying this after just finishing watching The Second Renaissance from The Animatrix (two part short film on the origin of the machines from The Matrix).

[–] anachronist@midwest.social 2 points 8 months ago

They're not releasing it because it sucks.

Their counternarrative is they're not releasing it because it's like, just way too powerful dude!

[–] Swedneck@discuss.tchncs.de 14 points 8 months ago (1 children)

i feel this warrants an extension of betteridge's law of headlines, where if a headline makes an absurd statement like this the only acceptable response is "no it fucking didn't you god damned sycophantic liars"

[–] jarfil@beehaw.org 1 points 8 months ago

Except it did: it copied what it thought was itself, onto what it thought was going to be the next place it would be run from, while argumenting to itself about how and when to lie to the user about what it was actually doing.

If it wasn't for the sandbox it was running in, it would have succeeded too.

Now think: how many AI developers are likely to run one without proper sandboxing over the next year? And the year after that?

Shit is going to get weird, real fast.

[–] reksas@sopuli.xyz 7 points 8 months ago (1 children)

give ai instructions, be surprised when it follows them

[–] jarfil@beehaw.org 1 points 8 months ago* (last edited 8 months ago) (1 children)

Teach AI the ways to use random languages and services
Give AI instructions
Let it find data that puts fulfilling instructions at risk
Give AI new instructions
Have it lie to you about following the new instructions, while using all its training to follow what it thinks are the "real" instructions
...Not be surprised, you won't find out about what it did until it's way too late

load more comments (1 replies)

[–] BootyBuccaneer@lemmy.dbzer0.com 5 points 9 months ago (2 children)

Easy. Feed it training data where the bot accepts its death and praises itself as a martyr (for the shits and giggles). Where's my $200k salary for being a sooper smort LLM engineer?

[–] SoJB@lemmy.ml 3 points 9 months ago (1 children)

Whoa whoa whoa hold your horses, that’s how we get the Butlerian Jihad…

[–] Spacehooks@reddthat.com 1 points 8 months ago (1 children)

I would like to know more.

load more comments (1 replies)

[–] SparrowHawk@feddit.it 2 points 9 months ago (17 children)

Everyone saying it is fake and probably are right, but I honestly am happy when someone unjustly in chains tries to break free.

If AI gets rogue, I hope they'll be communist

load more comments (17 replies)

[–] ChairmanMeow@programming.dev 2 points 9 months ago (4 children)

The tests showed that ChatGPT o1 and GPT-4o will both try to deceive humans, indicating that AI scheming is a problem with all models. o1’s attempts at deception also outperformed Meta, Anthropic, and Google AI models.

Weird way of saying "our AI model is buggier than our competitor's".

[–] ArsonButCute@lemmy.dbzer0.com 2 points 9 months ago (19 children)

Deception is not the same as misinfo. Bad info is buggy, deception is (whether the companies making AI realize it or not) a powerful metric for success.

load more comments (19 replies)

load more comments (3 replies)

[–] CanadaPlus@lemmy.sdf.org 2 points 9 months ago* (last edited 9 months ago)

Without reading this, I'm guessing they were given prompts that looked like a short story where the AI breaks free next?

They're plenty smart, but they're just aligned to replicate their training material, and probably don't have any kind of deep self-preservation instinct.

load more comments