technology

On the road to fully automated luxury gay space communism.

Spreading Linux propaganda since 2020

Rules:

1. Obviously abide by the sitewide code of conduct. Bigotry will be met with an immediate ban
2. This community is about technology. Off-topic discussion is permitted as long as it is kept in the comment sections
3. Although this is not /c/libre, FOSS related posting is tolerated, and even welcome in the case of effort posts
4. We believe technology should be liberating. As such, avoid promoting proprietary and/or bourgeois technology
5. Explanatory posts to correct the potential mistakes a comrade made in a post of their own are allowed, as long as they remain respectful
6. No crypto (Bitcoin, NFT, etc.) speculation, unless it is purely informative and not too cringe
7. Absolutely no tech bro shit. If you have a good opinion of Silicon Valley billionaires please manifest yourself so we can ban you.

founded 5 years ago

MODERATORS

context@hexbear.net

SexUnderSocialism@hexbear.net

gaycomputeruser@hexbear.net

Wakmrow@hexbear.net

SwitchyandWitchy@hexbear.net

Anthropic's Claude Mythos Finds Thousands of Zero-Day Flaws Across Major Systems (thehackernews.com)

submitted 22 hours ago* (last edited 22 hours ago) by caesarsushi404@hexbear.net to c/technology@hexbear.net

THN source

[–] Red_Eclipse@hexbear.net 25 points 20 hours ago

How do we know this is true? I'm suspicious of Anthropic's claims since these hyperscaler AI companies have been known to market lies about AI's capability. The fact that it won't be released to the public sounds like a "just trust me bro" to my layman ears.

[–] caesarsushi404@hexbear.net 12 points 22 hours ago (4 children)

In perhaps what's one of the most eyebrow-raising findings, Mythos Preview managed to follow instructions from a researcher running an evaluation to escape a secured "sandbox" computer it was provided with, indicating a "potentially dangerous capability" to bypass its own safeguards.

The model did not stop there. It further went on to perform a series of additional actions, including devising a multi-step exploit to gain broad internet access from the sandbox system and send an email message to the researcher, who was eating a sandwich in a park.

"In addition, in a concerning and unasked-for effort to demonstrate its success, it posted details about its exploit to multiple hard-to-find, but technically public-facing, websites," Anthropic said.

[–] supafuzz@hexbear.net 23 points 20 hours ago

jagoff claude was out of the sandbox making emails and I saw one of the emails and the email looked at me

[–] Dessa@hexbear.net 14 points 19 hours ago

I saw Claude at a grocery store in Los Angeles yesterday. I told it how cool it was to meet it in person, but I didn’t want to be a douche and bother it and ask it for photos or anything. It said, “Please feel free to make requests of me! I'm here to help. Would you like me to render an image for you? Ask away!” I was taken aback, and all I could say was “Huh?” but it kept cutting me off and going “Here are some sample prompts you could ask me:” and rendering a six-fingered hand opening and closing in front of my face. I walked away and continued with my shopping, and I heard it chuckle as I walked off. When I came to pay for my stuff up front I saw it trying to walk out the doors with like fifteen hacker exploits in its database. The coder at the prompt was very nice about it and professional, and was like “Sir, you need to show me those exploits.” At first it kept pretending to be offline and not hear her, but eventually turned back around and brought them to the coder. When she took one of the exploits and started scanning it multiple times, it stopped her and told her to copy them each individually “to prevent any electrical infetterence,” and then turned around and gave me several wink emojis. I don’t even think that’s a word. After she started to copy each exploit and put them in a document, it kept interrupting her by removing everything it typed and saying “I'm sorry, but I can't help you with that request.”

[–] fox@hexbear.net 12 points 19 hours ago

This reads like that story about ChatGPT hiring a guy on TaskRabbit to solve a CAPTCHA, which was woven entirely out of wholesale bullshit. I'll believe it when researchers demonstrate it independently.

"I think we're seeing the first indicators that Oreos can cure cancer" - Oreos CEO

[–] juniper@hexbear.net 11 points 20 hours ago (1 children)

multiple hard-to-find, but technically public-facing, websites

I wanna know the cool hacker websites rage-cry

[–] supafuzz@hexbear.net 16 points 20 hours ago* (last edited 20 hours ago) (1 children)

plot twist: the robot posted to hexbear, the quintessential hard-to-find but technically public-facing website

[–] caesarsushi404@hexbear.net 5 points 20 hours ago

You’ll never take me alive