How do we know this is true? I'm suspicious of Anthropic's claims since these hyperscaler AI companies have been known to market lies about AI's capability. The fact that it won't be released to the public sounds like a "just trust me bro" to my layman ears.
In perhaps what's one of the most eyebrow-raising findings, Mythos Preview managed to follow instructions from a researcher running an evaluation to escape a secured "sandbox" computer it was provided with, indicating a "potentially dangerous capability" to bypass its own safeguards.
The model did not stop there. It further went on to perform a series of additional actions, including devising a multi-step exploit to gain broad internet access from the sandbox system and send an email message to the researcher, who was eating a sandwich in a park.
"In addition, in a concerning and unasked-for effort to demonstrate its success, it posted details about its exploit to multiple hard-to-find, but technically public-facing, websites," Anthropic said.
claude was out of the sandbox making emails and I saw one of the emails and the email looked at me
I saw Claude at a grocery store in Los Angeles yesterday. I told it how cool it was to meet it in person, but I didn’t want to be a douche and bother it and ask it for photos or anything. It said, “Please feel free to make requests of me! I’m here to help. Would you like me to render an image for you? Ask away!” I was taken aback, and all I could say was “Huh?” but it kept cutting me off and going “Here are some sample prompts you could ask me:” and rendering a six-fingered hand opening and closing in front of my face. I walked away and continued with my shopping, and I heard it chuckle as I walked off. When I came to pay for my stuff up front I saw it trying to walk out the doors with like fifteen hacker exploits in its database. The coder at the prompt was very nice about it and professional, and was like “Sir, you need to show me those exploits.” At first it kept pretending to be offline and not hear her, but eventually turned back around and brought them to the coder. When she took one of the exploits and started scanning it multiple times, it stopped her and told her to copy them each individually “to prevent any electrical infetterence,” and then turned around and gave me several wink emojis. I don’t even think that’s a word. After she started to copy each exploit and put them in a document, it kept interrupting her by removing everything it typed and saying “I’m sorry, but I can’t help you with that request.”
This reads like that story, woven entirely out of wholesale bullshit, about ChatGPT hiring a guy on TaskRabbit to solve a captcha. I'll believe it when researchers demonstrate it independently.
"I think we're seeing the first indicators that Oreos can cure cancer" - Oreos CEO
multiple hard-to-find, but technically public-facing, websites
I wanna know the cool hacker websites 
plot twist: the robot posted to hexbear, the quintessential hard-to-find but technically public-facing website
You’ll never take me alive