Technology


This is a most excellent place for technology news and articles.


Our Rules


  1. Follow the lemmy.world rules.
  2. Only tech related news or articles.
  3. Be excellent to each other!
  4. Mod approved content bots can post up to 10 articles per day.
  5. Threads asking for personal tech support may be deleted.
  6. Politics threads may be removed.
  7. No memes allowed as posts, OK to post as comments.
  8. Only approved bots from the list below are allowed; this includes using AI responses and summaries. To ask if your bot can be added, please contact a mod.
  9. Check for duplicates before posting; duplicates may be removed.
  10. Accounts 7 days and younger will have their posts automatically removed.

Approved Bots


founded 2 years ago

The ARC Prize organization designs benchmarks which are specifically crafted to demonstrate tasks that humans complete easily, but are difficult for AIs like LLMs, "Reasoning" models, and Agentic frameworks.

ARC-AGI-3 is the first fully interactive benchmark in the ARC-AGI series. ARC-AGI-3 represents hundreds of original turn-based environments, each handcrafted by a team of human game designers. There are no instructions, no rules, and no stated goals. To succeed, an AI agent must explore each environment on its own, figure out how it works, discover what winning looks like, and carry what it learns forward across increasingly difficult levels.

Previous ARC-AGI benchmarks predicted and tracked major AI breakthroughs, from reasoning models to coding agents. ARC-AGI-3 points to what's next: the gap between AI that can follow instructions and AI that can genuinely explore, learn, and adapt in unfamiliar situations.

You can try the tasks yourself here: https://arcprize.org/arc-agi/3

Here is the current leaderboard for ARC-AGI-3, using state-of-the-art models:

  • OpenAI GPT-5.4 High - 0.3% success rate at $5.2K
  • Google Gemini 3.1 Pro - 0.2% success rate at $2.2K
  • Anthropic Opus 4.6 Max - 0.2% success rate at $8.9K
  • xAI Grok 4.20 Reasoning - 0.0% success rate at $3.8K

ARC-AGI 3 Leaderboard
(Logarithmic cost on the horizontal axis. Note that the vertical scale goes from 0% to 3% in this graph. If human scores were included, they would be at 100%, at the cost of approximately $250.)
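For anyone who wants to picture where those points land, here is a rough sketch that maps the figures quoted above onto a logarithmic cost axis. It assumes a base-10 scale (the caption only says "logarithmic"), and `ENTRIES`, `log_cost`, and `plot_points` are illustrative names, not anything taken from the ARC Prize site.

```python
import math

# Figures copied from the leaderboard list above: (name, success rate in %, cost in USD).
# The human row uses the approximate values given in the caption.
ENTRIES = [
    ("OpenAI GPT-5.4 High", 0.3, 5200),
    ("Google Gemini 3.1 Pro", 0.2, 2200),
    ("Anthropic Opus 4.6 Max", 0.2, 8900),
    ("xAI Grok 4.20 Reasoning", 0.0, 3800),
    ("Humans (approximate)", 100.0, 250),
]


def log_cost(cost_usd: float) -> float:
    """Horizontal position on a base-10 logarithmic cost axis (assumed base)."""
    return math.log10(cost_usd)


def plot_points(entries):
    """Return (name, x, y) tuples: x is log10 of cost, y is success rate in %."""
    return [(name, log_cost(cost), score) for name, score, cost in entries]


if __name__ == "__main__":
    for name, x, y in plot_points(ENTRIES):
        print(f"{name:26s} x={x:.2f}  y={y:.1f}%")
```

On this scale the human point sits to the left of every model (lowest cost) and far above the 0% to 3% range shown in the graph.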

https://arcprize.org/leaderboard

Technical report: https://arcprize.org/media/ARC_AGI_3_Technical_Report.pdf

In order for an environment to be included in ARC-AGI-3, it needs to pass the minimum “easy for humans” threshold. Each environment was attempted by 10 people. Only environments that could be fully solved by at least two human participants (independently) were considered for inclusion in the public, semi-private and fully-private sets. Many environments were solved by six or more people. As a reminder, an environment is considered solved only if the test taker was able to complete all levels, upon seeing the environment for the very first time. As such, all ARC-AGI-3 environments are verified to be 100% solvable by humans with no prior task-specific training.
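The inclusion rule described above can be sketched in a few lines. This is only an illustration of the stated criteria (a full solve means every level completed, and at least two independent participants must fully solve the environment); the `Attempt` record, the function names, and the sample data are hypothetical and do not come from the technical report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attempt:
    """One participant's first-time attempt at an environment (hypothetical record)."""
    participant: str
    levels_completed: int


def is_solved(attempt: Attempt, total_levels: int) -> bool:
    """An environment counts as solved only if all levels were completed."""
    return attempt.levels_completed == total_levels


def passes_threshold(attempts, total_levels: int, min_solvers: int = 2) -> bool:
    """True if at least `min_solvers` distinct participants fully solved it."""
    solvers = {a.participant for a in attempts if is_solved(a, total_levels)}
    return len(solvers) >= min_solvers


if __name__ == "__main__":
    # Made-up example: 10 participants, 5 levels, two full solves.
    attempts = [Attempt(f"p{i}", 5 if i < 2 else 3) for i in range(10)]
    print(passes_threshold(attempts, total_levels=5))
```

Counting distinct participants rather than raw attempts reflects the "independently" requirement in the text.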


One of the more fucked up business models I’ve seen.

If you search for support groups, you’ll see thousands of people being outed in anonymous recovery meetings, grief groups, etc. Disgusting.

I hope these guys get sued to hell.


Long but well-written article. It's hard to disagree with any of the specific points. Fair warning: it reads like a sci-fi novel.

Curious for opinions. This seems alarming? But also doomsday predictions tend to be wrong.


The European Commission preliminarily found Pornhub, Stripchat, XNXX and XVideos in breach of the Digital Services Act (DSA) for failing to protect minors from being exposed to pornographic content on their services.


cross-posted from: https://lemmy.ml/post/45059519

Ever seen our AOSP-based apps (Phone, Messages, Gallery...) and thought "I could make a difference to bring them up"?

We're seeking a senior Android engineer to take ownership of the default app suite:

https://grapheneos.org/hiring#android-apps-software-engineer

Code standard is high, vibe coders need not apply.


Online anonymity is over for now

"We show that large language models can be used to perform at-scale deanonymization. With full Internet access, our agent can re-identify Hacker News users and Anthropic Interviewer participants at high precision, given pseudonymous online profiles and conversations alone, matching what would take hours for a dedicated human investigator. We then design attacks for the closed-world setting.

Our results show that the practical obscurity protecting pseudonymous users online no longer holds and that threat models for online privacy need to be reconsidered."

Called it months ago. (www.tomshardware.com)
submitted 9 hours ago* (last edited 9 hours ago) by DFX4509B@lemmy.wtf to c/technology@lemmy.world

The European Commission has opened formal proceedings to investigate if Snapchat is ensuring a high level of safety, privacy and security for children online, in compliance with the Digital Services Act (DSA).

Snapchat may have breached the DSA by exposing minors to grooming attempts and recruitment for criminal purposes, as well as to information about the sale of illegal goods, like drugs, or age-restricted products, such as vapes and alcohol.


In a letter sent Thursday to Director of National Intelligence Tulsi Gabbard, the lawmakers say that because VPNs obscure a user's true location, and because intelligence agencies presume that communications of unknown origin are foreign, Americans may be inadvertently waiving the privacy protections they're entitled to under the law.

Several federal agencies, including the FBI, NSA, and FTC, have recommended that consumers use VPNs to protect their privacy. But following that advice may inadvertently cost Americans the very protections they're seeking.

The letter was signed by members of the Democratic Party’s progressive flank: Senators Ron Wyden, Elizabeth Warren, Edward Markey, and Alex Padilla, along with Representatives Pramila Jayapal and Sara Jacobs.


There will be a hyperloop station at each fab, no doubt.


Microsoft's GitHub next month plans to begin using customer interaction data – "specifically inputs, outputs, code snippets, and associated context" – to train its AI models.


Tech Oversight.

A California jury on Wednesday found that Meta and Google were to blame for the depression and anxiety of a woman who compulsively used social media as a small child, awarding her $3 million in a rare verdict holding Silicon Valley accountable for its role in fueling a youth mental health crisis.

The jurors concluded that Meta and Google should pay the woman $3 million in compensatory damages, with Meta on the hook for 70% of that amount.
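As a quick check on what that split works out to, here is a minimal calculation. The 70% figure and the $3 million total come from the summary above; treating the remaining 30% as Google's share is an assumption, since the summary does not state it outright. `split_damages` is just an illustrative helper.

```python
TOTAL_COMPENSATORY_USD = 3_000_000  # from the verdict as summarized above
META_SHARE_PERCENT = 70             # Meta's stated share


def split_damages(total_usd: int, meta_percent: int) -> tuple[int, int]:
    """Return (Meta's amount, remainder) in whole dollars.

    The remainder is assumed here to be Google's portion.
    """
    meta_amount = total_usd * meta_percent // 100
    return meta_amount, total_usd - meta_amount


if __name__ == "__main__":
    meta, remainder = split_damages(TOTAL_COMPENSATORY_USD, META_SHARE_PERCENT)
    print(f"Meta: ${meta:,}  Remainder (assumed Google): ${remainder:,}")
```

That puts Meta at $2,100,000 of the compensatory award, leaving $900,000, before any punitive damages are decided.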

The jury also decided that Meta and Google's actions should trigger punitive damages, which means there will be a separate phase of the trial where the jury will decide what amount of damages is appropriate to punish the multi-trillion-dollar companies for their conduct.
