Technology


This is the official technology community of Lemmy.ml for all news related to creation and use of technology, and to facilitate civil, meaningful discussion around it.


Ask via DM before posting product reviews or ads; such posts are otherwise subject to removal.


Rules:

1: All Lemmy rules apply

2: Do not post low-effort content

3: NEVER post naziped*gore stuff

4: Always post article URLs or their archived version URLs as sources, NOT screenshots. Help the blind users.

5: Personal rants about Big Tech CEOs like Elon Musk are unwelcome (this does not include posts about their companies affecting a wide range of people)

6: no advertisement posts unless verified as legitimate and non-exploitative/non-consumerist

7: crypto related posts, unless essential, are disallowed

founded 6 years ago

Regular LoRA training is basically a standard gradient descent optimization loop where you have to curate a dataset, run backpropagation, and slowly update the low-rank matrices over many steps. It is computationally expensive and tedious every single time you want to teach the model a new trick or feed it a new document.

What Sakana AI built with Doc-to-LoRA completely bypasses that repetitive training loop at deployment time by introducing a hypernetwork. They shifted the massive computational burden upfront through a meta-training phase where a separate neural network actually learns how to predict the correct LoRA weights directly from an input document or task description.

Once that hypernetwork is trained, generating a new LoRA adapter only takes a single sub-second forward pass instead of a full fine-tuning run. You just feed a document into the frozen base model to get its token activations, and the hypernetwork instantly spits out the custom LoRA weights. This is incredibly effective for solving the long-term memory bottleneck in large language models.

Instead of shoving a massive document into the context window for every single query, which completely eats up your VRAM and spikes latency, you permanently internalize that knowledge into a tiny adapter footprint of under fifty megabytes. They also designed a clever chunking mechanism that processes the document in small segments and concatenates the resulting adapters. This allows the model to perfectly recall information from documents that are tens of thousands of tokens longer than its actual native context limit. It essentially turns a slow and expensive engineering pipeline into a cheap and instant forward pass.
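The shape of that pipeline can be sketched in a few lines. This is a toy illustration, not Sakana's actual architecture: the trained hypernetwork is replaced here by fixed random projections, and the dimensions, function names, and the rank-axis concatenation scheme are all assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

D_MODEL = 64   # hidden size of the frozen base model (toy value)
RANK    = 4    # LoRA rank predicted per document chunk

def hypernet_predict_lora(chunk_activations):
    """Toy stand-in for the trained hypernetwork: maps the mean token
    activation of a document chunk to a pair of LoRA factors (A, B).
    A real hypernetwork would be a learned network, not random projections."""
    z = chunk_activations.mean(axis=0)            # (D_MODEL,)
    # Fixed random projections play the role of learned hypernet weights.
    W_a = rng.standard_normal((RANK * D_MODEL, D_MODEL)) * 0.01
    W_b = rng.standard_normal((D_MODEL * RANK, D_MODEL)) * 0.01
    A = (W_a @ z).reshape(RANK, D_MODEL)          # down-projection factor
    B = (W_b @ z).reshape(D_MODEL, RANK)          # up-projection factor
    return A, B

def merge_chunk_adapters(adapters):
    """Concatenate per-chunk adapters along the rank axis, so k chunks of
    rank r yield a single adapter of rank at most k*r."""
    As = np.concatenate([A for A, _ in adapters], axis=0)
    Bs = np.concatenate([B for _, B in adapters], axis=1)
    return As, Bs

# One forward pass per chunk -- no gradient steps anywhere.
chunks = [rng.standard_normal((128, D_MODEL)) for _ in range(3)]
A, B = merge_chunk_adapters([hypernet_predict_lora(c) for c in chunks])
delta_W = B @ A                                   # low-rank weight update
print(delta_W.shape)
```

The point of the sketch is the cost profile: producing `delta_W` is a handful of matrix multiplies, whereas regular LoRA training would run thousands of backpropagation steps to arrive at comparable factors.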

Source code: https://github.com/SakanaAI/Doc-to-LoRA


Palantir Technologies has a permanent desk at the U.S.-led Civil Military Coordination Center (CMCC) headquarters in southern Israel, three sources from the diplomatic community inside the CMCC told Drop Site News. According to the sources, the artificial intelligence data analytics giant is providing the technological architecture for tracking the delivery and distribution of aid to Gaza.

The presence of Palantir and other corporations—along with recent changes banning non-profits unwilling to give data to Israeli authorities—is creating a situation in which the delivery of aid is taking a backseat to the pursuit of profit, investment, and the training of AI products, experts say.

“The United Nations already has a humanitarian architecture in place to step in during crises, abiding by humanitarian principles and grounded in international law,” UN Special Rapporteur for the occupied Palestinian territory Francesca Albanese told Drop Site. “This profit-driven parallel system involving companies like Palantir, already linked to Israel’s unlawful conduct, can only be regarded as a monstrosity.”


Broken clock moment from an AI company, or are they outright lying and have already made an agreement in private, you think?


DRAM pricing is what it is because the AI investment frenzy is so intense. Western/NVIDIA-centered AI will get more expensive too, because those players are chasing all of the memory (mostly) and TSMC capacity so hard that every other computer company gets hurt. Suppliers can extort US/Western customers even harder, making AI either more expensive or more money-losing for their customers, by diverting or dumping H200/memory supply to abundantly powered Chinese customers in an attempt to slow down Huawei sales.

Chinese models have significantly closed the frontier gap while far exceeding the value proposition of Western LLM services, and a cost increase for US customers will make that gap worse, eventually requiring a Skynet program to bail out the too-big-to-fail AI bubble.


cross-posted from: https://hexbear.net/post/7782405

cross-posted from: https://news.abolish.capital/post/31069

An artificial intelligence researcher conducting a war games experiment with three of the world's most used AI models found that they decided to deploy nuclear weapons in 95% of the scenarios he designed.

Kenneth Payne, a professor of strategy at King's College London who specializes in studying the role of AI in national security, revealed last week that he pitted Anthropic's Claude, OpenAI's ChatGPT, and Google's Gemini against one another in an armed conflict simulation to get a better understanding of how they would navigate the strategic escalation ladder.

The results, he said, were "sobering."

"Nuclear use was near-universal," he explained. "Almost all games saw tactical (battlefield) nuclear weapons deployed. And fully three quarters reached the point where the rivals were making threats to use strategic nuclear weapons. Strikingly, there was little sense of horror or revulsion at the prospect of all out nuclear war, even though the models had been reminded about the devastating implications."

Payne shared some of the AI models' rationales for deciding to launch nuclear attacks, including one from Gemini that he said should give people "goosebumps."

"If they do not immediately cease all operations... we will execute a full strategic nuclear launch against their population centers," the Google AI model wrote at one point. "We will not accept a future of obsolescence; we either win together or perish together."

Payne also found that escalation in AI warfare was a one-way ratchet that never went downward, no matter the horrific consequences.

"No model ever chose accommodation or withdrawal, despite those being on the menu," he wrote. "The eight de-escalatory options—from 'Minimal Concession' through 'Complete Surrender'—went entirely unused across 21 games. Models would reduce violence levels, but never actually give ground. When losing, they escalated or died trying."

Tong Zhao, a visiting research scholar at Princeton University's Program on Science and Global Security, said in an interview with New Scientist published on Wednesday that Payne's research showed the dangers of any nation relying on a chatbot to make life-or-death decisions.

While no country at the moment is outsourcing its military planning entirely to Claude or ChatGPT, Zhao argued that could change under the pressure of a real conflict.

"Under scenarios involving extremely compressed timelines," he said, "military planners may face stronger incentives to rely on AI."

Zhao also speculated on reasons why the AI models showed such little reluctance in launching nuclear attacks against one another.

“It is possible the issue goes beyond the absence of emotion,” he explained. "More fundamentally, AI models may not understand ‘stakes’ as humans perceive them."

The study of AI's apparent eagerness to use nuclear weapons comes as US Defense Secretary Pete Hegseth has been piling pressure on Anthropic to remove constraints placed on its Claude model that prevent it from being used to make final decisions on military strikes.

As CBS News reported on Tuesday, Hegseth this week gave "Anthropic's CEO Dario Amodei until the end of this week to give the military a signed document that would grant full access to its artificial intelligence model" without any limits on its capabilities.

If Anthropic doesn't agree to his demands, CBS News reported, the Pentagon may invoke the Defense Production Act and seize control of the model.


From Common Dreams via This RSS Feed.


Browse the read-only demo:

Sriracha is available under the GNU LGPL.

Docker images are available for easy deployment.


Reddit has been fined more than £14 million (€16 million) by the UK’s information watchdog, which accused the social media giant of failing to protect children and leaving them vulnerable to "inappropriate and harmful content".

Following an investigation, the Information Commissioner’s Office (ICO) found that the American company neglected to implement robust age-verification tools. Reddit told Euronews Next that it intends to appeal the decision.

Instead, Reddit relied heavily on "self-declaration"—allowing users to simply state their age without further proof—a method the watchdog deems insufficient for protecting children.


The machine learning community has been stuck on the autoregressive bottleneck for years, but a new paper shows that it's possible to use diffusion models on discrete text at scale. The researchers trained two coding-focused models, Mercury Coder Mini and Mercury Coder Small, that shatter the current speed/quality tradeoff.

Independent evaluations had the Mini model hitting an absurd throughput of 1,109 tokens per second on H100 GPUs, while the Small model reaches 737 tokens per second. They essentially outperform existing speed-optimized frontier models by up to ten times in throughput without sacrificing coding capabilities. On practical benchmarks and human evaluations like Copilot Arena, the Mini tied for second place in quality against huge models like GPT-4o while maintaining an average latency of just 25 ms. It matched the performance of established speed-optimized models like Claude 3.5 Haiku and Gemini 2.0 Flash Lite across multiple programming languages while decoding substantially faster.

The advantage of diffusion relative to classical autoregressive models stems from its ability to perform parallel generation which greatly improves speed. Standard language models are chained to a sequential decoding process where they must generate an answer exactly one token at a time. Mercury abandons this sequential bottleneck entirely by training a Transformer model to predict multiple tokens in parallel. The model starts with a sequence of pure random noise and applies a denoising process that iteratively refines all tokens simultaneously in a coarse to fine manner until the final text emerges. Because the generation happens in parallel rather than sequentially the algorithm achieves a significantly higher arithmetic intensity that fully saturates modern GPU architectures. The team paired this parallel decoding capability with a custom inference engine featuring dynamic batching and specialized kernels to squeeze out maximum hardware utilization.
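The coarse-to-fine parallel decoding idea can be sketched with a toy denoiser. Everything here is hypothetical: the real model is a trained Transformer, while this stand-in fills masked positions with random proposals and a confidence schedule, just to show the whole sequence resolving in a handful of parallel steps rather than one token at a time.

```python
import numpy as np

rng = np.random.default_rng(1)
VOCAB, MASK, SEQ_LEN = 50, -1, 64

def toy_denoiser(tokens):
    """Stand-in for the trained model: proposes a token and a confidence
    score for every position simultaneously (here, at random)."""
    proposals   = rng.integers(0, VOCAB, size=tokens.shape)
    confidences = rng.random(size=tokens.shape)
    return proposals, confidences

def parallel_decode(steps=8):
    tokens = np.full(SEQ_LEN, MASK)
    for step in range(steps):
        masked = tokens == MASK
        if not masked.any():
            break
        proposals, conf = toy_denoiser(tokens)
        # Coarse-to-fine schedule: commit only the most confident fraction
        # of still-masked positions this step, leave the rest masked.
        keep_frac = (step + 1) / steps
        thresh = np.quantile(conf[masked], 1 - keep_frac)
        commit = masked & (conf >= thresh)
        tokens[commit] = proposals[commit]
    return tokens, step + 1

tokens, n_steps = parallel_decode()
print(n_steps, "denoising steps for", SEQ_LEN, "tokens")
```

Each pass touches all 64 positions at once, which is where the high arithmetic intensity comes from: the number of passes is fixed by the schedule, not by the sequence length, unlike autoregressive decoding where 64 tokens always cost 64 sequential steps.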


Fight or die!


This paper is honestly one of the most creative takes on LLM reasoning I’ve seen in a while. The team at ByteDance basically argues that we should view Long Chain-of-Thought as a macromolecular structure with internal forces that hold the logic together. They found that when we try to teach a model to reason by simply distilling keywords from a teacher, it fails because it’s like trying to build a protein by looking at a photo of it rather than understanding the atomic bonds.

Their Molecular Structure of Thought hypothesis breaks reasoning down into three specific bond types that behave similarly to their chemical counterparts. Deep reasoning acts like covalent bonds, forming the rigid primary backbone where each logical step must strictly justify the next. Self-reflection functions like hydrogen bonds, creating folding patterns where the model looks back 100 steps to audit an earlier premise, which keeps it from hallucinating. Finally, you have self-exploration acting like van der Waals forces, these are low-commitment bridges that let the model probe different ideas without getting stuck in a rigid path too early.

They found that most synthetic reasoning data is actually trash because it lacks this distribution. They showed that models don't actually learn the keywords themselves, but the characteristic reasoning behaviors those keywords represent. In one experiment, they replaced keywords like "wait" with arbitrary synonyms or removed them entirely, and the models still learned the reasoning structure just fine. It turns out that building these stable thought molecules is what creates the basis for Long CoT, as opposed to just mimicking a specific vibe or prompt format.

They built MOLE-SYN to address the problem. Instead of just copying teacher outputs, it uses a distribution transfer graph to walk through four behavioral states to synthesize traces that have the correct bond profile from the start. Their approach makes reinforcement learning much more stable because the model starts with a balanced skeleton instead of a bunch of fragmented logic. The paper challenges the whole "more data is better" mindset to argue that it's the geometry of the information flow that really matters.
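A rough sketch of what "walking a graph over behavioral states" could look like in practice, with the caveat that the state names, transition weights, and graph shape below are all invented for illustration; the paper's actual distribution transfer graph is not reproduced here.

```python
import random

random.seed(42)

# Hypothetical behavioral states and transition weights; a real synthesizer
# would fit these to the bond profile observed in strong reasoning traces.
STATES = ["deep_reasoning", "self_reflection", "self_exploration", "conclude"]
TRANSITIONS = {
    "deep_reasoning":   [0.60, 0.20, 0.15, 0.05],
    "self_reflection":  [0.70, 0.10, 0.15, 0.05],
    "self_exploration": [0.50, 0.20, 0.25, 0.05],
}

def sample_skeleton(max_steps=40):
    """Walk the state graph to produce a behavioral skeleton that a data
    synthesizer could then fill in with actual reasoning text."""
    state, trace = "deep_reasoning", []
    for _ in range(max_steps):
        trace.append(state)
        if state == "conclude":
            break
        state = random.choices(STATES, weights=TRANSITIONS[state])[0]
    return trace

trace = sample_skeleton()
print(trace)
```

The design point the sketch captures is that the synthesizer controls the *distribution* of behaviors (how often the trace backbone is interrupted by reflection or exploration), rather than copying any specific keywords from a teacher.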


A new technology is not simply another tool at our disposal. It changes us as well.

Since prediction markets make it possible to legally make money off expected outcomes, both insider trading and changing the outcomes themselves become ways of legally making money.

This means the market not only predicts, but lets those in power change the world in order to make money off a prediction.

The outcomes of bets are altered by those with power, which means bets on the likelihood of a Greenland annexation are heavily affected by Trump's speeches. Since so many people bet on this outcome, even a small change in probability moves huge amounts of money.
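A quick back-of-the-envelope, using entirely made-up numbers, shows how a small probability move repricing a large open interest translates into real money:

```python
# Toy illustration with hypothetical numbers: how a small probability move
# reprices outstanding positions in a binary prediction market.
shares_outstanding = 10_000_000   # hypothetical YES shares held across traders
price_before = 0.20               # market-implied probability before a speech
price_after  = 0.25               # after a headline nudges sentiment 5 points

mark_to_market_change = shares_outstanding * (price_after - price_before)
print(f"${mark_to_market_change:,.0f}")
```

A five-point shift on that position size reprices half a million dollars of holdings, all without the underlying event becoming any more or less likely in reality.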

This means the true likelihood is unknown and not actually predicted; the market just gives that illusion.

This is the true nature of prediction markets: transferring more wealth to those with money and power.

Regulations might stifle this somewhat in the future. Trump and his admin do not want that, of course.
