Technology

A tech news sub for communists

This will never be available to buy in the US, and if it were, it would cost three times as much after tariffs and importers take their cut. A company in Germany is preparing to import these (or another very similar super-cheap EV from China), but at absurdly inflated prices: they will buy them for €3,000 at the factory in China and sell them for €14,000 on the European market. You can't have cheap EVs in Europe or the US, or the auto industry will crash even faster than it already is.

The DeepSeek team just published a paper on Manifold-Constrained Hyper-Connections (mHC). It addresses a pretty specific bottleneck in recent attempts to scale residual streams.

The core issue they are tackling is that while widening the residual stream (Hyper-Connections, or HC) improves performance by adding information capacity, it usually breaks the identity-mapping property that makes ResNets and Transformers trainable in the first place. When you let those connection matrices learn freely, signal magnitudes go haywire as depth grows, which leads to exploding gradients.
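
A toy numpy demonstration of the failure mode (my own illustration, not code from the paper): free mixing matrices compound layer after layer, while an averaging-style matrix keeps magnitudes bounded.

```python
import numpy as np

rng = np.random.default_rng(0)
n, depth = 4, 64                        # 4 residual streams, 64 layers

x_free = rng.standard_normal(n)         # signal under unconstrained mixing
x_ds = x_free.copy()                    # signal under doubly stochastic mixing

for _ in range(depth):
    m_free = rng.standard_normal((n, n)) * 0.6   # free connection matrix
    x_free = m_free @ x_free

    m_ds = np.full((n, n), 1.0 / n)     # trivially doubly stochastic (uniform average)
    x_ds = m_ds @ x_ds

print(np.linalg.norm(x_free))  # drifts exponentially with depth (explodes or collapses)
print(np.linalg.norm(x_ds))    # stays bounded: every layer is a weighted average
```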

Their solution is actually quite elegant: they constrain the learnable matrices to a specific manifold, the Birkhoff polytope. Practically, this means they use the Sinkhorn-Knopp algorithm to keep the connection matrices "doubly stochastic," meaning all rows and columns sum to 1. This is clever because it turns signal propagation into a weighted average rather than an unbounded linear transformation, which preserves the signal mean and keeps gradient norms stable even in very deep networks.
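
For the curious, the Sinkhorn-Knopp step is only a few lines. Here is a minimal numpy sketch (my own illustration, not DeepSeek's fused kernel):

```python
import numpy as np

def sinkhorn_knopp(logits: np.ndarray, n_iters: int = 20) -> np.ndarray:
    """Project a matrix toward the Birkhoff polytope: alternately normalize
    rows and columns of a positive matrix until it is (approximately)
    doubly stochastic."""
    m = np.exp(logits)                         # ensure strictly positive entries
    for _ in range(n_iters):
        m = m / m.sum(axis=1, keepdims=True)   # normalize rows to sum to 1
        m = m / m.sum(axis=0, keepdims=True)   # normalize columns to sum to 1
    return m

m = sinkhorn_knopp(np.random.default_rng(0).standard_normal((4, 4)))
print(m.sum(axis=1))   # ~[1, 1, 1, 1]
print(m.sum(axis=0))   # ~[1, 1, 1, 1]
```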

What I found most interesting, though, was the engineering side. Usually these multi-stream ideas die because of memory bandwidth rather than FLOPs: expanding the width n-fold typically creates a massive I/O bottleneck. They managed to get around this with heavy kernel fusion and a modified pipeline schedule they call DualPipe that overlaps communication with compute.

The results look solid. They trained a 27B model and showed that mHC matches the stability of standard baselines while keeping the performance gains of the wider connections. It only added about 6.7% time overhead over the baseline, which is a decent trade-off for the gains they are seeing on reasoning benchmarks like GSM8K and MATH. It basically makes the "wider residual stream" idea practical for actual large-scale pre-training.

Expanding the residual stream adds more pathways for information to flow, which helps with training on constrained hardware by decoupling the model's capacity from its computational cost. Usually, if you want a model to be "smarter" or maintain more state, you have to increase the hidden dimension, which makes the attention and feed-forward layers quadratically more expensive to run. The mHC approach lets you widen that information highway without touching the expensive compute layers: the extra connections are just small linear mappings, computationally negligible next to the heavy matrix multiplications in the rest of the network.
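
Here is a hedged PyTorch sketch of that shape arithmetic (the stream count, names, and softmax stand-ins are my assumptions, not the paper's exact formulation):

```python
import torch

n, d, batch, seq = 4, 1024, 2, 16
streams = torch.randn(batch, seq, n, d)          # n copies of the residual stream

read  = torch.softmax(torch.randn(n), dim=0)     # how the block reads from the streams
write = torch.softmax(torch.randn(n), dim=0)     # how its output is written back
mix   = torch.softmax(torch.randn(n, n), dim=-1) # row-stochastic stand-in; the paper
                                                 # uses Sinkhorn to make this doubly stochastic

ffn = torch.nn.Linear(d, d)                      # stand-in for an expensive sublayer

layer_in = torch.einsum("j,bsjd->bsd", read, streams)  # collapse n streams to one d-wide input
out = ffn(layer_in)                                    # the only O(d^2) cost in this block
streams = torch.einsum("ij,bsjd->bsid", mix, streams) \
        + write.view(1, 1, n, 1) * out.unsqueeze(2)    # cheap O(n^2 * d) bookkeeping
```

The mixing bookkeeping costs on the order of n²·d per token versus d² for the linear layer, which is why the extra streams stay nearly free as long as n is small.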

They further combined this technique with a Mixture-of-Experts (MoE) architecture, which is the component that actually reduces the number of active parameters in any single forward pass. The mHC method ensures that even with that sparsity the signal remains stable, giving gradients a mathematically sound path to flow. On the memory side, the intermediate states of the extra streams are discarded during the forward pass and recomputed on the fly during the backward pass, so the wider stream doesn't blow up VRAM usage. This lets you train a model that behaves like a much larger dense network while fitting into the memory constraints of cheaper hardware clusters.
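
That "discard and recompute" trick is the same trade PyTorch exposes as generic activation checkpointing. A minimal sketch of the standard-library version of the idea (the paper's fused implementation is custom):

```python
import torch
from torch.utils.checkpoint import checkpoint

block = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024),
    torch.nn.GELU(),
    torch.nn.Linear(1024, 1024),
)

x = torch.randn(8, 1024, requires_grad=True)

# Intermediate activations inside `block` are not stored during the forward
# pass; they are recomputed on the fly when backward needs them, trading
# extra compute for a much smaller activation memory footprint.
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()
```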


Device-optimized quant variants of Qwen3-30B-A3B-Instruct-2507 without output quality falling off a cliff.

A 30B model runs on a Raspberry Pi 5 (16 GB) at 8.03 TPS with a 2.70 BPW quant, while retaining 94.18% of BF16 quality; this is feasible because the A3B variant only activates about 3B parameters per token. ShapeLearn tends to find better TPS/quality tradeoffs than the alternatives.

What’s new/interesting in this one

  1. CPU behavior is mostly sane

On CPUs, once you’re past “it fits,” smaller tends to be faster in a fairly monotonic way. The tradeoff curve behaves like you’d expect.

  2. GPU behavior is quirky

On GPUs, performance depends as much on kernel choice as on memory footprint, so you often get sweet spots (especially around ~4-bit) where the kernels are on the "golden path," and pushing to lower bit-widths can get weird.

models: https://huggingface.co/byteshape/Qwen3-30B-A3B-Instruct-2507-GGUF
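
For anyone wanting to try these, a minimal llama-cpp-python sketch (the quant filename below is hypothetical; check the repo above for the actual GGUF names):

```python
# Minimal sketch, assuming llama-cpp-python is installed and a GGUF has been
# downloaded from the repo above. The filename is a guess, not the real one.
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3-30B-A3B-Instruct-2507-Q2_K.gguf",  # hypothetical quant filename
    n_ctx=4096,
    n_threads=4,  # e.g. the Pi 5's four cores
)

out = llm("Summarize the Birkhoff polytope in one sentence.", max_tokens=128)
print(out["choices"][0]["text"])
```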

Good tips on tech for decentralized communication and resisting surveillance. The YouTuber is kind of a nebulous hate-all-states anarchist, though.

This paper basically shows that treating the prompt as an external variable is a surprisingly effective way to handle massive contexts. The authors argue that instead of shoving ten million tokens directly into the model and hoping for the best, we should put the text into a Python REPL environment where the model can interact with it programmatically. This setup lets the LLM write code that slices the text into manageable chunks and recursively calls new instances of itself to process those pieces individually. It is essentially the same logic as out-of-core algorithms, which process datasets far larger than available memory by fetching only what is needed at any given moment.
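
A schematic of that recursive pattern as I read it (the function names, fixed chunk size, and injected `llm_call` are hypothetical illustrations, not the paper's actual API):

```python
def recursive_lm(query: str, text: str, llm_call, chunk_size: int = 50_000) -> str:
    # Base case: the text fits comfortably in a single context window.
    if len(text) <= chunk_size:
        return llm_call(f"{query}\n\n{text}")

    # Recursive case: slice the text and hand each piece to a fresh model
    # instance, so no single context window gets polluted.
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    partials = [recursive_lm(query, chunk, llm_call) for chunk in chunks]

    # Aggregate the partial answers with one final call.
    return llm_call(f"{query}\n\nPartial answers:\n" + "\n".join(partials))
```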

One of the most interesting parts of the study is how it exposes the reality of context rot in frontier models like GPT-5. The results show that while base models handle simple needle-in-a-haystack tasks just fine, they fall apart completely on information-dense tasks that require aggregating data across the entire input. For example, on the OOLONG-Pairs benchmark, which has quadratic complexity, the base GPT-5 model scores below 0.1% accuracy once the context gets long enough. Meanwhile, the recursive language model holds steady even up to a million tokens and achieves a 58% score on that same difficult task.

It turns out that for retrieval tasks like CodeQA, simply having the REPL to grep through files was enough to beat the base model, because the model could filter data before reading it. The recursive capability proved essential for reasoning tasks like OOLONG, where the model needs to process every line: the version of the system that could not make subcalls performed significantly worse because it could not offload thinking to fresh contexts and keep its own window from getting polluted.
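
The retrieval half is easy to picture. Something like this regex pre-filter (my illustration, not the paper's code) runs inside the REPL, so the model only ever reads the matching slice:

```python
import re

def grep_context(pattern: str, text: str, window: int = 2) -> str:
    """Return only the lines matching `pattern`, plus `window` lines of
    surrounding context around each hit."""
    lines = text.splitlines()
    hits = [i for i, line in enumerate(lines) if re.search(pattern, line)]
    keep = sorted({j for i in hits
                     for j in range(max(0, i - window),
                                    min(len(lines), i + window + 1))})
    return "\n".join(lines[j] for j in keep)

# The model would then call the LLM only on grep_context(r"def parse_", repo_text),
# a tiny fraction of the original multi-million-token input.
```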

Since the model writes code to filter the text with tools like regex before it actually reads anything, it processes fewer tokens on average than a summary agent that is forced to read everything in order to compress it. The only downside is that the variance can be pretty wild, since the model sometimes gets stuck in a loop or decides to verify its own answer multiple times in a row, which blows up the compute cost for that particular run.

We are clearly seeing a shift where inference-time compute and smart context management matter more than just having a massive raw context window. The fact that this method beats retrieval-based agents on deep-research tasks suggests that giving the model a loop to think and code in is the future for tasks that need a large persistent context.
