technology


On the road to fully automated luxury gay space communism.

Spreading Linux propaganda since 2020

A GGUF port of DFlash speculative decoding: a standalone C++/CUDA stack on top of ggml that runs on a single 24 GB RTX 3090 and hosts the new Qwen3.6-27B.

~1.98x mean speedup over plain autoregressive decoding on Qwen3.6 across HumanEval / GSM8K / Math500, with zero retraining.

If you have CUDA 12+ and an NVIDIA GPU like an RTX 3090 / 4090 / 5090, all you need to do is:

# clone the repo, then build
cd lucebox-hub/dflash
cmake -B build -S . -DCMAKE_BUILD_TYPE=Release
cmake --build build --target test_dflash -j

# fetch the target model (~16 GB)
hf download unsloth/Qwen3.6-27B-GGUF Qwen3.6-27B-Q4_K_M.gguf --local-dir models/

# the matched 3.6 draft is gated: accept the terms and set HF_TOKEN first
hf download z-lab/Qwen3.6-27B-DFlash --local-dir models/draft/

# run
DFLASH_TARGET=models/Qwen3.6-27B-Q4_K_M.gguf python3 scripts/run.py --prompt "def fibonacci(n):"

That's it. No Python runtime in the engine, no llama.cpp install, no vLLM, no SGLang.

Luce DFlash will:

  1. Load Qwen3.6-27B Q4_K_M target weights (~16 GB) plus the matched DFlash bf16 draft (~3.46 GB) and run DDTree tree-verify speculative decoding (block size 16, default budget 22, greedy verify).
  2. Compress the KV cache to TQ3_0 (3.5 bpv, ~9.7x vs F16) and roll a 4096-slot target_feat ring so 256K context fits in 24 GB. Q4_0 is the legacy path and tops out near 128K.
  3. Auto-bump the prefill ubatch from 16 to 192 for prompts past 2048 tokens (~913 tok/s prefill on 13K prompts).
  4. Apply sliding-window flash attention at decode (default 2048-token window, 100% speculative acceptance retained) so 60K context still decodes at 89.7 tok/s instead of 25.8 tok/s.
  5. Serve over an OpenAI-compatible HTTP endpoint or a local chat REPL.
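For intuition, the tree-verify loop in step 1 reduces, in its simplest non-tree form, to block speculative decoding with greedy verification. Here is a toy Python sketch with stand-in callables instead of the real draft/target models, and with the batched target pass unrolled for readability (an illustration of the technique, not the DFlash CUDA path):

```python
def speculate(draft_step, target_step, prompt, block=16, n_gen=32):
    """Block speculative decoding with greedy verification.

    draft_step / target_step: callables mapping a token list to that
    model's greedy next token (toy stand-ins for the real models).
    """
    out = list(prompt)
    while len(out) < len(prompt) + n_gen:
        # 1) the cheap draft proposes a block of tokens autoregressively
        ctx = list(out)
        proposal = []
        for _ in range(block):
            tok = draft_step(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2) greedy verify: accept the longest prefix on which the
        #    target's own greedy choice agrees with the draft
        n_acc = 0
        for i, tok in enumerate(proposal):
            if target_step(out + proposal[:i]) != tok:
                break
            n_acc += 1
        out += proposal[:n_acc]
        # 3) the target's next token comes free: the correction on a
        #    mismatch, or a bonus token on full acceptance
        out.append(target_step(out))
    return out[len(prompt):len(prompt) + n_gen]
```

Because every emitted token is ultimately the target's own greedy choice, the output matches plain autoregressive decoding exactly; the speedup comes from verifying a whole block per target pass (AL in the benchmarks is the average number of accepted draft tokens per block).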

Running on an RTX 3090 with a Qwen3.6-27B UD-Q4_K_XL (unsloth Dynamic 2.0) target, 10 prompts per dataset, n_gen=256 (AL = average acceptance length per block):

| Bench | AR tok/s | DFlash tok/s | AL | Speedup |
|---|---|---|---|---|
| HumanEval | 34.90 | 78.16 | 5.94 | 2.24x |
| Math500 | 35.13 | 69.77 | 5.15 | 1.99x |
| GSM8K | 34.89 | 59.65 | 4.43 | 1.71x |
| Mean | 34.97 | 69.19 | 5.17 | 1.98x |
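As a quick sanity check, the mean speedup in the last row follows directly from the per-benchmark throughputs:

```python
# Decode throughputs (tok/s) copied from the benchmark results above
ar_toks = {"HumanEval": 34.90, "Math500": 35.13, "GSM8K": 34.89}
dflash_toks = {"HumanEval": 78.16, "Math500": 69.77, "GSM8K": 59.65}

mean_ar = sum(ar_toks.values()) / len(ar_toks)              # ~34.97
mean_dflash = sum(dflash_toks.values()) / len(dflash_toks)  # ~69.19
print(f"mean speedup: {mean_dflash / mean_ar:.2f}x")        # mean speedup: 1.98x
```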

*The global oil crisis has changed the fossil fuel industry forever, says IEA chief*

The IEA chief says the global oil crisis has permanently reshaped the fossil fuel industry, pushing countries to accelerate the shift toward cleaner energy and electrification. This turning point could reduce long-term dependence on oil and speed up the global energy transition.

cross-posted from: https://hexbear.net/post/8357952

Attention Hardware Founders: While you are chasing 50% margins, a small city in China is dominating the globe on less than 1 cent of profit per product. It’s not cheap labor; it’s the ruthless execution of "Extreme Micro-Optimization."

Whatever happened to the Tesla Semi? And does its slow rollout mean electric trucks aren’t viable? According to the International Energy Agency’s Global EV Outlook 2025, the answer may be very different from what you might think. Because while much of the Western conversation is focused on delays and limited deployments, China is already electrifying heavy freight at scale, with electric trucks reaching 50% of new sales and beginning to challenge assumptions about diesel dominance. So, what could it all mean for the future of oil demand and the energy transition?

Chinese EVs are truly a canary in the coal mine, but is it just domination in sales? Or will the effect that Chinese cars have on the market change all other cars too?

In this episode of the Everything Electric Podcast, Robert Llewellyn sits down with Jan Rosenow, Professor of Energy and Climate Policy at Oxford University, to reveal why electricity currently tells only 20% of the global energy story.

They delve into tackling the "hidden 80%" (the mobility and heating sectors still dominated by fossil fuels) and explore why our current system is "astonishingly inefficient," wasting two-thirds of all energy inputs as heat. Jan explains how shifting to electrification at scale could cut total global energy demand in half, and tackles the biggest myths and milestones of the transition:

  • The Grid Threat: Why data centers pose a more significant regional challenge to the grid than 100 million electric vehicles.

  • Critical Materials: Is the world really running out of lithium, or are we entering an era of "urban mining" where 95-97% of battery materials can be recycled?

  • The China Factor: A look at the "mind-blowing" scale of solar adoption in China and the declining utilization of their coal plants.

  • Beyond Climate: Why electrification is now a primary lever for energy security and economic resilience in a volatile world.
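The "cut demand in half" claim is easy to sanity-check with back-of-envelope arithmetic. The efficiency figures below are my own illustrative assumptions (roughly one-third of fossil primary energy ends up as useful work, per the "two-thirds wasted as heat" point above, versus roughly 90% for electrified end uses):

```python
# Normalize today's global primary energy input to 1.0
useful_fraction_fossil = 1 / 3   # ~2/3 of fossil inputs lost as heat
efficiency_electric = 0.90       # assumed efficiency of electrified end uses

useful_energy = 1.0 * useful_fraction_fossil
primary_needed_electrified = useful_energy / efficiency_electric
print(f"{primary_needed_electrified:.2f}")  # 0.37 -> well under half of today's input
```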

Google is bad, but people have no idea how bad search could actually get. Trying to find DPRK housing stats, and DDG will only give me pages about recent Canadian housing news.


There is no longer any CUDA dependency anywhere in its stack, which is probably the biggest deal of all. For those who don't know, CUDA is Nvidia's software layer, the foundation on which nearly every frontier AI model in the world is built. Except, as of today, DeepSeek V4, which can run entirely on Huawei Ascend chips via Huawei's CANN framework. China now has its own domestic AI stack, top to bottom.

cross-posted from: https://news.abolish.capital/post/44825

Military contractor Palantir is helping the IRS analyze dozens of different data sets on Americans to investigate a broad range of financial crimes, according to records shared with The Intercept.

Since 2018, the Internal Revenue Service’s Criminal Investigation division has used Palantir’s Lead and Case Analytics platform to aggregate and analyze a sprawling list of sensitive federal databases and data sets.

Public records detailing Palantir’s IRS contract, obtained by the nonprofit watchdog group American Oversight and shared exclusively with The Intercept, reveal the immense volume of data plugged into the military contractor’s software. The LCA uses both Palantir’s Gotham and Foundry applications to facilitate “analysis of massive-scale data to find the needle in the hay stack,” the contract paperwork says.

Documents indicate the IRS has paid Palantir over $130 million for these services to date.

Palantir’s LCA is ostensibly directed toward cracking down on fraud, money laundering, and other financial crimes. According to a 2024 agency privacy impact assessment, IRS “Special agents and investigative analysts … utilize the platform to find, analyze, and visualize connections between disparate sets of data to generate leads, identify schemes, uncover tax fraud, and conduct money laundering and forfeiture investigative activities.”

Related: [Trump Wants to Put You in a Massive, Secret Government Database](https://theintercept.com/2026/03/17/government-surveillance-centralized-database-privacy/)

The IRS’s use of the software, launched under Trump’s first term and expanded under Biden, is now in the hands of an IRS Criminal Investigations office that has drastically scaled back its pursuit of tax cheats and pivoted, under Trump’s direction, toward investigating “left-leaning groups,” the Wall Street Journal reported in October.

“The real concern is the consolidation of vast amounts of sensitive personal data into a single system with minimal transparency — especially one built and operated by a contractor like Palantir, whose business model is premised on integrating data and expanding surveillance capabilities,” American Oversight director Chioma Chukwu said in a statement to The Intercept. “Its platforms have been used in deeply troubling contexts, from immigration enforcement to predictive policing, with persistent concerns about overreach, bias, and weak oversight.”

Palantir did not respond to a request for comment, nor did the IRS.

The contract documents reviewed by The Intercept reveal that these “disparate sets of data” are vast. Palantir’s LCA allows the IRS to quickly search and visualize “connections from millions of records with thousands of links” between databases maintained by the IRS and other federal agencies. According to the contract documents, this data includes individual tax forms and tax returns, as well as Affordable Care Act data, bank statements and transactions, and “all available” data compiled by the Treasury Department’s Financial Crimes Enforcement Network.

Its view apparently extends to cryptocurrencies including bitcoin, Ethereum, Litecoin, and Ripple. “The application would sit on top of a singular repository of identified wallets from seized servers utilizing dark web data obtained from exchangers such as Coinbase,” the documents note.

The program places an emphasis on mapping social relationships between the targets of an investigation. That includes analyzing a “network of people and the relationships and communications between them,” such as “calls, texts, [and] emails events.” The use of “IP address analysis” within LCA allows the IRS to “Identify suspects more easily” and “Establish (new) relationships among actors.”

These investigative functions are continuously updated, the materials say, through ongoing close work between Palantir engineers and IRS personnel.

Related: [Palantir Will No Longer Profit Off of New Yorkers’ Health Data](https://theintercept.com/2026/03/24/palantir-new-york-city-hospitals-contract/)

The intermingling of sensitive data on millions of Americans comes at a time of increased global skepticism and opposition toward Palantir, which, despite its military-intelligence origins, has a thriving business with civilian agencies like the IRS. The use of Palantir software at the U.K.’s National Health Service, for example, has created an ongoing political controversy across Britain, while a similar contract with the New York City public hospital network was recently canceled following public protest.

The contract is also active at a time when IRS Criminal Investigations has been coopted to aid in the broader Trump administration’s aggressive agenda. In July, ProPublica reported that the agency was working with U.S. Immigration and Customs Enforcement to provide “on demand” data to accelerate deportations. Last year, the New York Times reported that Palantir, founded by Trump ally Peter Thiel, was central to an administration effort to increase data-sharing across federal agencies.

The company’s right-wing politics and eagerness to facilitate U.S. and Israeli military aggression abroad, NSA global surveillance, and ICE deportations have also made many wary of its access to incredibly sensitive personal data. A recent post on Palantir’s X account summarizing a book by CEO Alex Karp triggered an immediate backlash from those unnerved by the manifesto’s fascistic bent. The bullet points extolled the virtue of arms manufacturing, argued the Axis powers were unfairly punished after World War II, called for a reinstatement of the draft, condemned cultural pluralism, and claimed that wealthy elites are unfairly persecuted.

“When the government can map relationships, track behavior, and generate investigative leads across data sets at this scale, the question isn’t just what it can do — it’s who it will be used against,” Chukwu said. “Entrusting that infrastructure to a company known for opaque, security-state deployments only heightens those risks.”

The post Palantir Is Helping Trump’s IRS Conduct “Massive-Scale” Data Mining appeared first on The Intercept.



cross-posted from: https://lemmygrad.ml/post/11418648

DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence

Technical Report

Introduction

We present a preview version of the DeepSeek-V4 series, including two strong Mixture-of-Experts (MoE) language models — DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated) — both supporting a context length of one million tokens.

The DeepSeek-V4 series incorporates several key upgrades in architecture and optimization:

  1. Hybrid Attention Architecture: We design a hybrid attention mechanism combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) to dramatically improve long-context efficiency. In the 1M-token context setting, DeepSeek-V4-Pro requires only 27% of single-token inference FLOPs and 10% of KV cache compared with DeepSeek-V3.2.
  2. Manifold-Constrained Hyper-Connections (mHC): We incorporate mHC to strengthen conventional residual connections, enhancing stability of signal propagation across layers while preserving model expressivity.
  3. Muon Optimizer: We employ the Muon optimizer for faster convergence and greater training stability.
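Of the three upgrades, the Muon optimizer is the easiest to illustrate in isolation: its defining step replaces each weight matrix's momentum-averaged gradient with an approximately orthogonalized version before the update, typically via a quintic Newton-Schulz iteration. Below is a minimal NumPy sketch of that orthogonalization, with coefficients taken from the public Muon reference implementation (an illustration of the idea, not DeepSeek's training code):

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=5):
    # Quintic Newton-Schulz iteration: pushes the singular values of G
    # toward 1 while preserving its singular vectors, approximating the
    # nearest semi-orthogonal matrix U V^T.
    a, b, c = 3.4445, -4.7750, 2.0315   # coefficients from the Muon reference
    X = G / (np.linalg.norm(G) + 1e-7)  # Frobenius-normalize: singular values <= 1
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X
```

Equalizing the singular values gives the update similar strength in every direction of parameter space, which is the intuition behind Muon's faster convergence on matrix-shaped parameters.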

We pre-train both models on more than 32T diverse and high-quality tokens, followed by a comprehensive post-training pipeline. The post-training features a two-stage paradigm: independent cultivation of domain-specific experts (through SFT and RL with GRPO), followed by unified model consolidation via on-policy distillation, integrating distinct proficiencies across diverse domains into a single model.

DeepSeek-V4-Pro-Max, the maximum reasoning effort mode of DeepSeek-V4-Pro, significantly advances the knowledge capabilities of open-source models, firmly establishing itself as the best open-source model available today. It achieves top-tier performance in coding benchmarks and significantly bridges the gap with leading closed-source models on reasoning and agentic tasks. Meanwhile, DeepSeek-V4-Flash-Max achieves comparable reasoning performance to the Pro version when given a larger thinking budget, though its smaller parameter scale naturally places it slightly behind on pure knowledge tasks and the most complex agentic workflows.

Model Downloads

| Model | #Total Params | #Activated Params | Context Length | Precision | Download |
|---|---|---|---|---|---|
| DeepSeek-V4-Flash-Base | 284B | 13B | 1M | FP8 Mixed | HuggingFace \| ModelScope |
| DeepSeek-V4-Flash | 284B | 13B | 1M | FP4 + FP8 Mixed* | HuggingFace \| ModelScope |
| DeepSeek-V4-Pro-Base | 1.6T | 49B | 1M | FP8 Mixed | HuggingFace \| ModelScope |
| DeepSeek-V4-Pro | 1.6T | 49B | 1M | FP4 + FP8 Mixed* | HuggingFace \| ModelScope |

*FP4 + FP8 Mixed: MoE expert parameters use FP4 precision; most other parameters use FP8.
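To make the footnote concrete: FP4 in the E2M1 layout can represent only eight magnitudes per sign, so each expert weight is stored as the nearest representable point times a shared scale. The following is a toy NumPy fake-quantizer using a single per-tensor scale for simplicity (my own sketch to illustrate the format; real schemes use fine-grained group scales, and this is not DeepSeek's kernel):

```python
import numpy as np

# The eight magnitudes representable by FP4 E2M1 (sign stored separately)
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fake_quant_fp4(w):
    # Map the largest |weight| to the largest grid point, then round
    # every magnitude to its nearest representable value.
    scale = np.max(np.abs(w)) / FP4_GRID[-1]
    if scale == 0:
        return w.copy()
    mag = np.abs(w) / scale
    idx = np.argmin(np.abs(mag[..., None] - FP4_GRID), axis=-1)
    return np.sign(w) * FP4_GRID[idx] * scale
```

The quantization error is the gap between `w` and `fake_quant_fp4(w)`; keeping attention and routing parameters in FP8 while only the much larger expert matrices drop to FP4 is one way to trade memory footprint against accuracy.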

Evaluation Results

Base Model

| Benchmark (Metric) | # Shots | DeepSeek-V3.2-Base | DeepSeek-V4-Flash-Base | DeepSeek-V4-Pro-Base |
|---|---|---|---|---|
| Architecture | - | MoE | MoE | MoE |
| # Activated Params | - | 37B | 13B | 49B |
| # Total Params | - | 671B | 284B | 1.6T |
| **World Knowledge** | | | | |
| AGIEval (EM) | 0-shot | 80.1 | 82.6 | 83.1 |
| MMLU (EM) | 5-shot | 87.8 | 88.7 | 90.1 |
| MMLU-Redux (EM) | 5-shot | 87.5 | 89.4 | 90.8 |
| MMLU-Pro (EM) | 5-shot | 65.5 | 68.3 | 73.5 |
| MMMLU (EM) | 5-shot | 87.9 | 88.8 | 90.3 |
| C-Eval (EM) | 5-shot | 90.4 | 92.1 | 93.1 |
| CMMLU (EM) | 5-shot | 88.9 | 90.4 | 90.8 |
| MultiLoKo (EM) | 5-shot | 38.7 | 42.2 | 51.1 |
| Simple-QA verified (EM) | 25-shot | 28.3 | 30.1 | 55.2 |
| SuperGPQA (EM) | 5-shot | 45.0 | 46.5 | 53.9 |
| FACTS Parametric (EM) | 25-shot | 27.1 | 33.9 | 62.6 |
| TriviaQA (EM) | 5-shot | 83.3 | 82.8 | 85.6 |
| **Language & Reasoning** | | | | |
| BBH (EM) | 3-shot | 87.6 | 86.9 | 87.5 |
| DROP (F1) | 1-shot | 88.2 | 88.6 | 88.7 |
| HellaSwag (EM) | 0-shot | 86.4 | 85.7 | 88.0 |
| WinoGrande (EM) | 0-shot | 78.9 | 79.5 | 81.5 |
| CLUEWSC (EM) | 5-shot | 83.5 | 82.2 | 85.2 |
| **Code & Math** | | | | |
| BigCodeBench (Pass@1) | 3-shot | 63.9 | 56.8 | 59.2 |
| HumanEval (Pass@1) | 0-shot | 62.8 | 69.5 | 76.8 |
| GSM8K (EM) | 8-shot | 91.1 | 90.8 | 92.6 |
| MATH (EM) | 4-shot | 60.5 | 57.4 | 64.5 |
| MGSM (EM) | 8-shot | 81.3 | 85.7 | 84.4 |
| CMath (EM) | 3-shot | 92.6 | 93.6 | 90.9 |
| **Long Context** | | | | |
| LongBench-V2 (EM) | 1-shot | 40.2 | 44.7 | 51.5 |

Instruct Model

DeepSeek-V4-Pro and DeepSeek-V4-Flash both support three reasoning effort modes:

| Reasoning Mode | Characteristics | Typical Use Cases | Response Format |
|---|---|---|---|
| Non-think | Fast, intuitive responses | Routine daily tasks, low-risk decisions | `</think> summary` |
| Think High | Conscious logical analysis, slower but more accurate | Complex problem-solving, planning | `<think> thinking </think> summary` |
| Think Max | Push reasoning to its fullest extent | Exploring the boundary of model reasoning capability | Special system prompt + `<think> thinking </think> summary` |
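The response formats above are mechanical to handle. As a small illustration (my own sketch; the official encoder/parser ships in the repo's encoding folder), a completion can be split into its thinking and summary parts like this:

```python
import re

def split_completion(text):
    # Think modes emit '<think>thinking</think>summary'; Non-think emits
    # '</think>summary' with no thinking content before the closing tag.
    m = re.match(r"(?s)\s*(?:<think>(.*?))?</think>(.*)", text)
    if m:
        return (m.group(1) or "").strip(), m.group(2).strip()
    return "", text.strip()  # no tags at all: treat everything as summary
```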

DeepSeek-V4-Pro-Max vs Frontier Models

| Benchmark (Metric) | Opus-4.6 Max | GPT-5.4 xHigh | Gemini-3.1-Pro High | K2.6 Thinking | GLM-5.1 Thinking | DS-V4-Pro Max |
|---|---|---|---|---|---|---|
| **Knowledge & Reasoning** | | | | | | |
| MMLU-Pro (EM) | 89.1 | 87.5 | 91.0 | 87.1 | 86.0 | 87.5 |
| SimpleQA-Verified (Pass@1) | 46.2 | 45.3 | 75.6 | 36.9 | 38.1 | 57.9 |
| Chinese-SimpleQA (Pass@1) | 76.4 | 76.8 | 85.9 | 75.9 | 75.0 | 84.4 |
| GPQA Diamond (Pass@1) | 91.3 | 93.0 | 94.3 | 90.5 | 86.2 | 90.1 |
| HLE (Pass@1) | 40.0 | 39.8 | 44.4 | 36.4 | 34.7 | 37.7 |
| LiveCodeBench (Pass@1) | 88.8 | - | 91.7 | 89.6 | - | 93.5 |
| Codeforces (Rating) | - | 3168 | 3052 | - | - | 3206 |
| HMMT 2026 Feb (Pass@1) | 96.2 | 97.7 | 94.7 | 92.7 | 89.4 | 95.2 |
| IMOAnswerBench (Pass@1) | 75.3 | 91.4 | 81.0 | 86.0 | 83.8 | 89.8 |
| Apex (Pass@1) | 34.5 | 54.1 | 60.9 | 24.0 | 11.5 | 38.3 |
| Apex Shortlist (Pass@1) | 85.9 | 78.1 | 89.1 | 75.5 | 72.4 | 90.2 |
| **Long Context** | | | | | | |
| MRCR 1M (MMR) | 92.9 | - | 76.3 | - | - | 83.5 |
| CorpusQA 1M (ACC) | 71.7 | - | 53.8 | - | - | 62.0 |
| **Agentic** | | | | | | |
| Terminal Bench 2.0 (Acc) | 65.4 | 75.1 | 68.5 | 66.7 | 63.5 | 67.9 |
| SWE Verified (Resolved) | 80.8 | - | 80.6 | 80.2 | - | 80.6 |
| SWE Pro (Resolved) | 57.3 | 57.7 | 54.2 | 58.6 | 58.4 | 55.4 |
| SWE Multilingual (Resolved) | 77.5 | - | - | 76.7 | 73.3 | 76.2 |
| BrowseComp (Pass@1) | 83.7 | 82.7 | 85.9 | 83.2 | 79.3 | 83.4 |
| HLE w/ tools (Pass@1) | 53.1 | 52.0 | 51.6 | 54.0 | 50.4 | 48.2 |
| GDPval-AA (Elo) | 1619 | 1674 | 1314 | 1482 | 1535 | 1554 |
| MCPAtlas Public (Pass@1) | 73.8 | 67.2 | 69.2 | 66.6 | 71.8 | 73.6 |
| Toolathlon (Pass@1) | 47.2 | 54.6 | 48.8 | 50.0 | 40.7 | 51.8 |

Comparison across Modes

| Benchmark (Metric) | V4-Flash Non-Think | V4-Flash High | V4-Flash Max | V4-Pro Non-Think | V4-Pro High | V4-Pro Max |
|---|---|---|---|---|---|---|
| **Knowledge & Reasoning** | | | | | | |
| MMLU-Pro (EM) | 83.0 | 86.4 | 86.2 | 82.9 | 87.1 | 87.5 |
| SimpleQA-Verified (Pass@1) | 23.1 | 28.9 | 34.1 | 45.0 | 46.2 | 57.9 |
| Chinese-SimpleQA (Pass@1) | 71.5 | 73.2 | 78.9 | 75.8 | 77.7 | 84.4 |
| GPQA Diamond (Pass@1) | 71.2 | 87.4 | 88.1 | 72.9 | 89.1 | 90.1 |
| HLE (Pass@1) | 8.1 | 29.4 | 34.8 | 7.7 | 34.5 | 37.7 |
| LiveCodeBench (Pass@1) | 55.2 | 88.4 | 91.6 | 56.8 | 89.8 | 93.5 |
| Codeforces (Rating) | - | 2816 | 3052 | - | 2919 | 3206 |
| HMMT 2026 Feb (Pass@1) | 40.8 | 91.9 | 94.8 | 31.7 | 94.0 | 95.2 |
| IMOAnswerBench (Pass@1) | 41.9 | 85.1 | 88.4 | 35.3 | 88.0 | 89.8 |
| Apex (Pass@1) | 1.0 | 19.1 | 33.0 | 0.4 | 27.4 | 38.3 |
| Apex Shortlist (Pass@1) | 9.3 | 72.1 | 85.7 | 9.2 | 85.5 | 90.2 |
| **Long Context** | | | | | | |
| MRCR 1M (MMR) | 37.5 | 76.9 | 78.7 | 44.7 | 83.3 | 83.5 |
| CorpusQA 1M (ACC) | 15.5 | 59.3 | 60.5 | 35.6 | 56.5 | 62.0 |
| **Agentic** | | | | | | |
| Terminal Bench 2.0 (Acc) | 49.1 | 56.6 | 56.9 | 59.1 | 63.3 | 67.9 |
| SWE Verified (Resolved) | 73.7 | 78.6 | 79.0 | 73.6 | 79.4 | 80.6 |
| SWE Pro (Resolved) | 49.1 | 52.3 | 52.6 | 52.1 | 54.4 | 55.4 |
| SWE Multilingual (Resolved) | 69.7 | 70.2 | 73.3 | 69.8 | 74.1 | 76.2 |
| BrowseComp (Pass@1) | - | 53.5 | 73.2 | - | 80.4 | 83.4 |
| HLE w/ tools (Pass@1) | - | 40.3 | 45.1 | - | 44.7 | 48.2 |
| MCPAtlas (Pass@1) | 64.0 | 67.4 | 69.0 | 69.4 | 74.2 | 73.6 |
| GDPval-AA (Elo) | - | - | 1395 | - | - | 1554 |
| Toolathlon (Pass@1) | 40.7 | 43.5 | 47.8 | 46.3 | 49.0 | 51.8 |

Chat Template

This release does not include a Jinja-format chat template. Instead, we provide a dedicated encoding folder with Python scripts and test cases demonstrating how to encode messages in OpenAI-compatible format into input strings for the model, and how to parse the model's text output. Please refer to the encoding folder for full documentation.

A brief example:

from encoding_dsv4 import encode_messages, parse_message_from_completion_text

messages = [
    {"role": "user", "content": "hello"},
    {"role": "assistant", "content": "Hello! I am DeepSeek.", "reasoning_content": "thinking..."},
    {"role": "user", "content": "1+1=?"},
]

# messages -> prompt string
prompt = encode_messages(messages, thinking_mode="thinking")

# prompt string -> tokens
import transformers

tokenizer = transformers.AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V4-Pro")
tokens = tokenizer.encode(prompt)

# the model's completion text can be parsed back into a message dict
# with parse_message_from_completion_text (see the encoding folder)

How to Run Locally

Please refer to the inference folder for detailed instructions on running DeepSeek-V4 locally, including model weight conversion and interactive chat demos.

For local deployment, we recommend setting the sampling parameters to temperature = 1.0, top_p = 1.0. For the Think Max reasoning mode, we recommend setting the context window to at least 384K tokens.

License

This repository and the model weights are licensed under the MIT License.

Citation

@misc{deepseekai2026deepseekv4,
      title={DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence},
      author={DeepSeek-AI},
      year={2026},
}

Contact

If you have any questions, please raise an issue or contact us at service@deepseek.com.

cross-posted from: https://hexbear.net/post/8339171

Paul and Ken assess why China has taken a dominant position in global energy, as highlighted during the ongoing Iran War. Two decades of planning and implementation have seen Beijing make astonishing strides forward in all aspects of energy, including sectors that will probably surprise many.
