Technology

1420 readers
20 users here now

A tech news sub for communists

founded 3 years ago

There is no longer any CUDA dependency anywhere in its stack, which is probably the biggest deal of all. For those who don't know, CUDA is Nvidia's software layer, the foundation on which nearly every frontier AI model in the world is built. Every one, that is, except DeepSeek V4, which as of today can run entirely on Huawei Ascend chips via Huawei's CANN framework. China now has its own domestic AI stack, top to bottom.


DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence

Technical Report

Introduction

We present a preview version of the DeepSeek-V4 series: two strong Mixture-of-Experts (MoE) language models, DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated), both supporting a context length of one million tokens.

The DeepSeek-V4 series incorporates several key upgrades in architecture and optimization:

  1. Hybrid Attention Architecture: We design a hybrid attention mechanism combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) to dramatically improve long-context efficiency. In the 1M-token context setting, DeepSeek-V4-Pro requires only 27% of the single-token inference FLOPs and 10% of the KV cache of DeepSeek-V3.2.
  2. Manifold-Constrained Hyper-Connections (mHC): We incorporate mHC to strengthen conventional residual connections, enhancing stability of signal propagation across layers while preserving model expressivity.
  3. Muon Optimizer: We employ the Muon optimizer for faster convergence and greater training stability.
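The efficiency claim in point 1 can be illustrated with simple arithmetic. The sketch below is a back-of-envelope comparison; the baseline per-token KV-cache footprint is a made-up placeholder (this preview gives only the ~10% ratio, not absolute sizes):

```python
# Back-of-envelope KV-cache comparison at a 1M-token context.
# The baseline per-token byte figure is a hypothetical placeholder,
# chosen only so that the ratio matches the reported ~10% figure.

CONTEXT_TOKENS = 1_000_000

def kv_cache_gib(bytes_per_token: int, tokens: int = CONTEXT_TOKENS) -> float:
    """Total KV-cache size in GiB for a given per-token footprint."""
    return bytes_per_token * tokens / 2**30

baseline_bytes = 70 * 1024        # hypothetical per-token KV bytes (V3.2-style)
v4_bytes = baseline_bytes // 10   # the report claims ~10% of the V3.2 KV cache

baseline_gib = kv_cache_gib(baseline_bytes)
v4_gib = kv_cache_gib(v4_bytes)

print(f"baseline: {baseline_gib:.1f} GiB, V4: {v4_gib:.1f} GiB "
      f"({v4_gib / baseline_gib:.0%} of baseline)")
```

At a million tokens even hypothetical footprints land in the tens of GiB, which is why the cache ratio, not the FLOPs ratio, tends to dominate long-context serving cost.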

We pre-train both models on more than 32T diverse, high-quality tokens, followed by a comprehensive post-training pipeline. Post-training follows a two-stage paradigm: independent cultivation of domain-specific experts (through SFT and RL with GRPO), followed by unified model consolidation via on-policy distillation, which integrates the experts' distinct proficiencies across diverse domains into a single model.

DeepSeek-V4-Pro-Max, the maximum reasoning effort mode of DeepSeek-V4-Pro, significantly advances the knowledge capabilities of open-source models, firmly establishing itself as the best open-source model available today. It achieves top-tier performance in coding benchmarks and significantly bridges the gap with leading closed-source models on reasoning and agentic tasks. Meanwhile, DeepSeek-V4-Flash-Max achieves comparable reasoning performance to the Pro version when given a larger thinking budget, though its smaller parameter scale naturally places it slightly behind on pure knowledge tasks and the most complex agentic workflows.

Model Downloads

Model #Total Params #Activated Params Context Length Precision Download
DeepSeek-V4-Flash-Base 284B 13B 1M FP8 Mixed HuggingFace | ModelScope
DeepSeek-V4-Flash 284B 13B 1M FP4 + FP8 Mixed* HuggingFace | ModelScope
DeepSeek-V4-Pro-Base 1.6T 49B 1M FP8 Mixed HuggingFace | ModelScope
DeepSeek-V4-Pro 1.6T 49B 1M FP4 + FP8 Mixed* HuggingFace | ModelScope

*FP4 + FP8 Mixed: MoE expert parameters use FP4 precision; most other parameters use FP8.
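To get a feel for what the mixed FP4/FP8 scheme means for storage, here is a rough estimate for the Pro model. The fraction of parameters living in MoE experts is an assumption on my part (the note above does not give a breakdown); everything else follows from 4 vs. 8 bits per parameter:

```python
# Rough weight-storage estimate for DeepSeek-V4-Pro under the mixed
# FP4/FP8 scheme described above. The expert/non-expert split below is
# a hypothetical assumption; the release only says expert parameters
# are FP4 and "most other parameters" are FP8.

TOTAL_PARAMS = 1.6e12  # 1.6T total parameters (Pro)

def weights_tib(expert_fraction: float) -> float:
    """Approximate weight size in TiB, ignoring scales and metadata."""
    fp4_bytes = TOTAL_PARAMS * expert_fraction * 0.5        # 4 bits/param
    fp8_bytes = TOTAL_PARAMS * (1 - expert_fraction) * 1.0  # 8 bits/param
    return (fp4_bytes + fp8_bytes) / 2**40

# Assume ~95% of parameters sit in MoE experts (typical for sparse MoE).
print(f"~{weights_tib(0.95):.2f} TiB")
```

Under that assumption the FP4+FP8 checkpoint comes out well under 1 TiB, roughly half the size of an all-FP8 copy of the same model.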

Evaluation Results

Base Model

Benchmark (Metric) # Shots DeepSeek-V3.2-Base DeepSeek-V4-Flash-Base DeepSeek-V4-Pro-Base
Architecture - MoE MoE MoE
# Activated Params - 37B 13B 49B
# Total Params - 671B 284B 1.6T
World Knowledge
AGIEval (EM) 0-shot 80.1 82.6 83.1
MMLU (EM) 5-shot 87.8 88.7 90.1
MMLU-Redux (EM) 5-shot 87.5 89.4 90.8
MMLU-Pro (EM) 5-shot 65.5 68.3 73.5
MMMLU (EM) 5-shot 87.9 88.8 90.3
C-Eval (EM) 5-shot 90.4 92.1 93.1
CMMLU (EM) 5-shot 88.9 90.4 90.8
MultiLoKo (EM) 5-shot 38.7 42.2 51.1
SimpleQA-Verified (EM) 25-shot 28.3 30.1 55.2
SuperGPQA (EM) 5-shot 45.0 46.5 53.9
FACTS Parametric (EM) 25-shot 27.1 33.9 62.6
TriviaQA (EM) 5-shot 83.3 82.8 85.6
Language & Reasoning
BBH (EM) 3-shot 87.6 86.9 87.5
DROP (F1) 1-shot 88.2 88.6 88.7
HellaSwag (EM) 0-shot 86.4 85.7 88.0
WinoGrande (EM) 0-shot 78.9 79.5 81.5
CLUEWSC (EM) 5-shot 83.5 82.2 85.2
Code & Math
BigCodeBench (Pass@1) 3-shot 63.9 56.8 59.2
HumanEval (Pass@1) 0-shot 62.8 69.5 76.8
GSM8K (EM) 8-shot 91.1 90.8 92.6
MATH (EM) 4-shot 60.5 57.4 64.5
MGSM (EM) 8-shot 81.3 85.7 84.4
CMath (EM) 3-shot 92.6 93.6 90.9
Long Context
LongBench-V2 (EM) 1-shot 40.2 44.7 51.5

Instruct Model

DeepSeek-V4-Pro and DeepSeek-V4-Flash both support three reasoning effort modes:

Reasoning Mode Characteristics Typical Use Cases Response Format
Non-think Fast, intuitive responses Routine daily tasks, low-risk decisions </think> summary
Think High Conscious logical analysis, slower but more accurate Complex problem-solving, planning <think> thinking </think> summary
Think Max Push reasoning to its fullest extent Exploring the boundary of model reasoning capability Special system prompt + <think> thinking </think> summary
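The response formats above imply a simple parsing rule: text before the closing </think> tag is reasoning, text after it is the summary. The helper below is a minimal illustrative sketch, not the official parser (that ships in the release's encoding folder):

```python
def split_thinking(completion: str) -> tuple[str, str]:
    """Split a completion into (reasoning, summary) on the </think> tag.

    Illustrative only -- the release's encoding folder contains the
    official parser; this mirrors the response formats in the table above.
    """
    marker = "</think>"
    head, sep, tail = completion.partition(marker)
    if not sep:                 # no tag at all: treat as pure summary
        return "", completion.strip()
    reasoning = head.removeprefix("<think>").strip()
    return reasoning, tail.strip()

# Non-think mode emits "</think> summary", i.e. empty reasoning.
print(split_thinking("</think> The answer is 2."))
```

Note that in Non-think mode the model still emits the closing tag, so a parser must handle an empty reasoning segment rather than assume a matching opening tag.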

DeepSeek-V4-Pro-Max vs Frontier Models

Benchmark (Metric) Opus-4.6 Max GPT-5.4 xHigh Gemini-3.1-Pro High K2.6 Thinking GLM-5.1 Thinking DS-V4-Pro Max
Knowledge & Reasoning
MMLU-Pro (EM) 89.1 87.5 91.0 87.1 86.0 87.5
SimpleQA-Verified (Pass@1) 46.2 45.3 75.6 36.9 38.1 57.9
Chinese-SimpleQA (Pass@1) 76.4 76.8 85.9 75.9 75.0 84.4
GPQA Diamond (Pass@1) 91.3 93.0 94.3 90.5 86.2 90.1
HLE (Pass@1) 40.0 39.8 44.4 36.4 34.7 37.7
LiveCodeBench (Pass@1) 88.8 - 91.7 89.6 - 93.5
Codeforces (Rating) - 3168 3052 - - 3206
HMMT 2026 Feb (Pass@1) 96.2 97.7 94.7 92.7 89.4 95.2
IMOAnswerBench (Pass@1) 75.3 91.4 81.0 86.0 83.8 89.8
Apex (Pass@1) 34.5 54.1 60.9 24.0 11.5 38.3
Apex Shortlist (Pass@1) 85.9 78.1 89.1 75.5 72.4 90.2
Long Context
MRCR 1M (MMR) 92.9 - 76.3 - - 83.5
CorpusQA 1M (ACC) 71.7 - 53.8 - - 62.0
Agentic
Terminal Bench 2.0 (Acc) 65.4 75.1 68.5 66.7 63.5 67.9
SWE Verified (Resolved) 80.8 - 80.6 80.2 - 80.6
SWE Pro (Resolved) 57.3 57.7 54.2 58.6 58.4 55.4
SWE Multilingual (Resolved) 77.5 - - 76.7 73.3 76.2
BrowseComp (Pass@1) 83.7 82.7 85.9 83.2 79.3 83.4
HLE w/ tools (Pass@1) 53.1 52.0 51.6 54.0 50.4 48.2
GDPval-AA (Elo) 1619 1674 1314 1482 1535 1554
MCPAtlas Public (Pass@1) 73.8 67.2 69.2 66.6 71.8 73.6
Toolathlon (Pass@1) 47.2 54.6 48.8 50.0 40.7 51.8

Comparison across Modes

Benchmark (Metric) V4-Flash Non-Think V4-Flash High V4-Flash Max V4-Pro Non-Think V4-Pro High V4-Pro Max
Knowledge & Reasoning
MMLU-Pro (EM) 83.0 86.4 86.2 82.9 87.1 87.5
SimpleQA-Verified (Pass@1) 23.1 28.9 34.1 45.0 46.2 57.9
Chinese-SimpleQA (Pass@1) 71.5 73.2 78.9 75.8 77.7 84.4
GPQA Diamond (Pass@1) 71.2 87.4 88.1 72.9 89.1 90.1
HLE (Pass@1) 8.1 29.4 34.8 7.7 34.5 37.7
LiveCodeBench (Pass@1) 55.2 88.4 91.6 56.8 89.8 93.5
Codeforces (Rating) - 2816 3052 - 2919 3206
HMMT 2026 Feb (Pass@1) 40.8 91.9 94.8 31.7 94.0 95.2
IMOAnswerBench (Pass@1) 41.9 85.1 88.4 35.3 88.0 89.8
Apex (Pass@1) 1.0 19.1 33.0 0.4 27.4 38.3
Apex Shortlist (Pass@1) 9.3 72.1 85.7 9.2 85.5 90.2
Long Context
MRCR 1M (MMR) 37.5 76.9 78.7 44.7 83.3 83.5
CorpusQA 1M (ACC) 15.5 59.3 60.5 35.6 56.5 62.0
Agentic
Terminal Bench 2.0 (Acc) 49.1 56.6 56.9 59.1 63.3 67.9
SWE Verified (Resolved) 73.7 78.6 79.0 73.6 79.4 80.6
SWE Pro (Resolved) 49.1 52.3 52.6 52.1 54.4 55.4
SWE Multilingual (Resolved) 69.7 70.2 73.3 69.8 74.1 76.2
BrowseComp (Pass@1) - 53.5 73.2 - 80.4 83.4
HLE w/ tools (Pass@1) - 40.3 45.1 - 44.7 48.2
MCPAtlas (Pass@1) 64.0 67.4 69.0 69.4 74.2 73.6
GDPval-AA (Elo) - - 1395 - - 1554
Toolathlon (Pass@1) 40.7 43.5 47.8 46.3 49.0 51.8

Chat Template

This release does not include a Jinja-format chat template. Instead, we provide a dedicated encoding folder with Python scripts and test cases demonstrating how to encode messages in OpenAI-compatible format into input strings for the model, and how to parse the model's text output. Please refer to the encoding folder for full documentation.

A brief example:

from encoding_dsv4 import encode_messages, parse_message_from_completion_text

messages = [
    {"role": "user", "content": "hello"},
    {"role": "assistant", "content": "Hello! I am DeepSeek.", "reasoning_content": "thinking..."},
    {"role": "user", "content": "1+1=?"}
]

# messages -> prompt string
prompt = encode_messages(messages, thinking_mode="thinking")

# prompt string -> tokens
import transformers
tokenizer = transformers.AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V4-Pro")
tokens = tokenizer.encode(prompt)

# completion text -> structured message (see the encoding folder for details):
# reply = parse_message_from_completion_text(completion_text)

How to Run Locally

Please refer to the inference folder for detailed instructions on running DeepSeek-V4 locally, including model weight conversion and interactive chat demos.

For local deployment, we recommend setting the sampling parameters to temperature = 1.0, top_p = 1.0. For the Think Max reasoning mode, we recommend setting the context window to at least 384K tokens.
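Expressed as an OpenAI-compatible chat-completions request body, the recommended sampling settings look like this; the model name is a placeholder, not part of the release:

```python
# Recommended sampling parameters from above, as an OpenAI-compatible
# chat-completions request body. The model name is a placeholder.
payload = {
    "model": "deepseek-v4-pro",
    "messages": [{"role": "user", "content": "1+1=?"}],
    "temperature": 1.0,  # recommended for local deployment
    "top_p": 1.0,        # recommended for local deployment
}

# For Think Max mode, the serving engine itself must additionally be
# configured with a context window of at least 384K tokens; the exact
# flag for doing so varies by inference server.
print(payload["temperature"], payload["top_p"])
```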

License

This repository and the model weights are licensed under the MIT License.

Citation

@misc{deepseekai2026deepseekv4,
      title={DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence},
      author={DeepSeek-AI},
      year={2026},
}

Contact

If you have any questions, please raise an issue or contact us at service@deepseek.com.


Archive link: https://archive.ph/MjhpZ (Links omitted)

We’ve been saying this for years now, and we’re going to keep saying it until the message finally sinks in: mandatory age verification creates massive, centralized honeypots of sensitive biometric data that will inevitably be breached. Every single time. And every single time it happens, the politicians who mandated these systems and the companies that built them act shocked—shocked!—that collecting enormous databases of government IDs, facial scans, and biometric data from millions of people turns out to be a security nightmare.

Well, here we go again. A couple weeks ago, Discord announced it would launch “teen-by-default” settings for its global audience, meaning all users would be shunted into a restricted experience unless they verified their age through biometric scanning. The internet, predictably, was not thrilled. But while many users were busy venting their frustration, a group of security researchers decided to do something more useful: they took a look under the hood at Persona, one of the companies Discord was using for verification (specifically for users in the UK).

What they found, according to The Rage, was exactly what we would predict:

Together with two other researchers, they set out to look into Persona, the San Francisco-based startup that’s used by Discord for biometric identity verification – and found a Persona frontend exposed to the open internet on a US government authorized server. In 2,456 publicly accessible files, the code revealed the extensive surveillance Persona software performs on its users, bundled in an interface that pairs facial recognition with financial reporting – and a parallel implementation that appears designed to serve federal agencies.

Let me say that again: 2,456 publicly accessible files sitting on a government-authorized server, exposed to the open internet. Files that revealed a system performing not a simple age check, but a ton of potentially intrusive checks:

Once a user verifies their identity with Persona, the software performs 269 distinct verification checks and scours the internet and government sources for potential matches, such as by matching your face to politically exposed persons (PEPs), and generating risk and similarity scores for each individual. IP addresses, browser fingerprints, device fingerprints, government ID numbers, phone numbers, names, faces, and even selfie backgrounds are analyzed and retained for up to three years. The information the software evaluates on the images themselves includes “Selfie Suspicious Entity Detection,” a “Selfie Age Inconsistency Comparison,” similar background detection, which appears to be matched to other users in the database, and a “Selfie Pose Repeated Detection,” which seems to be used to determine whether you are using the same pose as in previous pictures.

This was the same company checking whether a teenager should be allowed to use voice chat on a gaming platform.

Beyond offering simple services to estimate your age, Persona’s exposed code compares your selfie to watchlist photos using facial recognition, screens you against 14 categories of adverse media from mentions of terrorism to espionage, and tags reports with codenames from active intelligence programs consisting of public-private partnerships to combat online child exploitative material, cannabis trafficking, fentanyl trafficking, romance fraud, money laundering, and illegal wildlife trade.

So you wanted to verify you’re old enough to use voice chat, and now there’s a permanent risk score somewhere documenting whether you might be involved in illegal wildlife trafficking.

What could go wrong?

As the researchers put it to The Rage:

“The internet was supposed to be the great equalizer. Information wants to be free, the network interprets censorship as damage and routes around it, all that beautiful optimism. And for a minute it was true.”

[….]

“The state wants to see everything. The corporations want to see everything. And they’ve learned to work together.”

Discord, to its credit, has now said it will not be proceeding with Persona for identity verification. And to be fair, Discord and similar internet companies are in an impossible position here—facing mounting regulatory pressure in multiple jurisdictions to verify ages while being handed a market of vendors who keep turning out to be security nightmares. But this is part of a pattern that should be deeply familiar by now.

Just last year, Discord’s previous third-party age verification partner suffered a breach that exposed 70,000 government ID photos, which were then held for ransom. Discord said it stopped using that vendor. Then it moved to Persona, which was already raising concerns due to connections to Peter Thiel. Now Persona’s frontend is found wide open on a government-authorized server, and Discord is dropping them too.

See the pattern? Discord keeps swapping vendors like someone frantically rotating buckets under a leaking roof, apparently hoping the next bucket won’t have a hole in it. But the problem was never the bucket. The problem is the hole in the roof — the never-ending stream of age-verification government mandates.

And this brings us to the bigger, more important point that almost nobody in the “protect the children” policy crowd seems willing to engage with honestly. Every single time you mandate age verification, you are mandating the creation of a centralized database of extraordinarily sensitive personal information. Government IDs. Biometric facial data. The kind of data that, once breached, cannot be “changed” like a password. You get one face. You get one government ID number. When those leak—and they will leak—the damage is permanent.

Even IEEE Spectrum is now publishing articles detailing how age verification undermines efforts to protect children by putting their privacy at risk.

These systems fail in predictable ways.

False positives are common: platforms flag adults as minors because they have youthful faces, share family devices, or show otherwise unusual usage patterns, then lock their accounts, sometimes for days. False negatives persist too: teenagers quickly learn to evade checks by borrowing IDs, cycling accounts, or using VPNs.

The appeal process itself creates new privacy risks. Platforms must store biometric data, ID images, and verification logs long enough to defend their decisions to regulators. So if an adult who is tired of submitting selfies to verify their age finally uploads an ID, the system must now secure that stored ID. Each retained record becomes a potential breach target.

Scale that experience across millions of users, and you bake the privacy risk into how platforms work.

We have been cataloging these breaches for years. In 2024, Australia greenlit an age verification pilot, and hours later a mandated verification database for bars was breached. That same year, another ID verification service was breached, exposing private info collected on behalf of Uber, TikTok, and more. Then came the Discord vendor breach last year. And now Persona.

This keeps happening because it has to keep happening. It’s the inevitable result of a system designed to aggregate the exact kind of data that attackers most want to steal. Computer scientists and privacy experts have been sounding this alarm for years. And what makes this even more galling is that these age verification systems don’t even accomplish what they claim to accomplish. Take Australia’s infamous ban on social media for under-16s, the poster child for this approach. It’s been a complete failure on its own terms: plenty of kids have already figured out ways around the ban, while those who can’t—particularly kids with disabilities who relied on social platforms for community—are being actively harmed by their exclusion. As the security researcher who helped discover the Persona leak, Celeste, told The Rage:

“Normies won’t be able to bypass these,” while less benevolent people “will always find ways to exploit your system.”

So we’ve built a system that fails to keep out the people it’s supposedly targeting, while successfully creating permanent biometric dossiers on millions of law-abiding users. Not great!

Meanwhile, what’s happening at the legislative level is perhaps even more cynical. Governments around the world are pushing harder and harder for mandatory age verification online. And as these mandates create a captive market worth billions of dollars, a whole ecosystem of venture-backed “identity-as-a-service” startups has sprung up to serve it. Persona, valued at $2 billion and backed by Peter Thiel’s investment network, is just one of many. These companies make grand promises about privacy-preserving verification, get contracts with major platforms, and then — whoops — leave 2,456 files exposed on a government server.

And, of course, these very firms are now lobbying for stricter age verification mandates. They’ve positioned themselves as protectors of children while actively working to expand the legal requirements that guarantee their revenue stream.

Lawmakers mandate an impossible task, VC-backed startups pop up to sell a “solution,” those startups then lobby for even stricter mandates to protect their market, and the cycle repeats. “Child safety” has simply become the marketing department for a rent-seeking surveillance industry.

As long as the law demands that these biometric gates exist, the “security” of the data they collect will always be a secondary concern to “compliance” with the mandate. Companies will keep rotating through vendors, each one promising that their system is the one that won’t leak, right up until it does. And the age verification industry will keep lobbying for stricter laws, because every new mandate is another guaranteed revenue stream.

The researchers who exposed Persona’s frontend hope their findings will serve as a wake-up call. Given the track record, it probably won’t be. Discord dropping Persona changes nothing—the next vendor will collect the same data, make the same promises, and eventually suffer the same breach. Because the problem was never which company holds your biometric data. The problem is that anyone is being forced to hand it over in the first place.
