The deep integration of artificial intelligence (AI) with the real economy is profoundly reshaping models of manufacturing and economic structures, accelerating industrial upgrading.

On January 7, China's Ministry of Industry and Information Technology, together with seven other departments, released a work plan to deepen the integration of the manufacturing sector and AI.

The document outlines seven priority areas—innovation foundation, intelligent upgrading, product breakthrough, market player, ecosystem expansion, security assurance, and international cooperation. It also details 21 specific measures to speed up the intelligent, sustainable, and integrated development of the manufacturing sector.

According to the document, China will achieve secure and reliable supply of key AI technologies by 2027, with the industry's scale and enabling capacity ranking among the world's leaders.

The document calls for promoting the in-depth application of three to five general-purpose large AI models in manufacturing, launching 1,000 high-level industrial intelligent agents, building 100 high-quality industrial data sets, and promoting 500 typical application scenarios.

It aims to foster two to three ecosystem-leading enterprises with global influence, nurture a group of specialized small and medium-sized businesses that produce novel and unique products, cultivate a number of application service providers proficient in both AI and industrial development, and establish 1,000 benchmark firms.

  • A 5G-enabled digital production line for photovoltaic panels runs at full capacity in a workshop of a new energy technology company in Suqian, east China's Jiangsu province. (Photo/Liu Ye)

A globally leading open-source and open ecosystem will be established, with comprehensive improvements in security governance, contributing Chinese solutions to global AI development.

In textile workshops, air-conditioning fans are needed to regulate temperature and humidity, purify the air, and ensure ventilation. Traditional products rely heavily on manual adjustment, resulting in low precision and difficulty in predicting equipment failures.

"By using Inspur Yunzhou Industrial Internet Platform, we installed multiple types of sensors on traditional fans to collect and train data, and developed digital-intelligent fans," said Wu Zicai, chairman of Shandong Jinxin Air Conditioning Group.

According to Wu, these digital-intelligent fans can optimize parameters such as air volume in real time based on operating conditions, precisely control workshop temperature and humidity, and reduce equipment maintenance cycles by 40 percent.

"Enterprises need to accelerate full-process transformation and upgrading by deeply embedding large-model technologies into all stages—from research and development (R&D) and pilot testing to production, marketing, services, and operations management—so as to enhance capabilities in assisted design, simulation modeling, production scheduling, and predictive maintenance," an expert said.

In the R&D and design phase, efforts should focus on promoting intelligent design assistance, software code generation, and pharmaceutical research, creating new R&D models that are more personalized, lower in cost, and higher in efficiency.

In production and manufacturing, industrial quality inspection technologies such as machine vision and unmanned intelligent inspections should be expanded. This will enhance real-time monitoring of production lines, predictive maintenance, and the precision of equipment fault identification, while enabling early warnings for potential safety risks and incidents in production operations.

In operations and management, the analytical and generative capabilities of large AI models should be used to enhance enterprises' management of strategy, human resources, finance, and risks.

To support enterprises in applying AI to R&D, production, operations, and value-added services, the document is accompanied by an application guide for AI in manufacturing.

This application guide provides detailed, step-by-step guidance on conducting intelligent assessments and planning, strengthening foundational digital capabilities, building high-quality data sets, reasonably planning computing power resources, selecting and optimizing models, deploying and integrating models, and ensuring AI application security—offering hands-on pathways and methods for intelligent transformation and upgrading.

From experience-based mining to smart exploration, and from craftsmanship-driven smelting to AI-enabled precision control, China's non-ferrous metals industry is also advancing rapidly. Recently, the industry's large model Kun'an 2.0 was released, further exploring the deep integration of AI technologies across the entire non-ferrous metals industrial chain.

"Over the past year, we have applied AI to key industrial processes and promoted the development of more than 100 application scenarios. From these, we selected and released 52 scenarios and built eight high-quality industry data sets," said Duan Xiangdong, chairman of Aluminum Corporation of China.

Ge Honglin, president of the China Nonferrous Metals Industry Association, noted that the industry features a wide range of products, complex resources, and highly intricate process flows, and faces challenges in digital and intelligent development in areas such as technology adaptation, data governance, coordination, and talent development. To address such industry-wide challenges, the document clearly calls for the development of high-level industry models and the acceleration of AI-enabled applications in key sectors.

"AI applications in manufacturing should be advanced through differentiated approaches, taking into account each industry's characteristics, technological maturity, and level of digitalization," said an official with the Department of Science and Technology of the Ministry of Industry and Information Technology.

The official added that the document also provides tailored guidance for sectors including raw materials, equipment manufacturing, consumer goods, electronic information, and software and information technology services, supporting industry-specific transformation efforts.


The paper argues that we have been wasting a lot of expensive GPU cycles by forcing transformers to relearn static things like names or common phrases through deep computation. Standard models have no way to simply look something up, so they end up simulating memory by passing tokens through layer after layer of feed-forward networks. DeepSeek introduced a module called Engram which adds a dedicated lookup step for local N-gram patterns. It acts as a new axis along which to scale a model, separate from the usual compute-heavy Mixture of Experts approach.

The architecture uses multi-head hashing to grab static embeddings for specific token sequences, which are then filtered through a context-aware gate to make sure they actually fit the current situation. They found a U-shaped scaling law where the best performance comes from splitting the parameter budget between neural computation and this static memory. By letting the memory handle the simple local associations, the model can effectively act as if it were deeper, because the early layers are no longer bogged down with basic reconstruction.

One of the best bits is how they handle hardware constraints by offloading the massive lookup tables to host RAM. Since these lookups are deterministic given the input tokens, the system can prefetch the data from CPU memory before the GPU even needs it. This means you can scale to tens of billions of extra parameters with almost zero impact on speed, since the retrieval happens while the previous layers are still calculating.
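To make the idea concrete, here is a minimal PyTorch sketch of a gated n-gram lookup in the spirit of Engram. It is not DeepSeek's implementation: the hashing scheme, table sizes, n-gram length, and gating are all assumptions, and the tables here are ordinary GPU parameters rather than host-RAM tables with prefetching.

import torch
import torch.nn as nn

class EngramSketch(nn.Module):
    """Toy gated n-gram lookup memory (illustrative, not the paper's code).

    Each position hashes its trailing n-gram of token ids into several large
    embedding tables (multi-head hashing), sums the retrieved rows, and gates
    the result against the current hidden state so the static memory is only
    used when it actually fits the context.
    """

    def __init__(self, d_model=512, table_size=1_000_000, heads=4, ngram=3):
        super().__init__()
        self.ngram, self.heads, self.table_size = ngram, heads, table_size
        # One static table per hash head; in the paper these would live in
        # host RAM and be prefetched, here they are ordinary parameters.
        self.tables = nn.ModuleList(
            [nn.Embedding(table_size, d_model) for _ in range(heads)]
        )
        self.gate = nn.Linear(2 * d_model, 1)

    def _hash(self, grams, head):
        # Cheap rolling hash of the n-gram ids, reduced mod table size at each
        # step to stay in int64 range. It depends only on the input tokens, so
        # the matching rows could be prefetched before this layer runs.
        mix = torch.zeros_like(grams[..., 0])
        for i in range(self.ngram):
            mix = (mix * 1000003 + grams[..., i] * (head + 1)) % self.table_size
        return mix

    def forward(self, token_ids, hidden):
        # token_ids: (batch, seq) int64; hidden: (batch, seq, d_model)
        pad = token_ids.new_zeros(token_ids.size(0), self.ngram - 1)
        grams = torch.cat([pad, token_ids], dim=1).unfold(1, self.ngram, 1)
        mem = sum(self.tables[h](self._hash(grams, h)) for h in range(self.heads))
        # Context-aware gate: keep the retrieved memory only where it agrees
        # with what the model already represents at this position.
        g = torch.sigmoid(self.gate(torch.cat([hidden, mem], dim=-1)))
        return hidden + g * mem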

The benchmarks show that this pays off across the board especially in long context tasks where the model needs its attention focused on global details rather than local phrases. It turns out that even in math and coding the model gets a boost because it is no longer wasting its internal reasoning depth on things that should just be in a lookup table. Moving forward this kind of conditional memory could be a standard part of sparse models because it bypasses the physical memory limits of current hardware.


When you employ AI agents, document analysis runs into a volume problem. Reading one file of 1,000 lines consumes about 10,000 tokens, and token consumption costs both money and time. Codebases with dozens or hundreds of files, a common case for real-world projects, can easily exceed 100,000 tokens when the whole thing must be considered. The agent must read and comprehend these files and work out how they relate to one another. And when the task requires multiple passes over the same documents, perhaps one pass to map the structure and another to extract the details, costs multiply rapidly.

Matryoshka is a document-analysis tool that achieves over 80% token savings while enabling interactive, exploratory analysis. Its key idea is to save tokens by caching past analysis results and reusing them, so the same document lines never need to be processed twice. The design draws on recent research into recursive language models and on retrieval-augmented generation, with a focus on efficiency. We'll see how Matryoshka unifies these ideas into one system that maintains persistent analytical state, and then look at real-world results from analyzing the anki-connect codebase.


The Problem: Context Rot and Token Costs

A common task is to analyze a codebase to answer a question such as “What is the API surface of this project?” Such work includes identifying and cataloguing all the entry points the codebase exposes.

Traditional approach:

  1. Read all source files into context (~95,000 tokens for a medium project)
  2. The LLM analyzes the entire codebase’s structure and component relationships
  3. For follow-up questions, the full context is round-tripped every turn

This creates two problems:

Token Costs Compound

Every turn, the entire context has to go to the API. In a 10-turn conversation about a 7,000-line codebase, the system might process close to a million tokens, most of them the same document contents being dutifully resent over and over. The same core code accompanies every new question. This redundancy is a massive waste: it forces the model to process the same blocks of text repeatedly rather than concentrating on what's actually novel.
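The arithmetic behind that figure is simple; the numbers below are illustrative assumptions that roughly match the example above:

# Back-of-the-envelope cost of resending the same codebase every turn.
# Assumed figures: ~7,000 lines at ~13 tokens per line is roughly 95,000
# tokens of document context, plus a small per-turn overhead.
codebase_tokens = 95_000
overhead_per_turn = 2_000   # question, instructions, prior answers (assumed)
turns = 10

total = turns * (codebase_tokens + overhead_per_turn)
print(f"{total:,} tokens processed over {turns} turns")  # 970,000 tokens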

Context Rot Degrades Quality

As described in the Recursive Language Models paper, even the most capable models exhibit context rot: their performance declines as input length grows. The deterioration is task-dependent, and in information-dense contexts, where the correct output requires synthesizing facts scattered across the prompt, the drop can be especially steep. It can occur at relatively modest context lengths, long before the model reaches its maximum token capacity, and reflects a failure to maintain the connections between large numbers of informational fragments.

The authors argue that we should stop stuffing entire documents into the prompt, since this clutters the context and compromises performance. Instead, documents should be treated as external environments that the LLM can interact with: querying them, navigating their structured sections, and retrieving specific information as needed. This treats the document as a separate knowledge base and frees the model from having to hold everything at once.


Prior Work: Two Key Insights

Matryoshka builds on two research directions:

Recursive Language Models (RLM)

The RLM paper introduces a methodology that treats documents as external state that can be queried step by step, without loading them entirely. Symbolic operations (search, filter, aggregate) are issued against this state, and only the specific, relevant results are returned, keeping the context window small while permitting analysis of arbitrarily large documents.

The key point is that the documents stay outside the model and only the search results enter the context. This separation of concerns means the model never sees complete files; instead, it issues searches to retrieve just the information it needs.

Barliman: Synthesis from Examples

Barliman, a tool developed by William Byrd and Greg Rosenblatt, shows that programs can be synthesized without precise code specifications. Instead, the user supplies input/output examples, and a relational solver in the spirit of miniKanren treats those examples as constraints and searches for a function that satisfies them. This makes it possible to describe what is desired purely through concrete test cases.

The approach is simply to show examples of the behavior one wants and let the system derive the implementation on its own. The emphasis shifts from writing detailed step-by-step recipes to declaring, in a declarative fashion, what the desired outcome is.


Matryoshka: Combining the Insights

Matryoshka incorporates these insights into a functioning system for LLM agents. A practical tool is provided that enables agents to decompose challenging tasks into a sequence of smaller and more manageable objectives.

1. Nucleus: A Declarative Query Language

Instead of issuing commands, the LLM describes what it wants, using Nucleus, a simple S-expression query language. This changes the focus from describing each step to specifying the desired outcome.

(grep "class ")           ; Find all class definitions
(count RESULTS)           ; Count them
(map RESULTS (lambda x    ; Extract class names
  (match x "class (\\w+)" 1)))

The declarative interface stays robust even when the LLM phrases a request with different vocabulary or sentence structure, because the system works from the underlying intent of the query rather than its exact wording.

2. Pointer-Based State

The key new insight is that we can separate the results from the context. Results are now stored in the REPL state, rather than in the context.

When the agent runs (grep "def ") and gets 150 matches:

  • Traditional tools: All 150 lines are fed into context, and round-tripped every turn
  • Matryoshka: Binds matches to RESULTS in the REPL, returning only "Found 150 results"

The variable RESULTS is bound to the actual value in the REPL. The binding acts as a pointer to where the data lives in the server's memory. Subsequent operations (queries, filters, updates) use this reference to access the data, but the data itself never enters the conversation:

Turn 1: (grep "def ")         → Server stores 150 matches as RESULTS
                              → Context gets: "Found 150 results"

Turn 2: (count RESULTS)       → Server counts its local RESULTS
                              → Context gets: "150"

Turn 3: (filter RESULTS ...)  → Server filters locally
                              → Context gets: "Filtered to 42 results"

The LLM never sees the 150 function definitions, just the aggregated answers.
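A minimal sketch of the idea, with hypothetical names and none of Matryoshka's actual internals, might look like this:

import re

class SessionState:
    """Toy illustration of pointer-based state (not Matryoshka's real code).

    The document and all query results live on the server; the conversation
    only ever sees short summaries, while the name RESULTS points at the
    full data held here.
    """

    def __init__(self, text):
        self.lines = text.splitlines()
        self.bindings = {}

    def grep(self, pattern):
        matches = [ln for ln in self.lines if re.search(pattern, ln)]
        self.bindings["RESULTS"] = matches          # full data stays server-side
        return f"Found {len(matches)} results"      # only this enters the context

    def count(self, name="RESULTS"):
        return str(len(self.bindings.get(name, [])))

    def filter(self, predicate, name="RESULTS"):
        kept = [item for item in self.bindings.get(name, []) if predicate(item)]
        self.bindings[name] = kept
        return f"Filtered to {len(kept)} results"

# The model only ever receives the returned summary strings:
state = SessionState(open("plugin/__init__.py").read())
print(state.grep(r"def \w+\(self"))    # e.g. "Found 148 results"
print(state.count())                   # e.g. "148"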

3. Synthesis from Examples

When queries need custom parsing, Matryoshka synthesizes functions from examples:

(synthesize_extractor
  "$1,250.00" 1250.00
  "€500" 500
  "$89.99" 89.99)

The synthesizer learns the pattern directly from the examples, extracting numeric values from the currency strings without any hand-written regex.
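One plausible way to implement such a synthesizer, shown here as a guess rather than Matryoshka's actual approach, is to search a small pool of candidate patterns and keep the first one that reproduces every example:

import re

# Hypothetical sketch: try candidate regexes and accept the first one that
# reproduces the expected output for every (input, expected) example pair.
CANDIDATES = [
    r"([-+]?\d[\d,]*\.?\d*)",   # number, optional thousands separators
    r"([-+]?\d+\.\d+)",         # plain decimal
    r"([-+]?\d+)",              # plain integer
]

def synthesize_extractor(examples):
    for pattern in CANDIDATES:
        def extract(text, pattern=pattern):
            m = re.search(pattern, text)
            return float(m.group(1).replace(",", "")) if m else None
        if all(extract(text) == expected for text, expected in examples):
            return extract
    raise ValueError("no candidate pattern fits every example")

extract = synthesize_extractor([
    ("$1,250.00", 1250.00),
    ("€500", 500),
    ("$89.99", 89.99),
])
print(extract("$42.50"))  # 42.5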


The Lifecycle

A typical Matryoshka session:

1. Load Document

(load "./plugin/__init__.py")
→ "Loaded: 2,244 lines, 71.5 KB"

The document is parsed and stored server-side. Only metadata enters the context.

2. Query Incrementally

(grep "@util.api")
→ "Found 122 results, bound to RESULTS"
   [402] @util.api()
   [407] @util.api()
   ... (showing first 20)

Each query returns a preview plus the count. Full data stays on server.

3. Chain Operations

(count RESULTS)           → 122
(filter RESULTS ...)      → "Filtered to 45 results"
(map RESULTS ...)         → Transforms bound to RESULTS

Operations chain through the RESULTS binding. Each step refines without re-querying.

4. Close Session

(close)
→ "Session closed, memory freed"

Sessions auto-expire after 10 minutes of inactivity.


How Agents Discover and Use Matryoshka

Matryoshka integrates with LLM agents via the Model Context Protocol (MCP).

Tool Discovery

When the agent starts, it launches Matryoshka as an MCP server and receives a tool manifest:

{
  "tools": [
    {
      "name": "lattice_load",
      "description": "Load a document for analysis..."
    },
    {
      "name": "lattice_query",
      "description": "Execute a Nucleus query..."
    },
    {
      "name": "lattice_help",
      "description": "Get Nucleus command reference..."
    }
  ]
}

The agent sees the available tools and their descriptions. When a user asks to analyze a file, it decides which tools to use based on the task.

Guided Discovery

The lattice_help tool returns a command reference, teaching the LLM the query language on-demand:

; Search commands
(grep "pattern")              ; Regex search
(fuzzy_search "query" 10)     ; Fuzzy match, top N
(lines 10 20)                 ; Get line range

; Aggregation
(count RESULTS)               ; Count items
(sum RESULTS)                 ; Sum numeric values

; Transformation
(map RESULTS fn)              ; Transform each item
(filter RESULTS pred)         ; Keep matching items

The agent learns capabilities incrementally rather than needing upfront training.

Session Flow

User: "How many API endpoints does anki-connect have?"

Agent: [Calls lattice_load("plugin/__init__.py")]
        → "Loaded: 2,244 lines"

Agent: [Calls lattice_query('(grep "@util.api")')]
        → "Found 122 results"

Agent: [Calls lattice_query('(count RESULTS)')]
        → "122"

Agent: "The anki-connect plugin exposes 122 API endpoints,
         decorated with @util.api()."

State persists across tool invocations within the conversation: once a document is loaded, its content is retained in memory, and the results of any executed query are saved and available for later use.


Real-World Example: Analyzing anki-connect

Let's walk through a complete analysis of the anki-connect Anki plugin. Here we have a real-world codebase with 7,770 lines across 17 files.

The Task

"Analyze the anki-connect codebase: find all classes, count API endpoints, extract configuration defaults, and document the architecture."

The Workflow

The agent uses Matryoshka's prompt hints to accomplish the following workflow:

  1. Discover files with Glob
  2. Read small files directly (<300 lines)
  3. Use Matryoshka for large files (>500 lines)
  4. Aggregate across all files

Step 1: File Discovery

Glob **/*.py → 15 Python files
Glob **/*.md → 2 markdown files

File sizes:
  plugin/__init__.py    2,244 lines  → Matryoshka
  plugin/edit.py          458 lines  → Read directly
  plugin/web.py           301 lines  → Read directly
  plugin/util.py          107 lines  → Read directly
  README.md             4,660 lines  → Matryoshka
  tests/*.py           11 files      → Skip (tests)

Step 2: Read Small Files

Reading util.py (107 lines) reveals configuration defaults:

DEFAULT_CONFIG = {
    'apiKey': None,
    'apiLogPath': None,
    'apiPollInterval': 25,
    'apiVersion': 6,
    'webBacklog': 5,
    'webBindAddress': '127.0.0.1',
    'webBindPort': 8765,
    'webCorsOrigin': None,
    'webCorsOriginList': ['http://localhost/'],
    'ignoreOriginList': [],
    'webTimeout': 10000,
}

Reading web.py (301 lines) reveals the server architecture:

  • Classes: WebRequest, WebClient, WebServer
  • JSON-RPC style API with jsonschema validation
  • CORS support with configurable origins

Step 3: Query Large Files with Matryoshka

; Load the main plugin file
(load "plugin/__init__.py")
→ "Loaded: 2,244 lines, 71.5 KB"

; Find all classes
(grep "^class ")
→ "Found 1 result: [65] class AnkiConnect:"

; Count methods
(grep "def \\w+\\(self")
→ "Found 148 results"

; Count API endpoints
(grep "@util.api")
→ "Found 122 results"

; Load README for documentation
(load "README.md")
→ "Loaded: 4,660 lines, 107.2 KB"

; Find documented action categories
(grep "^### ")
→ "Found 13 sections"
   [176] ### Card Actions
   [784] ### Deck Actions
   [1231] ### Graphical Actions
   ...

Complete Findings

Metric                   Value
-----------------------  ----------------------------
Total files              17 (15 .py + 2 .md)
Total lines              7,770
Classes                  8 (1 main + 3 web + 4 edit)
Instance methods         148
API endpoints            122
Config settings          11
Imports                  48
Documentation sections   8 categories, 120 endpoints

Token Usage Comparison

Approach          Lines Processed   Tokens Used   Coverage
----------------  ----------------  ------------  --------
Read everything   7,770             ~95,000       100%
Matryoshka only   6,904             ~6,500        65%
Hybrid            7,770             ~17,000       100%

The hybrid method achieves an 82% token savings (~17,000 vs. ~95,000 tokens) while retaining 100% coverage. It combines two strategies: direct reads that preserve the details of small files, and Matryoshka queries that avoid resending the contents of large ones.

The pure Matryoshka approach ends up missing details from small files (configuration defaults, web server classes) because the agent only uses the tool to query large ones. The hybrid workflow reads small files directly and in full while leveraging Matryoshka to analyze the bigger files, a divide-and-conquer strategy. All that's needed is to give the agent an explicit hint about which strategy to use.

Why Hybrid Works

Small files (<300 lines) contain critical details:

  • util.py: All configuration defaults, the API decorator implementation
  • web.py: Server architecture, CORS handling, request schema

These fit comfortably in context, and there's no need to do anything different. Matryoshka adds value for:

  • __init__.py (2,244 lines): Query specific patterns without loading everything
  • README.md (4,660 lines): Search documentation sections on demand

Architecture

┌─────────────────────────────────────────────────────────┐
│                     Adapters                             │
│  ┌──────────┐  ┌──────────┐  ┌───────────────────────┐ │
│  │   Pipe   │  │   HTTP   │  │   MCP Server          │ │
│  └────┬─────┘  └────┬─────┘  └───────────┬───────────┘ │
│       │             │                     │             │
│       └─────────────┴─────────────────────┘             │
│                          │                               │
│                ┌─────────┴─────────┐                    │
│                │   LatticeTool     │                    │
│                │   (Stateful)      │                    │
│                │   • Document      │                    │
│                │   • Bindings      │                    │
│                │   • Session       │                    │
│                └─────────┬─────────┘                    │
│                          │                               │
│                ┌─────────┴─────────┐                    │
│                │  NucleusEngine    │                    │
│                │  • Parser         │                    │
│                │  • Type Checker   │                    │
│                │  • Evaluator      │                    │
│                └─────────┬─────────┘                    │
│                          │                               │
│                ┌─────────┴─────────┐                    │
│                │    Synthesis      │                    │
│                │  • Regex          │                    │
│                │  • Extractors     │                    │
│                │  • miniKanren     │                    │
│                └───────────────────┘                    │
└─────────────────────────────────────────────────────────┘

Getting Started

Install from npm:

npm install matryoshka-rlm

As MCP Server

Add to your MCP configuration:

{
  "mcpServers": {
    "lattice": {
      "command": "npx",
      "args": ["lattice-mcp"]
    }
  }
}

Programmatic Use

import { NucleusEngine } from "matryoshka-rlm";

const engine = new NucleusEngine();
await engine.loadFile("./document.txt");

const result = engine.execute('(grep "pattern")');
console.log(result.value); // Array of matches

Interactive REPL

npx lattice-repl
lattice> :load ./data.txt
lattice> (grep "ERROR")
lattice> (count RESULTS)

Conclusion

Matryoshka embodies the principle, emerging from RLM research, that documents should be treated as external environments rather than as context to be parsed. This changes the model's role from passive reader to active agent: it navigates and interrogates a document to extract specific information, much as a programmer browses code. Combined with Barliman-style synthesis from input/output examples and pointer-based state management, it achieves:

  • 82% token savings on real-world codebase analysis
  • 100% coverage when combined with direct reads for small files
  • Incremental exploration where each query builds on previous results
  • No context rot because documents stay outside the model

Variable bindings such as RESULTS refer to REPL state rather than holding data directly in the model's context. Queries reference these bindings by name, so what passes through the conversation are mere pointers indicating where the computation should occur; the server does the substantive work and returns only the distilled results.

source here: https://git.sr.ht/~yogthos/matryoshka


Most people in the field know that models usually fall apart after a few hundred steps because small errors just keep adding up until the whole process is ruined. The paper proposes a system called MAKER which uses a strategy they call massively decomposed agentic processes. Instead of asking one big model to do everything, they break the entire task down into the smallest possible pieces, so each microagent only has to worry about one single move.

For their main test they used a twenty-disk version of the Towers of Hanoi puzzle, which requires over a million individual moves to finish. They found that even small models can be extremely reliable if you set them up correctly. One of the main tricks they used is a voting system where multiple agents solve the same tiny subtask and the system only moves forward once one answer leads the others by a set margin of votes. This acts like a safety net that catches random mistakes before they can mess up the rest of the chain.

Another interesting part of their approach is red-flagging, which is basically throwing away any response that looks suspicious or weird. If a model starts rambling for too long or messes up the formatting, they discard that attempt and try again, because those behaviors usually mean the model is confused and likely to make a logic error. By combining this extreme level of task decomposition with constant voting and quick discarding of bad samples, they managed to complete the entire million-step process with zero errors.
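A stripped-down version of that vote-until-margin loop with red-flag rejection might look like the sketch below; the margin, the red-flag rules, the move format, and the propose_move callable are all illustrative assumptions rather than the paper's exact setup:

import re
from collections import Counter

def red_flagged(response):
    # Hypothetical rules: long rambling or output that breaks the expected
    # move format usually means the model is confused, so discard the sample.
    too_long = len(response) > 200
    bad_format = not re.fullmatch(r"move disk \d+ from [ABC] to [ABC]",
                                  response.strip())
    return too_long or bad_format

def decide_step(propose_move, margin=3, max_samples=50):
    # Keep sampling the same micro-step until one answer leads by `margin`
    # votes; propose_move stands in for a call that asks a small model for
    # the single next move given the current puzzle state.
    votes = Counter()
    for _ in range(max_samples):
        response = propose_move()
        if red_flagged(response):
            continue                      # throw the sample away and retry
        votes[response.strip()] += 1
        ranked = votes.most_common(2)
        lead = ranked[0][1] - (ranked[1][1] if len(ranked) > 1 else 0)
        if lead >= margin:
            return ranked[0][0]
    raise RuntimeError("no answer reached the required vote margin")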

And it turns out that you do not even need the most expensive or smartest models to do this, since relatively small ones performed just as well on these tiny steps. Scaling up AI reliability might be more about how we organize the work than about making the models bigger and bigger. They even ran extra tests on difficult math problems like large-digit multiplication and found that the same recursive decomposition and voting logic worked there as well.


In my view, this is exactly the right approach. LLMs aren't going anywhere; these tools are here to stay. The only question is how they will be developed going forward, and who controls them. Boycotting AI is a naive idea that mostly serves as a way for people to signal group membership.

Saying "I hate AI and I'm not going to use it" is trendy and makes people feel like they're doing something meaningful, but it's just another version of trying to vote the problem away. It doesn't work. The real solution is to roll up our sleeves and build a version of this technology that's open, transparent, and community driven.
