this post was submitted on 27 Feb 2026
82 points (100.0% liked)

Technology

Taalas HC1: 17,000 tokens/sec on Llama 3.1 8B vs Nvidia H200's 233 tokens/sec. 73x faster at one-tenth the power. Each chip runs ONE model, hardwired into the transistors.
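For what it's worth, the headline figure checks out arithmetically from the two quoted throughputs:

```python
# Sanity check of the headline numbers quoted above.
hc1_tps = 17_000   # Taalas HC1 on Llama 3.1 8B, tokens/sec (as quoted)
h200_tps = 233     # Nvidia H200, tokens/sec (as quoted)

speedup = hc1_tps / h200_tps
print(f"throughput ratio: {speedup:.1f}x")        # ~73x, matching the headline

# At one-tenth the power, tokens-per-joule improves by another factor of 10:
print(f"efficiency ratio: {speedup * 10:.0f}x")
```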

top 42 comments
[–] ImperialStout@beehaw.org 6 points 2 days ago (2 children)

This sounds great to me. Anything that would increase supply of AI processing could lower demand on the GPU supply. I want to be able to upgrade my gaming computer again someday!

[–] danhab99@programming.dev 1 points 1 day ago (1 children)

AI really needs dedicated hardware, I feel like if there was more chip manufacturing in the west we might have more diverse chips.

Frankly I'm really confused as to why this LLM demand for RAM isn't encouraging new companies to manufacture RAM. If this is a bubble then we all just wait it out; if it's not a bubble, then someone else would swoop in to take up the market.

[–] ByteSorcerer@beehaw.org 2 points 1 day ago

It's not easy to scale up chip production: it relies on extremely precise machines that take a long time to build, and there are many different steps on the way from raw materials to a finished chip.

If you wanted to set up a new RAM fab with competitive performance, it'd be an investment of over a billion USD at the bare minimum, and it'd take a few years to set up all the processes before the first chips could roll off the assembly line.

If the bubble has popped by then, your new factory would probably run at a loss, because it's nearly impossible to compete with companies who have had decades to optimise their production processes.

Even if the bubble hasn't popped by then, the next problem will likely be wafer supply. Just like there are only a few companies with the infrastructure to build modern, high-performance computer chips, there are only a few companies with the infrastructure to produce silicon wafers of high enough quality to build those chips with. And they have only just enough capacity to supply their current customers.

So to solve the wafer problem, someone needs to be willing to invest at least a few hundred million USD in a new wafer factory, which again would struggle to compete in a post-scarcity market. And wafers are far from being the only resource with that issue.

TL;DR: It'd be a huge investment and a huge gamble, and would likely just end up moving the problem anyway.

[–] Appoxo@lemmy.dbzer0.com 2 points 1 day ago

Every chip that is produced takes away capacity that could have been used for consumer products.

So yeah...not great.

[–] notabot@piefed.social 61 points 3 days ago (3 children)

Dedicated, single-purpose chip designs are always going to be faster and more efficient to run than general-purpose ones. The question will be what the environmental and financial costs are of updating to a new model. With a general-purpose design it's just a case of loading some new code. With a model that's baked into the silicon you have to design and manufacture new chips, then install them.

I can see this being useful in certain niche usecases where requirements are not going to change, but it sounds rather limiting in the general case.

You know what I want? A Whisper chip. Or whatever speech model is better now, but it's good enough for so many applications. Give the sense of hearing to appliances. I guarantee you that in 20 years it will still be used.

[–] MagicShel@lemmy.zip 17 points 3 days ago (1 children)

A lot of the models we have are about as good as they're going to get. I mean, ChatGPT 5 isn't appreciably better than ChatGPT 4. Hook one of those models, or even a weaker one, up to a purpose-built RAG pipeline and a controller running a mesh of interconnected prompts and agents, and you'll blow away general-purpose chatbots in niche areas in terms of cost, efficiency, and performance.
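As a rough illustration of the RAG idea (everything here is a toy, not any real framework's API): retrieve a few domain snippets, then stuff them into the prompt so a frozen model can answer niche questions it wasn't trained deeply on.

```python
# Toy RAG sketch: retrieval by word overlap, then prompt assembly.
# Real pipelines use embeddings and a vector store instead.

def retrieve(query: str, store: dict[str, str], k: int = 2) -> list[str]:
    """Rank stored snippets by how many query words they share."""
    qwords = set(query.lower().split())
    ranked = sorted(store.values(),
                    key=lambda text: len(qwords & set(text.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_prompt(query: str, store: dict[str, str]) -> str:
    """Prepend the retrieved context so a fixed model can use it."""
    context = "\n".join(retrieve(query, store))
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = {"a": "the HC1 hardwires model weights into silicon",
        "b": "GPUs load weights from HBM every token"}
print(build_prompt("how are weights hardwired into silicon", docs))
```

The point of the sketch: the model itself never changes, only the context fed to it, which is why a hardwired chip could still serve a narrow domain well.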

The question then becomes: to what purpose can you put this super-fast, dedicated machine that performs certain small-scope, simple tasks really well, but also fucks up often enough that you can't depend on it? To what tasks could you set a bot that does stuff with minimal competence, let's say, 90% of the time, and the other 10%, doesn't create even bigger problems?

That domain exists, but it's thin and narrow.

[–] FaceDeer@fedia.io 7 points 3 days ago (1 children)

To what tasks could you set a bot that does stuff with minimal competence let's say 90% of the time, and the other 10%, doesn't create even bigger problems?

Sounds like a typical human to me.

A chip like this would be perfect for an autonomous robot. Drone, humanoid, whatever - something that still needs to be able to handle itself when it's cut off from outside control. Always nice to have an internet connection to draw on a bigger, more capable "brain" somewhere else, but if that connection is lost you want it to be able to carry on with whatever it's doing and not just flop over limply.

[–] MagicShel@lemmy.zip 6 points 3 days ago* (last edited 3 days ago) (2 children)

Sure. It excels in cases where 60-90% success rate is better than nothing. If you have a smart mine that doesn't detonate on civilians, 50% success is better than 0. It reduces civilian casualties by 50%, which is still awful, but if you're going to plant mines it's better than entirely indiscriminate. Use cases definitely exist. A false positive means it doesn't detonate on one soldier but might on the next — still an effective deterrent. A false negative means it blows up a kid, which a dumb mine would also do anyway.

It's just generally not in the situations most people are thinking about. You have to imagine cases where there is some upside and no downside. It doesn't work in the context of, say, auto-braking a car when a child is detected, because a false positive is going to cause accidents and probably kill people, even if in other circumstances it does save lives.

[–] BlameThePeacock@lemmy.ca 2 points 3 days ago

A lot of AI hallucinations can be resolved by simply running the results through additional prompts automatically, then checking the various results against each other or against reference material.

Many agentic systems already do that with a limited number of follow up/check steps, but they're often restricted by acceptable response times or just sheer costs.

I managed to get Copilot in Excel to run a 43-prompt chain in just under 10 minutes the other day. The result was exactly what I needed.

If you have 73 times the output, you can potentially afford to do that kind of processing in an acceptable time frame and cost level.
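The check-the-results-against-each-other idea can be sketched as a simple majority vote; `ask` below stands in for whatever model client you use (nothing here is a real API):

```python
from collections import Counter

def self_consistency(prompt: str, ask, n: int = 5) -> tuple[str, float]:
    """Ask the same question n times and keep the majority answer.

    `ask` is any callable wrapping your model; low agreement is a
    decent signal the model is guessing and a human should look.
    """
    answers = [ask(prompt) for _ in range(n)]
    best, count = Counter(answers).most_common(1)[0]
    return best, count / n
```

At 73x the throughput, n = 5 (or 50) extra passes per query stops being a meaningful cost, which is exactly the response-time/cost restriction described above.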

[–] FaceDeer@fedia.io 1 points 3 days ago (1 children)

Why doesn't it work in those contexts? It's better than nothing in those contexts too. I'd rather have a car with onboard intelligence to take over than an uncontrolled one.

I think you're letting the perfect be the enemy of the good, here. There are plenty of situations where you don't need a robot to behave perfectly. People don't behave perfectly.

[–] MagicShel@lemmy.zip 4 points 3 days ago* (last edited 3 days ago) (1 children)

No, it doesn't work in this context because a false positive is worse than nothing, while a false negative is merely no better than nothing. It's not zero-sum. Obviously it depends where you set the threshold between false positives and false negatives. I imagined a very simple scenario the first time.

If the false-positive rate is even only .001%, you're going to cause a shitload of accidents. Run the detection a couple of times a second and that tiny rate compounds: every car slams on its brakes for no reason every 12-15 hours of drive time. That would be an absolute mess.
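Making that arithmetic explicit (the detection frequency is an assumption on my part, but it lands right in the quoted range):

```python
# How often a phantom brake event occurs for a given false-positive rate.
fp_rate = 1e-5          # 0.001% chance of a false positive per detection pass
passes_per_sec = 2      # assumed pedestrian-detection frequency (hypothetical)

events_per_hour = fp_rate * passes_per_sec * 3600
hours_between = 1 / events_per_hour
print(f"one phantom brake slam every {hours_between:.1f} hours of driving")
```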

[–] FaceDeer@fedia.io 1 points 3 days ago (1 children)

I have no idea what you're thinking the scenario is here. The alternative is an uncontrolled car, I think I'd rather it had at least some brains behind the decisions it's making.

[–] MagicShel@lemmy.zip 3 points 3 days ago (2 children)

How does it decide the car is uncontrolled? That's a failure scenario, too.

I'm not even sure what you're arguing. I said from the get go that there are niche cases where AI is nothing but positive. You seem to be arguing that there are a bunch more cases. Fine. Maybe the niche is slightly less thin and narrow than I think. Cool.

[–] XLE@piefed.social 2 points 3 days ago (1 children)

Facedeer is just a pro-AI concern troll from Reddit.

He kicked off his part of the thread by complaining about people, and then speculating that maybe this chip could do a thing without any evidence.

[–] MagicShel@lemmy.zip 2 points 2 days ago (1 children)

I'm middle of the road on AI. I think it has uses. I also think this technology is a dead end (i.e. this is not going to lead to AGI), and had people understood its limitations from the start, investment would've been more modest and cautious. It's a great technology. You can do cool things with it. But it will never be able to significantly replace humans. However, it may be really painful watching the investor class wrestle with that reality.

I think the chip does have uses and I think building it even with today's models would last a long time. But the number of scenarios where it is unequivocally better than nothing is smaller than AI bros (I draw a line between an enthusiast like myself and a bro who is all in and won't hear reason) want to think.

Last point. In theory this chip is great. Based on my reading this is a substitute for an H100 — a data center GPU (APU?). This isn't going into smart mines or drones and probably not cars. Not without more development. So while there is potential here, none of these use cases are practical. This is a way for OAI or whomever to run their current models just the way they are for cheaper but with a hardware cost to upgrade. This isn't going to matter for the rest of us for a while.

[–] TehPers@beehaw.org 2 points 1 day ago

had people understood from the start the limitations of it, investment would've been more modest and cautious

People did understand from the start. Those who do the investing just didn't listen, or they had a different motive. These days it's impossible to tell which.

And by "people" I'm not referring to random people, but those who have been closer than most to the development of these models. There has been an unbelievable amount of research done on everything from the effectiveness of specific models in niche fields to the ability to use an LLM as the backend for a production service. Again, no amount of negative feedback going up the chain has made a difference in the direction, so that only leaves a few explanations on why the investment continues to be so high.

[–] FaceDeer@fedia.io 0 points 3 days ago (1 children)

When the regular controller of the car - be it human, another AI, whatever - isn't sending control signals, then the onboard controller knows that the car is uncontrolled. Of course it's a "failure scenario", I'm suggesting that this chip would be ideal for picking up when that sort of thing happens. The alternative is to just fall over.

I, too, am not sure what you're arguing. I suggested that a low-power high-speed AI chip like this would be ideal for putting in robots, which have power constraints and aren't always in reliable contact with outside controllers. That's a very broad "niche" indeed. I don't know what all this landmine stuff or probabilities of brake-slamming is all about or how it relates to what I suggested.

[–] MagicShel@lemmy.zip 1 points 3 days ago* (last edited 3 days ago)

My scenario was a safety device that prevented cars from hitting pedestrians. You're stuck on this autonomous self control in the event of loss of human control and it seems like you're interpreting what I'm saying in that context, which I wasn't. I presented a scenario when it's a good idea and one when it isn't. Nothing to do with your autonomous control scenario.

But let's see. If you've got a drone that can fly itself for a few seconds or minutes when it loses signal, simply loitering until control resumes, or maybe continuing on a flight path until it's out of jamming range: the alternative is an uncontrolled crash, so any chance of avoiding that is nothing but upside, whether it's a 10% or 90% success rate. It's a good example of the type of scenario I was describing with the smart mine.

I wasn't trying to address your scenario because it already falls into the niche I was describing. I was trying to demonstrate how to consider scenarios where AI is good vs ones where it has an unacceptable tradeoff. Where the consequences of failure don't outweigh the benefits when it gets it right.

So I think we were talking past each other, and if my communication was unclear then I apologize. In my defense, it's 2AM here.

[–] morto@piefed.social 6 points 3 days ago (1 children)

fpgas can sort of be a middle ground, but i don't know if they're capable of running llms

[–] bryndos@fedia.io 2 points 3 days ago (2 children)

Is there such a thing as a modular FPGA, so that you could "plug in" another one and add more gates, sort of daisy-chaining them? I don't know if such interfaces exist; sounds like it might need lots of bandwidth.

[–] iceberg314@midwest.social 1 points 2 days ago

I bet you could! The interface can literally be whatever you want with FPGAs. You'd just have to keep things organized and program them one at a time, I think.

[–] morto@piefed.social 1 points 2 days ago

I know very little about fpgas, so I can't answer your question, but let's hope someone else can

[–] hazelnoot@beehaw.org 22 points 3 days ago (2 children)

Each chip runs ONE model, hardwired into the transistors.

That's... that's an ASIC. That's literally just an ASIC... with all the tradeoffs and compromises that come with it.

ASIC just means "specialized chip". They don't claim anything else.

[–] TehPers@beehaw.org 12 points 3 days ago

Shh you'll pop the bubble if you start talking sensibly. It's not an ASIC—it's a specialized piece of hardware optimized to execute a model with unparalleled performance. Now buy my entire stock of them and all the supply for the next two years please.

(Figuring out the compose combination for an emdash took longer than I'd like to admit lol)

[–] tal@lemmy.today 21 points 3 days ago

The HC1 chip doesn't load model weights from memory. It etches them directly into the transistors. Every weight becomes a physical circuit.

That's one way to avoid memory bandwidth constraints!
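That quip can be quantified with a quick roofline estimate: at batch size 1, every decoded token has to stream the entire weight set from HBM, so memory bandwidth (4.8 TB/s on an H200, per Nvidia's spec sheet) sets a hard ceiling that the quoted 233 tokens/sec sits just under. Weights baked into the silicon never cross that bus at all.

```python
# Why single-stream decode on a GPU is bandwidth-bound, not compute-bound.
params = 8e9              # Llama 3.1 8B parameter count
bytes_per_param = 2       # FP16 weights
h200_bandwidth = 4.8e12   # H200 HBM3e, bytes/sec (spec-sheet figure)

bytes_per_token = params * bytes_per_param      # full model read per token
ceiling = h200_bandwidth / bytes_per_token
print(f"bandwidth ceiling: {ceiling:.0f} tokens/sec")  # 300; quoted 233 fits under it
```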

[–] altphoto@lemmy.today 9 points 3 days ago

Hopefully the low-cost-per-kill drones get more affordable. Maybe load Linux onto one of those things and just break off the murderous knives.

[–] dieICEdie@lemmy.org 6 points 3 days ago (3 children)

This would be great if you could have a machine that would allow you to swap chips… and then they only charge < 50 USD for each chip.

[–] boonhet@sopuli.xyz 2 points 3 days ago

Can't be that cheap unfortunately if they maxed out the die area. Though it is an older node so maybe not as expensive as flagship GPU chips and shit

Would be great, but feels unlikely, most of the gains they're making rely on the lack of versatility.

[–] tetrislife@leminal.space 1 points 3 days ago (1 children)
[–] dieICEdie@lemmy.org 1 points 3 days ago (1 children)

That’s all technology though, sadly.

[–] FurryMemesAccount@lemmy.blahaj.zone 2 points 3 days ago (2 children)

This one feels shorter-lived than the average chip, tho.

With the hardwiring and all.

[–] MagicShel@lemmy.zip 2 points 3 days ago

The thing that differentiates ChatGPT and Claude is likely more the RAG pipeline that backs them and feeds them context. The models really aren't getting better; we're just getting better at using them, breaking tasks down into units so small AI can figure them out. I'd bet a GPT 5 or a Claude Opus 4.6 model would last 5, maybe 10 years before you really start to notice its capabilities falling behind. I'll bet you could even use GPT 4o for 5-10 years and it would be fine.

[–] dieICEdie@lemmy.org 0 points 3 days ago (1 children)

But if they could make it so the chip is the only thing that goes obsolete, that could be recycled pretty easily, or resold.

[–] FurryMemesAccount@lemmy.blahaj.zone 1 points 3 days ago (1 children)

Then it would stop being 73 times faster than NVIDIA.

[–] dieICEdie@lemmy.org 2 points 3 days ago (1 children)

That doesn’t make sense.

[–] FurryMemesAccount@lemmy.blahaj.zone 1 points 3 days ago (1 children)

If you add levels of indirection, extra transistors and such, it would be surprising to manage to maintain the same level of performance, especially since this design seems to rely on hardwiring to achieve its speed...

[–] dieICEdie@lemmy.org 1 points 3 days ago (1 children)

Pretty sure the advantage is the AI directly on the chip.

[–] FurryMemesAccount@lemmy.blahaj.zone 1 points 3 days ago* (last edited 3 days ago) (1 children)

Now it's your proposal's turn not to make any sense. This is an article about a chip with a hardwired model being super fast.

Of course the hardwiring is inflexible, and much, much faster.

[–] dieICEdie@lemmy.org 1 points 3 days ago

I just think you want to argue