technology

23472 readers

211 users here now

On the road to fully automated luxury gay space communism.

Spreading Linux propaganda since 2020

Rules:

1. Obviously abide by the sitewide code of conduct. Bigotry will be met with an immediate ban
2. This community is about technology. Offtopic is permitted as long as it is kept in the comment sections
3. Although this is not /c/libre, FOSS related posting is tolerated, and even welcome in the case of effort posts
4. We believe technology should be liberating. As such, avoid promoting proprietary and/or bourgeois technology
5. Explanatory posts to correct the potential mistakes a comrade made in a post of their own are allowed, as long as they remain respectful
6. No crypto (Bitcoin, NFT, etc.) speculation, unless it is purely informative and not too cringe
7. Absolutely no tech bro shit. If you have a good opinion of Silicon Valley billionaires please manifest yourself so we can ban you.

founded 4 years ago

MODERATORS

context@hexbear.net

EmmaGoldman@hexbear.net

SexUnderSocialism@hexbear.net

gaycomputeruser@hexbear.net

ZoomeristLeninist@hexbear.net

ByteDance just dopped Doubao-1.5-pro tht uses sparse MoE architecture, it matches GPT 4o benchmarks while being 50x cheaper to run, and it's 5x cheaper than DeepSeek (www.aibase.com)

submitted 3 days ago by yogthos@lemmygrad.ml to c/technology@hexbear.net

90 comments fedilink hide all child comments

you are viewing a single comment's thread
view the rest of the comments

[–] tellmeaboutit@lemmygrad.ml 1 points 1 day ago (1 children)

The impression I get with DeepSeek is that their goal is to largely do research for the sake of research.

I think it’s not fair to call DeepSeek open source. They’ve released the weights of their model but that’s all. The code they used to train it and the training data itself is decidedly not open source. They aren’t the only company to release their weights either. Meta’s LlaMa was probably the best open weight model you could use prior to DS v3. As I see it, this is just a consequence of competition in a market where capital has nowhere else to go. Meta and DeepSeek likely want to prevent OpenAI from becoming profitable.

As an aside, although I personally believe in some aspects of China’s reform and opening up it’s not without its faults. Tech companies in China often make the same absurd claims and engage in behavior that’s as deluded as companies in Silicon Valley.

My main point is that the limitations of the approach that people keep fixating on don’t appear to be inherent in the way the algorithm works, they’re just an artifact of people still figuring out how to apply this algorithm in an efficient way. The fact that massive improvements have already been found suggests that there’s probably a while yet before we run out of ideas

I think this is our core disagreement. I agree, we have not pushed LLMs to their absolute limit. Mixture of Experts models, optimized training, and “reasoning models” are all incremental improvements over the previous generation of LLMs. That said, I strongly believe that the architecture of LLMs are fundamentally incapable of intelligent behavior. They’re more like a photograph of intelligence than the real thing.

I think that exploration for the sake of exploration is the correct view to have here.

I agree wholeheartedly. However, you don’t need to dump an absurd amount of resources into training an llm to test the viability of any of the incremental improvements that DeepSeek has made. You only do that if your goal is to compete with OpenAI and others for access to capital.

However, some people do make a genuine effort to understand how human cognition works.

Yes, but that work largely goes unnoticed because it’s not at all close to providing us with a way to build intelligent machines. It’s work that can only really happen at academic or public research institutions because it’s not profitable at this stage. I would be much happier if the capital currently directed towards LLMs was redirected towards this type of work. Unfortunately, we’re forced to abide by the dictates of capitalism and so that won’t happen anytime soon.

[–] yogthos@lemmygrad.ml 1 points 1 day ago

I think it’s not fair to call DeepSeek open source. They’ve released the weights of their model but that’s all. The code they used to train it and the training data itself is decidedly not open source.

Sure, but that's now become the accepted definition for open sourcing AI models. I personally find that's sufficient especially given that they published the research associated with it, which is ultimately what matters the most.

That said, I strongly believe that the architecture of LLMs are fundamentally incapable of intelligent behavior. They’re more like a photograph of intelligence than the real thing.

I think you'd have to provide the definition of intelligence you're using here. I'll provide mine here. I would define it as the capacity to construct and refine mental models of specific domains in order to make predictions about future states or outcomes within those contexts. It stems from identifying rules, patterns, and relationships that govern a particular system or environment. It's a combination of knowledge and pattern recognition that can be measured by predictive accuracy within a specific context.

Given that definition, I do not see why LLMs are fundamentally incapable of intelligent behavior. If a model is able to encode the rules of a particular domain then it is able to create an internal simulation of the system to make predictions about future states. And I think that's precisely what deep neural networks do, ad how our own brains operate. To be clear, I'm not suggesting that GPT is directly analogous to the way the brain encodes information, rather that they operate in the same fundamental fashion.

However, you don’t need to dump an absurd amount of resources into training an llm to test the viability of any of the incremental improvements that DeepSeek has made. You only do that if your goal is to compete with OpenAI and others for access to capital.

How do you define what's an absurd amount of resources, that seems kind of arbitrary to me. Furthermore, we also see that there are emergent phenomena that appear at certain scales. So, the exercise of building large models is useful to see what happens at those scales.

I would be much happier if the capital currently directed towards LLMs was redirected towards this type of work. Unfortunately, we’re forced to abide by the dictates of capitalism and so that won’t happen anytime soon.

I do think LLMs get disproportionate amount of attention, but eventually the hype will die down and people will start looking at other methods again. In fact, that's exactly what's already happening with stuff like neurosymbolic systems where deep neural networks are combined with symbolic logic. The GPT algorithm proved to be flexible and useful in many different contexts, so I don't have a problem with people spending the time to find what its limits are.