this post was submitted on 08 Feb 2025
        
      
      TechTakes
Look, I get your perspective, but zooming out, there's context that nobody's mentioning, and the thread has deteriorated into name-calling instead of looking for insight.
In theory, a training pass needs only one read-through of the input data, and we know of systems that achieve that, from well-trodden n-gram models to the wholly hypothetical large Lempel-Ziv models. Viewed that way, most modern training methods are extremely wasteful: Transformers, Mamba, RWKV, etc. are trading time for space to try to make relatively small models, and it's an expensive tradeoff.
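To make the "one read-through" point concrete, here's a minimal Python sketch, assuming whitespace-tokenized text and a plain bigram (n = 2) model: each count is updated exactly once as the input streams past, and nothing is revisited. `lz78_parse` stands in for the Lempel-Ziv side; it is classic LZ78 dictionary building, not the hypothetical "large Lempel-Ziv model" itself. All function names here are illustrative, not from any particular library.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count bigrams in a single left-to-right pass over a list of tokens."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_token_probs(counts, prev):
    """Normalize the raw counts for `prev` into a conditional distribution."""
    row = counts.get(prev)  # .get() avoids creating empty rows in the defaultdict
    if not row:
        return {}
    total = sum(row.values())
    return {tok: c / total for tok, c in row.items()}


def lz78_parse(text):
    """LZ78: grow a phrase dictionary while reading the input exactly once.

    Emits (index of longest known prefix, next char) pairs.
    """
    dictionary = {"": 0}
    output = []
    phrase = ""
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch  # keep extending a phrase we've already seen
        else:
            output.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)  # learn a new phrase
            phrase = ""
    if phrase:  # flush a trailing phrase that is already in the dictionary
        output.append((dictionary[phrase], ""))
    return output, dictionary


if __name__ == "__main__":
    counts = train_bigram("the cat sat on the mat".split())
    print(next_token_probs(counts, "the"))
    print(lz78_parse("abababa")[0])
```

Both functions touch each input element once; what they spend instead is memory (the count table or phrase dictionary grows with the data), i.e. the opposite side of the time-for-space tradeoff.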
From that perspective, we should expect somebody to eventually demonstrate that the Transformers paradigm sucks. Mamba and RWKV are good examples of modifying old ideas about RNNs to take advantage of GPUs, but are still stuck in the idea that having a GPU perform lots of gradient descent is good. If you want to critique something, critique the gradient worship!
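For contrast, a toy sketch of the gradient-descent route, assuming a single context, a three-token vocabulary, and plain full-batch softmax regression (nothing like how a real Transformer is trained): the next-token distribution that counting gets in one pass is only approached after repeated full passes over the same data. `fit_by_gradient_descent` and its hyperparameters (`lr`, `tol`, `max_epochs`) are hypothetical, arbitrary choices for illustration.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fit_by_gradient_descent(targets, vocab_size, lr=0.5, tol=1e-3, max_epochs=10_000):
    """Fit next-token probabilities for one context by repeated full passes.

    `targets` lists the observed next-token ids after that context.
    Returns (probabilities, epochs_used).
    """
    logits = [0.0] * vocab_size
    n = len(targets)
    for epoch in range(1, max_epochs + 1):
        probs = softmax(logits)
        # Gradient of mean cross-entropy w.r.t. the logits is p - onehot(t),
        # averaged over examples; this loop is one full pass over the data.
        grad = probs[:]
        for t in targets:
            grad[t] -= 1.0 / n
        logits = [z - lr * g for z, g in zip(logits, grad)]
        if max(abs(g) for g in grad) < tol:
            return softmax(logits), epoch
    return softmax(logits), max_epochs


if __name__ == "__main__":
    probs, epochs = fit_by_gradient_descent([0, 1, 1, 2], vocab_size=3)
    print([round(p, 3) for p in probs], "after", epochs, "passes")
```

Counting `[0, 1, 1, 2]` gives the exact frequencies `[0.25, 0.5, 0.25]` in one pass; the descent loop only gets within tolerance of them after re-reading the data many times.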
I swear, it's like whenever Chinese folks do anything, the rest of the blogosphere goes into a panic. I'm not going to insult anybody directly, but I'm so fucking tired of mathlessness.
Also, point of order: Meta open-sourced Llama so that their employees would stop using BitTorrent to leak it! Not to "keep the rabble quiet" but to appease their own developers.
I'm aware of that, yeah, but it's not a field I'm actively engaged in atm and not likely to be any time soon either (from no desire to work in it follows no desire to wade through the pool of scum). but this is also not really the place to be looking for insight. it's the place to ridicule the loons and boosters.
been wondering whether that or the next AI winter will get here first.
did that a couple of years ago already; it's part of why I was already nice and burned out on so much of this nonsense when Midjourney/Stable Diffusion started kicking around
[insert condensed comment about the mentality of the US/SFBA-influenced tech sector (and, really, it is the US specifically; the eurozone's a somewhat different beast), American exceptionalism, Sinophobia, and too-fucking-many years of "founder" stories]
it really is tedious though, yeah. when it happens, I try to just avoid some feeds. limited spoons.
as you know, the bayfucker way (for close to 20 years now) is to get big piles of money and try to outspend your competition. why bother optimising or thinking about things if you can just throw another 87345243 computers at the problem? (I do still agree with you, but see above re desire and intent)
re the open source thing: it's a wider problem than just that, and admittedly I'm peeved about it at that larger scope. I didn't expound on it in my previous comment because (as above) this is largely not the place. that said, soapbox:
there's a thing I've been noticing as a creeping trend lately. I call it "open source veneer", which is still a bit imprecise[0], but I think you'll get what I mean. it's the phenomenon of shit like this: "projects" on GitHub that are no more than a fancy README and some "contributors" and whatnot, but no actual code (or no ability to make full use of what is provided); companies that build "open source" and then, as soon as something happens (usually VC-/"earnings"-related decisions), the entire project gets deeply buried (links disappear off main sites, leaving product/service only), actively hobbled ("oh you want to set this up yourself? glhf gfy", done in oh so many ways[1]), or often even disappears entirely[2].
[0] - still working through the thought, should probably write about it soon
[1] - backend codebases lagging because "not feature priority", entirely missing documentation, wholly missing key sections of code which are "conveniently" left out, etc etc; examples off the top of my head: Zotero, Signal, Firefox Weave for a while. there are plenty more if you look
[2] - been noticing this especially frequently with some security stuff, but it's hardly the only example set