Strap in and start blasting the Depeche Mode.
TechTakes
Big brain tech dude got yet another clueless take over at HackerNews etc? Here's the place to vent. Orange site, VC foolishness, all welcome.
This is not debate club. Unless it’s amusing debate.
For actually-good tech, you want our NotAwfulTech community.
Dude discovers that one LLM model is not entirely shit at chess, spends time and tokens proving that other models are actually also not shit at chess.
The irony? He's comparing it against Stockfish, a computer chess engine. Computers playing chess at a superhuman level is a solved problem. LLMs have now slightly approached that level.
For one, gpt-3.5-turbo-instruct rarely suggests illegal moves,
Writeup https://dynomight.net/more-chess/
HN discussion https://news.ycombinator.com/item?id=42206817
Here are the results of these three models against Stockfish—a standard chess AI—on level 1, with a maximum of 0.01 seconds to make each move
I'm not a chess person or familiar with Stockfish, so take this with a grain of salt, but I found a few interesting things perusing the code/docs that I think make useful context.
Skill Level
I assume "level" refers to Stockfish's Skill Level option.
If I mathed right, Stockfish roughly estimates Skill Level 1 to be around 1445 Elo (source). However, it says "This Elo rating has been calibrated at a time control of 60s+0.6s", so it may be significantly lower here.
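For the curious, here's roughly how that 1445 figure falls out. As far as I can tell, Stockfish converts UCI_Elo into a fractional Skill Level with a cubic fit, so going the other way means inverting it numerically. A minimal sketch: the 1320/3190 anchors and the polynomial coefficients are from my memory of the Skill struct in Stockfish's search.cpp and may not match your version, and the function names are made up.

```python
# Hypothetical sketch: invert the UCI_Elo -> Skill Level fit that (as I recall)
# Stockfish's Skill struct uses. Constants are from memory; check your version.
LOWEST_ELO = 1320
HIGHEST_ELO = 3190


def level_from_elo(elo: float) -> float:
    """Map a UCI_Elo value to a fractional Skill Level, clamped to [0, 19]."""
    e = (elo - LOWEST_ELO) / (HIGHEST_ELO - LOWEST_ELO)
    level = ((37.2473 * e - 40.8525) * e + 22.2943) * e - 0.311438
    return min(max(level, 0.0), 19.0)


def elo_from_level(level: float, tol: float = 1e-6) -> float:
    """Invert level_from_elo by bisection (the cubic is increasing on [0, 1])."""
    lo, hi = float(LOWEST_ELO), float(HIGHEST_ELO)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if level_from_elo(mid) < level:
            lo = mid  # level too low: the answer lies above mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(elo_from_level(1)))  # 1444
```

That lands at about 1444, i.e. "around 1445" give or take rounding, with the caveat above that it's calibrated at 60s+0.6s, not 10 ms per move.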
Skill Level affects the search depth (it appears to use a depth of 1 at Skill Level 1). It also enables MultiPV 4 to compute the four best principal variations and randomly pick from them (more randomly at lower skill levels).
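The "randomly pick from them" part is more structured than it sounds. Below is a hypothetical Python transliteration of the scoring rule I believe Skill::pick_best applies to the MultiPV candidates: each candidate gets a deterministic bonus proportional to how far it trails the top move plus a random bonus, and the highest total wins. PAWN_VALUE and the function signature are my assumptions; the real code works on Stockfish's internal RootMoves.

```python
import random

# Hypothetical transliteration of the rule I believe Stockfish's Skill::pick_best
# uses. PAWN_VALUE is an assumption: the internal pawn value differs by version.
PAWN_VALUE = 100


def pick_best(candidates, level, rng):
    """Pick a move from (move, score) pairs sorted best-first, e.g. 4 MultiPV lines.

    Scores are in centipawns from the engine's point of view.
    """
    top_score = candidates[0][1]
    # Spread between the best and the worst candidate, capped at one pawn.
    delta = min(top_score - candidates[-1][1], PAWN_VALUE)
    # Larger at lower skill levels: 120 at level 0, 82 at level 19.
    weakness = 120 - 2 * level
    best_move, max_score = None, float("-inf")
    for move, score in candidates:
        # Deterministic term: shrinks the candidate's deficit to the top move.
        # Random term: noise of up to delta * weakness / 128 centipawns.
        push = int((weakness * (top_score - score)
                    + delta * rng.randrange(int(weakness))) / 128)
        # ">=" means ties go to the later (lower-ranked) candidate.
        if score + push >= max_score:
            max_score, best_move = score + push, move
    return best_move


rng = random.Random(0)  # seeded so the example is repeatable
lines = [("e2e4", 30), ("d2d4", 25), ("g1f3", 20), ("a2a3", -40)]
print(pick_best(lines, level=1, rng=rng))
```

In this sketch, at level 1 a move 70 cp worse than the best keeps only about 8% of its deficit while the random term can add up to ~64 cp, so the choice among the four lines is close to a coin toss.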
Move Time & Hardware
This is all independent of move time. The author used a move time of 10 milliseconds for Stockfish (there's no mention of how much time the LLMs got)... or at least they did if they accounted for the "Move Overhead" option, which defaults to 10 milliseconds. If they left that at its default, then 10ms - 10ms = 0ms, so 🤷♀️.
There is also no information about the hardware or number of threads they ran this on, which I feel is important information.
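To make a run like this reproducible, every engine setting could be pinned down and reported. A sketch of the UCI commands that would set them explicitly; the defaults here are what I believe Stockfish ships with (1 thread, 16 MB hash, 10 ms Move Overhead), not what the author actually used, which we don't know. Whether Move Overhead is subtracted from a fixed movetime is the open question above; this only makes the value explicit. Hardware still has to be reported separately.

```python
def uci_commands(moves, skill_level=1, movetime_ms=10, threads=1,
                 hash_mb=16, move_overhead_ms=10):
    """UCI commands for one engine move under an explicitly documented setup.

    moves: the game so far in UCI long algebraic notation, e.g. ["e2e4", "e7e5"].
    """
    cmds = [
        "uci",
        # Pin down every setting that affects playing strength.
        f"setoption name Threads value {threads}",
        f"setoption name Hash value {hash_mb}",
        f"setoption name Move Overhead value {move_overhead_ms}",
        f"setoption name Skill Level value {skill_level}",
        "ucinewgame",
        "isready",  # a real driver would wait for "readyok" here
    ]
    position = "position startpos"
    if moves:
        position += " moves " + " ".join(moves)
    return cmds + [position, f"go movetime {movetime_ms}"]


for line in uci_commands(["e2e4", "e7e5"]):
    print(line)
```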
Evaluation Function
After the game was over, I calculated the score after each turn in “centipawns” where a pawn is worth 100 points, and ±1500 indicates a win or loss.
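A minimal sketch of the two ideas in that quote, assuming the author simply clamped engine scores rather than doing something fancier: a naive material count where a pawn is 100 centipawns, and a helper that caps any score at ±1500 and treats a forced mate as a full win or loss. The piece values and function names are conventional choices of mine, not taken from the writeup.

```python
# Conventional material values in centipawns (my assumption; engines tune their own).
PIECE_VALUES = {"p": 100, "n": 300, "b": 300, "r": 500, "q": 900, "k": 0}
CAP = 1500  # the writeup's "±1500 indicates a win or loss"


def material_cp(fen):
    """Naive material balance from White's point of view, read from the FEN board field."""
    total = 0
    for ch in fen.split()[0]:
        value = PIECE_VALUES.get(ch.lower())
        if value is not None:  # skips digits (empty squares) and "/" separators
            total += value if ch.isupper() else -value
    return total


def plot_score(cp=None, mate=None):
    """Clamp an engine score to [-CAP, CAP]; a forced mate counts as a full win or loss.

    Pass exactly one of cp (centipawns) or mate (moves to mate, negative if being
    mated), both from the same side's point of view.
    """
    if mate is not None:
        return CAP if mate > 0 else -CAP
    return max(-CAP, min(CAP, cp))


start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
print(material_cp(start), plot_score(cp=2300), plot_score(mate=-3))  # 0 1500 -1500
```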
Stockfish's FAQ mentions that they have gone beyond centipawns for evaluating positions, because it's strong enough that material advantage is much less relevant than it used to be. I assume it doesn't really matter at level 1 with ~0 seconds to produce moves, though.
Still, since the author has Stockfish handy anyway, it'd be interesting to use it in its unhandicapped form to evaluate who won.
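Roughly what that would look like at the UCI level: one full-strength search (Skill Level 20, the maximum) of every position in the game, with scores flipped to White's point of view, since UCI reports scores from the side to move. The fixed depth of 20 and the helper names are arbitrary assumptions, and actually running this needs a Stockfish binary to pipe the commands into.

```python
def analysis_commands(moves, depth=20):
    """A full-strength search of every position reached in the game.

    moves: the game in UCI long algebraic notation; depth 20 is an arbitrary choice.
    """
    cmds = ["uci", "setoption name Skill Level value 20", "ucinewgame", "isready"]
    for ply in range(len(moves) + 1):
        position = "position startpos"
        if ply:
            position += " moves " + " ".join(moves[:ply])
        cmds += [position, f"go depth {depth}"]
    return cmds


def white_pov(score_cp, ply):
    """UCI scores are from the side to move; Black is to move after an odd number of plies."""
    return score_cp if ply % 2 == 0 else -score_cp


for line in analysis_commands(["e2e4", "e7e5"], depth=12):
    print(line)
```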
Interesting post and corresponding Mastodon thread by cwebber on the non-decentralised-ness of Bluesky.
https://dustycloud.org/blog/how-decentralized-is-bluesky/
https://social.coop/@cwebber/113527462572885698
The author is keen on this particular “vision statement”:
Preparing for the organization as a future adversary.
The assumption being: stuff gets enshittified, so how might you guard your product against the future stupid and awful whims of management and investors?
Of course, they don’t consider that it cuts both ways, or how it reads in light of Jack Dorsey’s personal grumbles about Twitter. The risk from his point of view was the company he founded doing evil, unthinkable things like, uh, banning Nazis. He’s keen for that sort of thing to never happen again on his platforms.
note that cwebber co-authored the ActivityPub spec, btw
caption: "AI is itself significantly accelerating AI progress"
wow I wonder how you came to that conclusion when the answers are written like a Fallout 4 dialogue tree
- "YES!!!"
- "Yes!!"
- "Yes."
- " (yes)"