technology


On the road to fully automated luxury gay space communism.

Spreading Linux propaganda since 2020

founded 5 years ago
submitted 1 week ago* (last edited 1 week ago) by kalabaza@hexbear.net to c/technology@hexbear.net
 
 

I'm looking at all this for 1450 bucks (e: after taxes):

  • AMD Ryzen 7 9700X
  • AMD Radeon RX 7700XT 12GB
  • MSI B650 motherboard
  • Corsair 32GB DDR5 6000MHz
  • Corsair RMe 850W
  • 2TB NVMe
  • 360mm 320W liquid cooling

Is this a coherent build?

 
 

For someone who says she is fighting AI bot scrapers just in her free time, Xe Iaso seems to be putting up an impressive fight. Since she launched it in January, Anubis, a program "designed to help protect the small internet from the endless storm of requests that flood in from AI companies," has been downloaded nearly 200,000 times, and is being used by notable organizations including GNOME, the popular open-source desktop environment for Linux; FFmpeg, the open-source software project for handling video and other media; and UNESCO, the United Nations organization for education, science, and culture.

Iaso decided to develop Anubis after discovering that her own Git server was struggling with AI scrapers, bots that crawl the web hoovering up anything that can be used as training data for AI models. Like many libraries, archives, and other small organizations, Iaso discovered her Git server was getting slammed only when it stopped working.

"I wasn't able to load it in my browser. I thought, huh, that's strange," Iaso told me on a call. "So I looked at the logs and I figured out that it's restarted about 500 times in the last two days. So I looked in the access logs and I saw that [an] Amazon [bot] was clicking on every single link."

Iaso knew it was an Amazon bot because it self-identified as such. She said she considered withdrawing the Git server from the open web, but because she wants to keep some of the source code hosted there open to the public, she tried to stop the Amazon bot instead.

"I tried some things that I can't admit in a recorded environment. None of them worked. So I had a bad idea," she said. "I implemented some code. I put it up on GitHub in an experimental project dumping ground, and then the GNOME desktop environment started using it as a Hail Mary. And that's about when I knew that I had something on my hands."

There are several ways people and organizations are trying to stop bots at the moment. Historically, robots.txt, a file sites could use to tell automated tools not to scrape them, was a respected and sufficient norm for this purpose, but since the generative AI boom, major AI companies, as well as less established companies and even individuals, often ignore it. CAPTCHAs, the little tests users take to prove they're not a robot, aren't great, Iaso said, because some AI bot scrapers have CAPTCHA solvers built in. Some developers have created "infinite mazes" that send AI bot scrapers from useless link to useless link, diverting them from the actual sites humans use and wasting their time. Cloudflare, the ubiquitous internet infrastructure company, has created a similar "AI labyrinth" feature to trap bots.
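For reference, the robots.txt convention mentioned above is just a plain-text file served at a site's root, and honoring it is entirely voluntary on the crawler's part. A typical entry asking one known AI crawler (GPTBot, OpenAI's crawler) to stay away while allowing everything else looks like:

```
# /robots.txt — a request, not an enforcement mechanism
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
```

A scraper that ignores the file faces no technical barrier at all, which is exactly the gap tools like Anubis try to fill.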

Iaso, who said she deals with some generative AI at her day job, told me that "from what I have learned, poisoning datasets doesn't work. It makes you feel good, but it ends up using more compute than you end up saving. I don't know the polite way to say this, but if you piss in an ocean, the ocean does not turn into piss."

In other words, Iaso thinks that it might be fun to mess with the AI bots that are trying to mess with the internet, but in many cases it's not practical to send them on these wild goose chases because it requires resources Cloudflare might have, but small organizations and individuals don't. 

"Anubis is an uncaptcha," Iaso explains on her site. "It uses features of your browser to automate a lot of the work that a CAPTCHA would, and right now the main implementation is by having it run a bunch of cryptographic math with JavaScript to prove that you can run JavaScript in a way that can be validated on the server."

Essentially, Anubis verifies that any visitor to a site is a human using a browser as opposed to a bot. One of the ways it does this is by making the browser do a type of cryptographic math with JavaScript or other subtle checks that browsers do by default but bots have to be explicitly programmed to do. This check is invisible to the user, and most browsers since 2022 are able to complete this test. In theory, bot scrapers could pretend to be users with browsers as well, but the additional computational cost of doing so on the scale of scraping the entire internet would be huge. This way, Anubis creates a computational cost that is prohibitively expensive for AI scrapers that are hitting millions and millions of sites, but marginal for an individual user who is just using the internet like a human. 
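Anubis itself is implemented in Go and JavaScript; the sketch below is only an illustration of the proof-of-work idea the paragraph describes (the function names and difficulty scheme are hypothetical, not Anubis's actual API). The client burns CPU searching for a nonce; the server verifies it with a single hash:

```python
import hashlib
import itertools

def solve_challenge(challenge: str, difficulty: int) -> int:
    """Client side: find a nonce whose SHA-256 digest of
    challenge+nonce starts with `difficulty` hex zeros.
    Expected work grows as 16**difficulty hashes."""
    prefix = "0" * difficulty
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce

def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: one hash to validate what took the client many."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)

nonce = solve_challenge("example-challenge", 4)
assert verify("example-challenge", nonce, 4)
```

The asymmetry is the point: the cost is negligible for one human loading one page, but ruinous for a scraper hitting millions of pages.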

Anubis is free, open source, lightweight, can be self-hosted, and can be implemented almost anywhere. It also appears to be a pretty good solution for what we've repeatedly reported is a widespread problem across the internet, which helps explain its popularity. But Iaso is still putting a lot of work into improving it and adding features. She told me she's working on a non-cryptographic challenge so it taxes users' CPUs less, and is also thinking about a version that doesn't require JavaScript, which some privacy-minded users disable in their browsers.

The biggest challenge in developing Anubis, Iaso said, is finding the balance. 

"The balance between figuring out how to block things without people being blocked, without affecting too many people with false positives," she said. "And also making sure that the people running the bots can't figure out what pattern they're hitting, while also letting people that are caught in the web be able to figure out what pattern they're hitting, so that they can contact the organization and get help. So that's like, you know, the standard, impossible scenario."

Iaso has a Patreon and is also supported by sponsors on GitHub who use Anubis, but she said she still doesn't have enough financial support to develop it full time. She said that if she had the funding, she'd also hire one of the main contributors to the project. Ultimately, Anubis will always need more work, because it is a never-ending cat-and-mouse game between AI bot scrapers and the people trying to stop them.

Iaso said she thinks AI companies follow her work, and that if they really want to stop her and Anubis they just need to distract her. 

"If you are working at an AI company, here's how you can sabotage Anubis development as easily and quickly as possible," she wrote on her site. "So first is quit your job, second is work for Square Enix, and third is make absolute banger stuff for Final Fantasy XIV. That's how you can sabotage this the best."

 
 

Basically, I have a lot of friends who self-describe as bad at tech. It seems like a lot of learned helplessness and refusing to even listen to instructions, because they've already told themselves they can't do it. But they would like to get better, and they do trust me. So I was trying to come up with some "tasks" to give them to help them gain confidence and some basic skills as well.

I have zero qualifications in tech/computer stuff, and no professional background either, so I know that all this stuff can be self-taught.

I was thinking gaming-related stuff might be a good entry point: setting up a Minecraft server, installing mods for games, hacking your 3DS. These things boil down to following instructions so maybe it would help people learn that if you follow the documentation/guide you will get things done. It doesn't require much thinking or problem-solving, just following instructions.

Would like to hear what other people think and what "tasks" they suggest tech illiterate or tech-averse people try in order to build their confidence and gain some basic competence.

 
 

It feels like 10 years ago, /r/cscareerquestions was full of people falling over each other to worship FAANG and their super high salaries. The tech field in general has always felt very full of chuds to me, or at the very best libertarians.

Maybe things are changing. This question asked whether the Big Beautiful Bill would be good for software engineers.

Will Trump's big beautiful bill benefit software engineers?

Was reading up on the bill and came across this:

The bill would suspend the current amortization requirement for domestic R&D expenses and allow companies to fully deduct domestic research costs in the year incurred for tax years beginning January 1, 2025 and ending December 31, 2029.

That sounds fantastic for U.S.-based software engineers, am I reading that right?
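To make the quoted provision concrete, here is the rough year-one arithmetic (figures are illustrative; this assumes the current Section 174 rule of five-year amortization with a half-year convention, so only 10% of domestic R&D is deductible in year one):

```python
rd_spend = 1_000_000  # hypothetical domestic R&D spend in one tax year

# Current rule (post-2022 Section 174): amortize over 5 years with a
# half-year convention -> 10% of the spend deductible in year one.
year_one_deduction_amortized = rd_spend * 0.10

# Under the proposed suspension: fully deduct in the year incurred.
year_one_deduction_expensed = rd_spend

print(year_one_deduction_amortized)  # 100000.0
print(year_one_deduction_expensed)   # 1000000
```

The gap between those two deductions is why the amortization rule has been widely blamed for making software headcount look more expensive on paper.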

Almost all of the answers are negative, with some even using a class analysis. One or two bad answers, of course, but still: if tech workers could gain some sort of class consciousness and identify with the working class instead of the petite bourgeoisie or labor aristocracy, there may be hope for them yet.

All the top answers I've seen so far:
mpaes98 says:

It will benefit software engineers at Palantir

Then all of the replies to this are insulting Palantir lol.

jarena009 says:

Well...US Corporate profits are currently up to $4T, and white collar/business professional jobs, especially in tech, are still down since 2023.

Meanwhile many of the major tech players are doing layoffs.

Do you think increased corporate profits, say to $4.4T or $4.6T, are going to result in more tech jobs?

Do you still believe in trickle down economics?

SenorSplashdamage says:

And even if our wages went up as engineers, most of us still have family that will end up being impoverished by all the other effects, especially health care. The overall losses will exceed any gains in personal salaries.

randomuser194 says:

In theory will be beneficial in that way, you just have to ignore all of the negative factors to the overall economy because of the bill

Wallstreet says:

Wild to see the difference in this sub from just ~5 years ago to now.

Back then: People’s complaints about this sub was that a lot of people would post the 5 massive offers they received then they would just say: don’t compare yourself to these posts, you don’t have to grind leetcode for hours, 80k offer for a no name company is good enough

Vs now: this sub is just a bunch of posts about people struggling to find a job and now grinding leetcode is the norm, and if you’re not doing it, you’re the problem

mau5tron says:

No. Every major tech CEO sweet talked trump and threw a bunch of money at Trump's campaign with the promise to keep AI deregulated. Those tech companies are then going to keep dumping money into an unprofitable technology and call it an "R&D" expense, then lay off a bunch of engineers and still get their tax cut. And like clockwork, they'll buyback a bunch of stock to keep stock price at a steady level while the economy goes to shit. Trickle down economics has never worked bro. People are just hoarding at the top.

LeadVitamin13 says:

When companies and the rich save money they don't pass it on, they hoard it. It's like thinking tax cuts will increase hiring when they don't. Maybe for a struggling company that needs extra help but couldn't afford it, not tech giants. If they can do a job with X amount of people, why would you hire any more just because you got more money?

LeftcellInfiltrator says:

Yeah, it'll free up trillions for the booj to invest with. But you'll be programming robot jailers with the soul of Peter Thiel to whip Amazon indentured servants into being more productive instead of solving any real problems. This is already happening in research.

 
 

Instead of just generating the next response, it simulates entire conversation trees to find paths that achieve long-term goals.

How it works:

  • Generates multiple response candidates at each conversation state
  • Simulates how conversations might unfold down each branch (using the LLM to predict user responses)
  • Scores each trajectory on metrics like empathy, goal achievement, coherence
  • Uses MCTS with UCB1 to efficiently explore the most promising paths
  • Selects the response that leads to the best expected outcome

Limitations:

  • Scoring is done by the same LLM that generates responses
  • Branch pruning is naive - just threshold-based instead of something smarter like progressive widening
  • Memory usage grows with tree size; there's currently no node recycling
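The loop described above can be sketched end to end. Everything LLM-shaped below (candidate generation, simulated user turns, trajectory scoring) is stubbed with toy functions, and all names are illustrative rather than the project's actual API; only the selection/expansion/backpropagation structure and the standard UCB1 rule are the real technique:

```python
import math
import random

# Toy stand-ins for the LLM calls: candidate generation and scoring.
def generate_candidates(state, k=3):
    return [state + [f"reply-{len(state)}-{i}"] for i in range(k)]

def score_trajectory(state):
    # Stub for the LLM-based scorer (empathy, goal achievement, ...).
    random.seed(hash(tuple(state)) % (2**32))
    return random.random()

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = [], 0, 0.0

    def ucb1(self, c=1.4):
        # UCB1: exploit (mean value) + explore (visit-count bonus).
        if self.visits == 0:
            return float("inf")
        return (self.value / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))

def mcts(root_state, iterations=200, depth=3):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # Selection: follow the highest-UCB1 child down to a leaf.
        while node.children:
            node = max(node.children, key=Node.ucb1)
        # Expansion: add candidate responses unless at max depth.
        if len(node.state) - len(root_state) < depth:
            node.children = [Node(s, node) for s in generate_candidates(node.state)]
            node = random.choice(node.children)
        # Simulation + backpropagation: score, then update ancestors.
        reward = score_trajectory(node.state)
        while node:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Commit to the most-visited first move.
    best = max(root.children, key=lambda n: n.visits)
    return best.state[-1]

print(mcts(["user: hi"]))
```

Replacing the two stubs with real LLM calls (and batching them) is where the actual cost lives; the tree bookkeeping itself is cheap.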