submitted 11 months ago by hedge@beehaw.org to c/technology@beehaw.org
[-] FaceDeer@kbin.social 38 points 11 months ago

For those who can't get through the paywall, this is an article about a system called Kudurru that monitors a bunch of websites hosting images listed in the LAION-5B dataset. When it sees the same IP address downloading images from several of those websites simultaneously, it assumes it must be a bot scraping the data to train an AI, and either blocks it or "poisons" the scrape by sending incorrect images back.
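
To make that concrete, here's a rough sketch of the detection idea as I understand it from the article. This is not Kudurru's actual code, which as far as I know isn't published; the names `ScrapeDetector` and `respond`, the 10-second window, and the three-site threshold are all assumptions picked purely for illustration.

```python
from collections import defaultdict, deque


class ScrapeDetector:
    """Flags an IP that requests images from several monitored sites
    within a short time window (hypothetical heuristic)."""

    def __init__(self, window_seconds=10.0, site_threshold=3):
        # Both defaults are made-up values, not taken from the article.
        self.window_seconds = window_seconds
        self.site_threshold = site_threshold
        self._recent = defaultdict(deque)  # ip -> deque of (timestamp, site)

    def observe(self, ip, site, timestamp):
        """Record one image request and return True if the IP now looks
        like a scraper. Timestamps are assumed to arrive in order."""
        events = self._recent[ip]
        events.append((timestamp, site))
        # Drop requests that have fallen out of the time window.
        while events and timestamp - events[0][0] > self.window_seconds:
            events.popleft()
        distinct_sites = {s for _, s in events}
        return len(distinct_sites) >= self.site_threshold


def respond(flagged, real_image, decoy_image):
    """Serve the decoy ("poisoned") image to flagged IPs, the real one otherwise."""
    return decoy_image if flagged else real_image
```

A scraper that spaces its requests out, or spreads them across many IP addresses, never trips a check like this one.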

Frankly, I don't see much likely impact from this. AI training has moved beyond simply using LAION-5B; we're discovering that a smaller, higher-quality dataset works better than just throwing mountains of data at the AI in training. So anything a trainer downloads is going to be extensively curated before being used for training, and this sort of obstruction will be fixed or filtered out.

[-] mkhoury@lemmy.ca 12 points 11 months ago

But the main result is achieved anyway, right? The picture that the system tried to download did not make it into the training set.

[-] FaceDeer@kbin.social 7 points 11 months ago

Unless the "this sort of obstruction will be fixed" part means the image is downloaded anyway. This is the weakest sort of DRM.

this post was submitted on 15 Oct 2023
49 points (98.0% liked)
