this post was submitted on 30 Dec 2025
229 points (99.6% liked)

[–] riskable@programming.dev -2 points 1 day ago (1 children)

I've seen original sources reproduced that show exactly what an AI copied to make images.

Show me. I'd honestly like to see it, because it would mean that something very, very strange is taking place within the model that could be a vulnerability (I work in security).

The closest thing to that I've seen is false watermarks: if the model was trained on a lot of similar images with watermarks (e.g. all images of a particular kind of fungus might have come from a handful of sources that all watermarked their images), the output will often have a nonsense watermark that sort of resembles the original one. This usually only happens with super specific things, like when you put the Latin name of a plant or tree in your prompt.

Another thing that can commonly happen is hallucinated signatures: On any given image that's supposed to look like a painting/drawing, image models will sometimes put a signature-looking thing in the lower right corner (because that's where most artist signatures are placed).

This doesn't happen because the image was directly copied from someone's work. It happens because, during training, the model statistically associated the keywords in your prompt with images that had such signatures. Model training is getting better at preventing this, though, as pipelines apply better bounding-box filtering to the images as a pretraining step. E.g. for a public domain Audubon drawing of a pelican, only the bird itself would be used and not the entire image (which would include the artist's signature somewhere).
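To make the bounding-box idea concrete, here's a minimal sketch of cropping an image down to its labeled subject. It's an illustration only: the toy pixel grid, the `crop_to_box` helper, and the `(left, top, right, bottom)` box layout are all assumptions made up for this example, not any particular training pipeline's code (a real one would use an image library).

```python
from typing import List, Tuple

Image = List[List[int]]          # toy grayscale grid, one list per row
Box = Tuple[int, int, int, int]  # assumed layout: (left, top, right, bottom), right/bottom exclusive


def crop_to_box(image: Image, box: Box) -> Image:
    """Return only the pixels that fall inside the bounding box."""
    left, top, right, bottom = box
    height = len(image)
    width = len(image[0]) if image else 0
    # Clamp the box to the image so a sloppy annotation cannot index out of range.
    left, right = max(0, left), min(width, right)
    top, bottom = max(0, top), min(height, bottom)
    if left >= right or top >= bottom:
        raise ValueError("bounding box does not overlap the image")
    return [row[left:right] for row in image[top:bottom]]


# Toy 4x4 "drawing": 1 = the subject, 9 = a signature in the lower right corner.
drawing = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 9],
]
subject_only = crop_to_box(drawing, (1, 1, 3, 3))
```

The cropped result keeps the subject and drops the corner where the signature lives, which is the whole point of filtering by bounding box before training.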

The signature should not be included because the resulting image was not drawn by that artist; that would be tantamount to fraud (bad). Instead, what image model makers do (except OpenAI with ChatGPT/DALL-E) is tell the public exactly what their models were trained on. For example, they'll usually disclose that they used ImageNet (which you yourself can download here: https://www.image-net.org/download.php ).

Note: I'm pretty sure the full ImageNet database is also on Hugging Face somewhere if you don't want to create an account with image-net.org.

Also note: ImageNet doesn't actually contain images! It's just a database of image metadata that includes bounding boxes. For over a decade, volunteers spent a lot of time drawing bounding boxes with labels/descriptions on public images that are available for anyone to download for free (with open licenses!). This means that if you want to train a model with ImageNet, you have to walk the database and download every image from the URLs it contains.
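As a rough sketch of what "walking the database" could look like, here's a toy example. The record layout (one JSON object per line with `url`, `label`, and `box` fields), the example.org URLs, and the helper names are all hypothetical, made up for illustration; this is not ImageNet's actual schema. It only builds the list of what to fetch and doesn't touch the network.

```python
import json
from typing import Dict, Iterable, Iterator, List

# Hypothetical metadata records, NOT ImageNet's real format.
SAMPLE_METADATA = """\
{"url": "https://example.org/pelican.jpg", "label": "pelican", "box": [10, 20, 200, 180]}
{"url": "https://example.org/cat.jpg", "label": "cat", "box": [0, 0, 64, 64]}
{"url": "https://example.org/pelican.jpg", "label": "beak", "box": [150, 40, 200, 80]}
"""


def walk_metadata(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed record per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def group_boxes_by_url(records: Iterable[dict]) -> Dict[str, List[dict]]:
    """Collect every labeled box under its image URL, so each image is fetched once."""
    grouped: Dict[str, List[dict]] = {}
    for record in records:
        annotation = {"label": record["label"], "box": record["box"]}
        grouped.setdefault(record["url"], []).append(annotation)
    return grouped


download_plan = group_boxes_by_url(walk_metadata(SAMPLE_METADATA.splitlines()))
for url, annotations in download_plan.items():
    labels = ", ".join(a["label"] for a in annotations)
    print(f"{url}: {labels}")
```

Grouping by URL first means an image with several labeled boxes only gets downloaded once, and the boxes are ready to hand to whatever cropping step comes next.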

If anything was "stolen", it was the time of the volunteers who created the classification system/DB so that things like OpenCV could work and your doorbell/security camera can tell the difference between a human and a cat.

[–] dontsayaword@piefed.social 9 points 1 day ago* (last edited 1 day ago) (1 children)