this post was submitted on 10 Apr 2026
75 points (100.0% liked)

[–] chgxvjh@hexbear.net 30 points 5 days ago* (last edited 5 days ago) (1 children)

The LLMs might also just pretend to process the files but actually just make shit up.

Here someone asked LLMs questions about files, but instead of attaching the files only gave filenames/paths that don't actually exist. https://neuromatch.social/@jonny/116373289181802627 With potentially disastrous results if people actually rely on this for vision.
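that failure mode is easy to guard against on the caller's side: check that every referenced file actually exists and read its contents yourself before building the prompt, instead of pasting bare paths. a minimal sketch (the function name and structure are hypothetical, not any particular LLM library's API):

```python
import os

def gather_file_contents(paths):
    # Refuse to build a prompt if any referenced file is missing, so the
    # model never sees a bare path it could hallucinate contents for.
    missing = [p for p in paths if not os.path.isfile(p)]
    if missing:
        raise FileNotFoundError(f"missing files, aborting prompt: {missing}")
    contents = {}
    for p in paths:
        with open(p, "r", encoding="utf-8") as f:
            contents[p] = f.read()
    return contents
```

the point is simply that the existence check happens in deterministic code before the model is ever involved, so a typo'd path fails loudly instead of being confabulated over.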

[–] plinky@hexbear.net 19 points 4 days ago* (last edited 4 days ago)

they don't pretend (stop anthropomorphizing smh), they just fail to trigger the image-processing engine, and they don't have simple internal scripts checking whether that engine was actually engaged. for the encoder there is no difference between text input that does or doesn't contain files, so it follows that it will produce an answer regardless. a simple script, on the other hand, can't check whether the text output matches the ingested data, so they just fumble around.

what i'm saying is it has to have very well-defined scripts for how to behave with outside data, because it can reply regardless of that data. best you can hope for is that the reply in some way relates to the other part of the machine ("agent" or whatever image/file/video/audio recognition thingy sits there). the obvious sane way is: launch a one-shot image descriptor, get its output, on failure send the llm a note that there was no image, and train the damn thing to synthesize both contexts into "sorry mate, no inputs"
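that dispatch logic can be sketched in a few lines (all names here are hypothetical, and `describe_image` stands in for whatever one-shot recognizer actually sits there):

```python
def build_vision_context(image_path, describe_image):
    # Run the one-shot image descriptor; if it fails for any reason,
    # inject an explicit "no image" note into the LLM context instead
    # of letting the model free-associate over a bare filename.
    try:
        description = describe_image(image_path)
        return f"[image] {description}"
    except Exception:
        return "[no image received] tell the user there was no input."
```

the design choice is the same as above: the success/failure branch is decided by plain code, and the model only ever sees an explicit statement of which branch was taken.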