55 points, submitted 22 Dec 2023 by raptir to c/nostupidquestions@lemmy.world

We can generate AI images, we can generate AI text, but text in an image is a no-go?

all 15 comments
[-] Sheeple@lemmy.world 48 points 1 year ago* (last edited 1 year ago)

Because AI doesn't fucking understand what it creates. It follows patterns, and that shows intensely when it tries to generate text. All it sees are "patterned squiggles"; it isn't processing words.

Did you expect the plagiarism machine to truly understand what it makes?

[-] 21Cabbage@lemmynsfw.com 8 points 1 year ago

I'm glad I'm not the only one who calls generative AI the plagiarism machine.

[-] Sheeple@lemmy.world 1 points 1 year ago

It's what it should be called, and lately I've made an effort to always refer to it as such.

[-] Supermariofan67@programming.dev 5 points 1 year ago

It bothers me that a circlejerky, oversimplified answer like this one is more highly upvoted than the numerous high-effort, highly detailed, technical explanations of why it happens.

[-] foggy@lemmy.world 29 points 1 year ago

This happens for humans when they dream, too.

Basically, when recognizing or producing accurate text isn't the utility function... You don't get accurate text.

Welcome to the matrix or something.

[-] Sheeple@lemmy.world 3 points 1 year ago

I've noticed that I find writing in dreams exceedingly hard. "Reading" is easy because I never truly read anything; I just happen to know what a word says without analysing it.

Once I have to write a word, however, it becomes hell, as I can't manage to make anything coherent; things keep morphing around constantly.

[-] Swedneck@discuss.tchncs.de 2 points 1 year ago

Interesting, I only ever remember reading in my dreams, and several times now I've had dreams where I recognize that text changes every time I read it and is just vaguely correct-sounding nonsense like "blueab smolbob eat blitsfop". The reason I realize this is specifically because it's frustrating to try to read something and it... doesn't work??

[-] jacksilver@lemmy.world 2 points 1 year ago

To add to this: the AI is trained (for the most part) by passing in images with descriptions. Since most descriptions focus on the main concepts, they generally won't include the actual text shown in the image. Without the text being included in the descriptions, the AI has a hard time learning the meaning of the squiggles in the images. In addition, those squiggles can represent a lot of different things, so even if it does come to "understand" letters, it's really hard to "understand" their meaning, thus leading to a lot of weird words/text.
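To make that concrete, here's a hypothetical training pair; the filename and caption are made up for illustration, not taken from any real dataset:

```python
# A hypothetical image-caption training pair. Note that the caption
# describes the main concepts but never transcribes the sign's text,
# so the letters in the image are never tied to language.
training_pair = {
    "image": "street_photo_0042.jpg",  # imagine it shows a shop sign reading "BAKERY"
    "caption": "a small shop on a sunny street corner",
}
```

With millions of pairs like this, the model gets plenty of signal about shops and streets, and almost none about what the squiggles on the sign mean.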

[-] Swedneck@discuss.tchncs.de 1 points 1 year ago

It's pretty fun to look at how they almost get it right in some cases. If you prompt "birthday", you might get some text that almost looks like "happy birthday", followed by a smudge that is supposed to be a name, but also probably some actually correct numbers, because those are much more predictable!

[-] gerryflap@feddit.nl 16 points 1 year ago* (last edited 1 year ago)

Generating meaningful text in an image is very complex. Most of these models, like DALL-E and Stable Diffusion, are essentially guided denoising algorithms. They get images of pure noise and are told that it's actually just a very noisy image of whatever the description says. So all they do is remove some noise, over many steps in a row, until a clear image emerges. You can kinda imagine it as the "AI" staring into the noise to see the image that you described (a toy sketch of this loop follows).
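As a rough illustration, here's a toy sketch of that guided denoising loop; `toy_denoiser` is a stand-in for the trained network, and nothing here is the actual DALL-E or Stable Diffusion code:

```python
import numpy as np

def toy_denoiser(x, prompt, t):
    # A trained network would predict the noise still present in x,
    # conditioned on the prompt and the step t; this stub just returns zeros.
    return np.zeros_like(x)

def sample(prompt, steps=50, shape=(64, 64, 3)):
    rng = np.random.default_rng(0)
    x = rng.normal(size=shape)            # start from pure noise
    for t in reversed(range(steps)):
        predicted_noise = toy_denoiser(x, prompt, t)
        x = x - predicted_noise / steps   # strip away a little noise each step
    return x                              # with a real model, an image emerges

image = sample("a tree")
```

The point is only the shape of the computation: the model is never asked "what does this sign say?", it is only asked "what noise should be removed here?".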

Most real-world objects are of course quite complex. If it sees a tree branch in the noise, it also needs to make sure that the rest of the tree fits. And a car headlight only makes sense if the rest of the car is also there. But for text, these kinds of correlations are way, way harder. In order to generate meaningful text, it not only needs to understand how text is usually spaced and that letters are usually written in a consistent font; it also needs to learn the entire English language. All that just to generate something that probably affects its "score" on images from the dataset less than learning how to draw a realistic car does.

So in order to generate meaningful text, the model requires a lot of capacity. Otherwise, since it's not specifically motivated to learn to write meaningful text, it'll do whatever it's doing now. Honestly, given all these considerations, I'm sometimes quite impressed with how well these models generate text.

EDIT: Another few things came to mind:

  • Relating images and text (and thus guiding the image generator) used to be done with a separate (AI) model; not sure if that's still the case. So two models need to understand the English language for meaningful text to appear: the generator and the image-to-text translation model.

  • So why can AI like ChatGPT generate meaningful text? Well, in short, those models are fully dedicated to outputting language. They output text as text, so they can be scored on it directly; the neural network architecture is also way more suited to it, and they see way more text. (A toy sketch of that token-by-token generation follows.)
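To contrast with the denoising loop above, here's a toy sketch of autoregressive text generation; the vocabulary and the stub probabilities are invented for illustration and bear no relation to a real language model:

```python
import numpy as np

VOCAB = ["happy", "birthday", "tree", "car", "<end>"]

def toy_next_token_probs(tokens):
    # A real language model predicts a distribution over its whole
    # vocabulary; this stub just favors "birthday" after "happy".
    probs = np.full(len(VOCAB), 0.1)
    if tokens and tokens[-1] == "happy":
        probs[VOCAB.index("birthday")] = 0.6
    return probs / probs.sum()

def generate(prompt, max_tokens=8):
    tokens = prompt.split()
    for _ in range(max_tokens):
        probs = toy_next_token_probs(tokens)
        next_token = VOCAB[int(np.argmax(probs))]
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return " ".join(tokens)

print(generate("happy"))
```

Here every single output is a token of text, so every training step scores the model directly on language, whereas the image model is scored on pixels.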

[-] Iamdanno@lemmynsfw.com 1 points 1 year ago

So all they do is remove some noise for many steps in a row until a clear image emerges.

So it's like Mark Twain (?) said: writing is easy, all you do is write everything down, then cross out all the wrong words. Or something to that effect.

The best answer would require a very technical understanding, but I'll give it a try and stay abstract.

The AI is trained using images. If you type in something like "a tree", it has a vague idea of what one looks like.

The thing is, writing letters is a hard concept. How should the AI know text is made up of letters? Connected lines make up a letter and unconnected ones don't; sentences are separated by periods.

Easy enough for us, but you have to imagine an AI does best with what it can directly observe, and knowing when to literally write out letters is hard. So it has a stroke. It has a vague notion of "this is where text is supposed to go", but making the letters look right in a consistent font, remembering where letters end, and spacing the words correctly: all of this is far too complex.

Now, I haven't looked into how the AIs that CAN generate text well do it, but I assume they decide "there's gonna be text here" and then use another process to insert the text, basically after the fact (a sketch of that idea follows). Or maybe there's some special change in the training or inference of the image going on? Idk, for this one I'd need an expert.
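For the "insert the text after the fact" guess, here's a minimal sketch of what such a post-processing step could look like, using Pillow; the placeholder image, position, and wording are all made up for illustration, not taken from any real pipeline:

```python
from PIL import Image, ImageDraw, ImageFont

# Stand-in for an image the generator produced with a blank spot for text.
img = Image.new("RGB", (512, 512), "lightblue")

draw = ImageDraw.Draw(img)
font = ImageFont.load_default()  # a real pipeline would pick a font matching the scene
draw.text((160, 240), "Happy Birthday", fill="black", font=font)

img.save("with_text.png")
```

Since the overlay step manipulates actual characters rather than pixels that merely look like characters, the spelling comes out right by construction.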

[-] wetferret@lemmy.world -2 points 1 year ago

I don't understand why image generators can't just make a quick call to a ChatGPT API? It's incredibly competent at producing convincing text.
