OpenAI just admitted it can't identify AI-generated text. That's bad for the internet and it could be really bad for AI models. (www.businessinsider.com)

submitted 3 years ago by L4s@lemmy.world to c/technology@lemmy.world

In January, OpenAI launched a system for identifying AI-generated text. This month, the company scrapped it.

[–] vrighter@discuss.tchncs.de 1 points 3 years ago (1 children)

They already do. Where do you think the training corpus comes from? The real world. It's curated by humans and then fed to the ML system.

The problem is that the real world now contains a bunch of AI-generated text. And it has been well studied that feeding that text back into training can destroy your model, because the network is then effectively being trained to predict its own output, which just doesn't make sense.
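To make the feedback loop above concrete, here is a toy sketch. It is not a language model: it assumes a one-dimensional Gaussian as a stand-in for "the model", and the function name `simulate_collapse` and its parameters are made up for illustration. Each generation is fitted only to samples drawn from the previous generation's fit, so no fresh real-world data ever enters the loop.

```python
import random
import statistics


def simulate_collapse(generations=200, sample_size=20, seed=0):
    """Toy feedback loop: each generation is fitted to the previous one's output.

    Hypothetical illustration only. Returns the fitted standard deviation
    per generation, starting with the "real world" value at index 0.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0: the assumed "real world" distribution
    history = [sigma]
    for _ in range(generations):
        # The current model generates the next generation's training corpus.
        samples = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        # "Training" here is just a maximum-likelihood Gaussian fit.
        mu = statistics.fmean(samples)
        sigma = statistics.pstdev(samples)
        history.append(sigma)
    return history


if __name__ == "__main__":
    history = simulate_collapse()
    for generation in (0, 50, 100, 150, 200):
        print(f"generation {generation:3d}: std = {history[generation]:.6f}")
```

In this toy setup the fitted spread tends to shrink from one generation to the next, because every fit is made from a finite sample and the estimation error compounds. That is only a loose analogy for what the studies report about real models, but it shows why a model trained on its own output has nothing pulling it back toward the original data.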

So humans still need to filter that stuff out of the training corpus. But we can't reliably tell which text is human-written and which is AI-generated, and neither can a machine. So there's no way to do this properly.

The data almost always comes from the real world, except now the real world also contains data that is "harmful" (to AI), and we can't figure out how to find and remove it.

[–] volodymyr@lemmy.world 1 points 3 years ago

There are still people in between, building training data from their real-world experiences. Now the digital world may become overwhelmed with AI creations, so training on it may lead to model collapse. So what if we give AI access to cameras, microphones, all that, and even let it move them around itself? It would also need to be adventurous, searching for spaces away from other AI work. There is lots of data out there that is not created by AI, although at some point that might change as well. I am leaving aside the obvious dangers of this approach for the moment.