this post was submitted on 07 Nov 2025
251 points (98.5% liked)

Fuck AI


The BBC and the European Broadcasting Union have produced a large study of how well AI chatbots handle summarising the news. In short: badly.

The researchers asked ChatGPT, Copilot, Gemini, and Perplexity about current events. 45% of the chatbot answers had at least one significant issue: 31% had serious sourcing problems and 20% had major accuracy problems, such as hallucinations or outdated information. This held across multiple languages and multiple countries.

The AI distortions are “significant and systemic in nature.”

[–] mudkip -5 points 1 week ago (2 children)

This was a very poorly conducted study. Every single tester was a journalist from the very companies losing traffic to AI. They had a direct stake in making the results look bad. If you dig into the actual report, you see how they get the numbers. Most of the errors are "sourcing issues": the AI assistant doesn't cite a claim, or it (shocking) cites Wikipedia instead of the BBC.

Also, the models are heavily outdated (4o for GPT, Flash for Gemini, which aren't even equivalent in intelligence). They don't list the full model versions from what I can tell.

[–] RedstoneValley@sh.itjust.works 11 points 6 days ago* (last edited 6 days ago)

You might want to read the actual report then.

You'll find that the second study was conducted in May/June 2025, and you'll find the model versions, which were the free options available at the time (page 20).

Also, the sourcing errors found were not about which source was selected (i.e. a bias in sourcing, as you seem to imply). The report explicitly states:

Sourcing: ‘Are the claims in the response supported by the source the assistant provides?’ (page 9)

"Sourcing was the biggest cause of problems, with 31% of all responses having significant issues with sourcing – this includes information in the response not supported by the cited source, providing no sources at all, or making incorrect or unverifiable sourcing claims." (page 10)

GPT-4o and Gemini Flash were not "heavily outdated" at the time the study was conducted, because these were the models provided in the free versions, which is what the researchers used (pages 20 and 62).

The goal of the study is not to find the best-performing model or to compare models against each other, but to use the publicly available AI offerings the way a normal consumer would. You might get better results from a paid pro model or a specialized model of some kind, but that's not the point here.

[–] fistac0rpse@fedia.io -3 points 1 week ago (1 children)

the study was probably conducted by AI

[–] logi@piefed.world 1 point 1 week ago

In which case we're supposed to ignore all the problems with it