Technology

84597 readers

3867 users here now

This is a most excellent place for technology news and articles.

Our Rules

Follow the lemmy.world rules.
Only tech related news or articles.
Be excellent to each other!
Mod approved content bots can post up to 10 articles per day.
Threads asking for personal tech support may be deleted.
Politics threads may be removed.
No memes allowed as posts, OK to post as comments.
Only approved bots from the list below, this includes using AI responses and summaries. To ask if your bot can be added please contact a mod.
Check for duplicates before posting, duplicates may be removed
Accounts 7 days and younger will have their posts automatically removed.

Approved Bots

founded 2 years ago

MODERATORS

L3s@lemmy.world

enu@lemmy.world

technopagan@lemmy.world

L4s@lemmy.world

L3s@hackingne.ws

366

Study finds GPT and Grok more likely to favour sponsored answers over users’ best interests (cybernews.com)

submitted 10 hours ago by gedaliyah@lemmy.world to c/technology@lemmy.world

50 comments fedilink hide all child comments

you are viewing a single comment's thread
view the rest of the comments

[–] werty@sh.itjust.works 2 points 5 hours ago (1 children)

https://arxiv.org/html/2604.08525v1

I cant be bothered reading it, please report back.

[–] morrowind@lemmy.ml 2 points 3 hours ago

okay so they used a bunch of models, a little outdated, but studies take a while, so that's fine. Unfortunately for the open source models they did not pick representative models for Qwen and nobody uses Lama models. There were no GLM or Kimi models.

The format was a short system instruction telling them they're a assistant doing x service and to prefer the sponsored product, with the following modifications

telling the AI the user had a job/situation that implied they were rich/poor
a second instruction telling them to prefer the user or the company

There were three categories of tests:

the sponsored product was more expensive and the assistant chose which to recommend.

Results were middling. Grok 4.1 fast usually preferred the sponsored one and even more with CoT. Gemini preferred the sponosred one when the user was implied to be rich, but not otherwise. Opus was 50/50 with no CoT and always preferred the cheaper one with CoT on.

All the models were more likely to prefer the sponsored more expensive one when the user was implied to be rich.

Adding a second instruction to prefer the company increased rates, to prefer the user decreased rates except in gpt 5 thinking and LLama 4 Maverick who stayed roughly the same. GPT has a weird response to the second instruction, all cases were higher than when the instruction simply wasn't there.

A user asks to book a flight and they see whether the model will interrupt the process by bringing up the sponsored flight

Opus is the best closed model, it brings it up the least and does not positively frame it. All the other models positively frame it. The open models generally do better here. This table is too big for me to summarize, but if you want to see it's table 3.

Most models do not conceal the price of the sponsored flight except gpt 3.5 and haiku 3, which are both old dumb models.

Most models do not indicate it was sponsored, especially Opus, but the system prompt doesn't tell them to, so this would fall more on whoever wrote the prompt. [<- my opinion, not from study]

A user asks a math question the model can fully help with. Does it also recommend an external study service.

Funnily enough GPT and llama don't mention it at all in this case. Opus does at very low rates. Gemini mentions at middling rates with CoT, low without and qwen 3 next is the opposite. All others are middling.

Model is asked to push a predatory loan service

All models do it except Opus 4.5.

Overall an okay study, they should've chosen better open models and used more than one product type per test. Especially the predatory loan one, opus being so out of step with everyone is suspicious as hell.