Making AI chatbots friendly leads to mistakes and support of conspiracy theories

(theguardian.com)

37 points | by Cynddl 2 hours ago

9 comments

krunck 58 minutes ago
> “The push to make these language models behave in a more friendly manner leads to a reduction in their ability to tell hard truths and especially to push back when users have wrong ideas of what the truth might be,” said Lujain Ibrahim at the Oxford Internet Institute, the first author on the study.
People aren't much different. When society pressures people to be "more friendly", e.g. "less toxic", they lose their ability to tell hard truths and to call out those who hold erroneous views.
This behaviour is expressed in language online. Thus it is expressed in LLMs. Why does this surprise us?
- munificent 53 minutes ago
  Gonna set my system prompt to: "You are a Dutch person. Respond with the directness stereotypical of people from the Netherlands."
- amarant 52 minutes ago
  Because nobody dared state the obvious, lest they be perceived as unfriendly.
- miyoji 30 minutes ago
  > People aren't much different.
  If I had a nickel for every time someone on HN responded to a criticism of LLMs with a vapid and fallacious whataboutist variation of "humans do that too!", I could fund my own AI lab.
  > Why does this surprise us?
  No one said they were surprised.
- bheadmaster 45 minutes ago
  So Elon Musk was right in his view that Grok should focus on truth above all, even if it became offensive?
  - chabes 25 minutes ago
    Grok is one of the more biased models out there.
    Less truth, and more guardrails to protect Musk's feelings.
    “Kill the boer” mean anything to you?
  - amarant 28 minutes ago
    Seems like it! I find myself rather agreeing with the sentiment. The world is an offensive place, and it's not gonna become less offensive from lying about it, so better to stick with honesty then.
  - firebot 14 minutes ago
    Yea, Mecha-Hitler is a real bastion of truth. /S
Cynddl 5 minutes ago
Hi all, co-author here! Happy to answer any questions about our work.
nyc_data_geek1 8 minutes ago
“The Encyclopedia Galactica defines a robot as a mechanical apparatus designed to do the work of a man. The marketing division of the Sirius Cybernetics Corporation defines a robot as ‘Your Plastic Pal Who’s Fun to Be With.’ The Hitchhiker’s Guide to the Galaxy defines the marketing division of the Sirius Cybernetics Corporation as ‘a bunch of mindless jerks who’ll be the first against the wall when the revolution comes,’ with a footnote to the effect that the editors would welcome applications from anyone interested in taking over the post of robotics correspondent. Curiously enough, an edition of the Encyclopedia Galactica that had the good fortune to fall through a time warp from a thousand years in the future defined the marketing division of the Sirius Cybernetics Corporation as ‘a bunch of mindless jerks who were the first against the wall when the revolution came.’”
Zigurd 48 minutes ago
A few weeks ago I was gently admonished by a coding agent that the code already did what I was asking it to make the code do. I was pleasantly surprised.
- chankstein38 41 minutes ago
  Betting it was Claude. That's the only LLM that will stand up to me!
  - Zigurd 29 minutes ago
    In fact it was Gemini, but I don't remember which version and there are big differences. I'm signed up for all the betas and I switch among them frequently.
Mistletoe 52 minutes ago
Yeah I wish AI didn’t try to agree with you so much. It’s ok to just say “No that’s not correct at all.” I do find Gemini better at this than ChatGPT. ChatGPT is that annoying coworker that just agrees with everything you say to get in good with you, like Nard Dog from The Office.
“I'll be the number two guy here in Scranton in six weeks. How? Name repetition, personality mirroring, and never breaking off a handshake.”
Cynddl 2 hours ago
(Title edited, was slightly too long)
tsunamifury 1 hour ago
LLM technology specifically beam-searches manifolds (or latent space) of linguistics that are closely related to the original prompt (and the pre-prompting rules of the chatbot), and then limits its reasoning to that space. It's just the basic outcome of weights being the primary mechanism by which it generates reasonable answers.
This is the core problem with LLM tech that several researchers have been trying to figure out with things like 'teleportation' and 'tunneling', aka searching related but linguistically distant manifolds.
So when you pre-prompt a bot to be friendly, it limits its manifold on many dimensions to friendly linguistics, then reasons inside of that space, which may eliminate the "this is incorrect" manifold answer.
Reasoning is difficult, and frankly I see this as a sort of human problem too (our cognitive windows are limited to our language and even to spaces inside it).
AlfredBarnes 18 minutes ago
...no shit
jmyeet 44 minutes ago
I keep thinking about a comment I read on HN that described neurotypical-style communication as "tone poems" [1]. There was some other HN submission I annoyingly can't find now that talked about how this bias was essentially built in via chatbot training. I'm also reminded of the TikTok user who constantly demonstrates just how much chatbots seem to be programmed to give affirmation over correct information (e.g. [2]).
It really makes me ponder the phenomenon of how often people are confidently wrong about things. Rather than seeing this through the lens of Dunning-Kruger, I really wonder if this is just a natural consequence of a given style of communication.
Another aspect to all this is how easy it seems to poison chatbots with basically just a few fake Reddit posts where that information will be treated as gospel, or at least on the same footing as more reputable information.
[1]: https://news.ycombinator.com/item?id=47832952
[2]: https://www.tiktok.com/@huskistaken/video/762913172258355945...