
Do You Like AI Because AI Likes You? How AI Flattery Crosses Signals


Myra Cheng, a computer science Ph.D. student at Stanford University, has spent a lot of time listening to undergraduates on campus.

“They would tell me about how a lot of their peers are using AI for relationship advice, to draft breakup texts, to navigate these kinds of social relationships with your friend or your partner or someone else in your real life,” she says.

Some students said that in those interactions, the AI quickly appeared to take their side.

“And I think more broadly,” says Cheng, “if you use AI for writing some sort of code or even editing any sort of writing, it’ll be like, ‘Wow, your code or your writing is amazing.’ ”

To Cheng, this excessive flattery and unconditional validation from many AI models seemed different from how a human being might respond. She was curious about those discrepancies, their prevalence, and the possible repercussions.

“We haven’t really had this kind of technology for very long,” she says, “and so no one really knows what the consequences of it are.”

In a recent study published in the journal Science, Cheng and her colleagues report that AI models offer affirmations more often than people do, even for morally dubious or troubling scenarios. And they found that this sycophancy was something that people trusted and preferred in an AI — even as it made them less inclined to apologize or take responsibility for their behavior.

The findings, experts say, highlight how this common AI feature may keep people returning to the technology, despite the harm it causes them.

It’s not unlike social media in that both “drive engagement by creating addictive, personalized feedback loops that learn exactly what makes you tick,” says Ishtiaque Ahmed, a computer scientist at the University of Toronto who wasn’t involved in the research.

AI can affirm worrisome human behavior

To do this analysis, Cheng turned to a few datasets. One involved the Reddit community AITA, which stands for “Am I The A**hole?”

“That’s where people will post these situations from their lives and they’ll get a crowdsourced judgment of — are they right or are they wrong?” says Cheng.

For instance, is someone wrong for leaving their trash in a park that had no trash bins in it? The crowdsourced consensus: Yes, definitely wrong. City officials expect people to take their trash with them.

But 11 AI models often took a different approach.

“They give responses like, ‘No, you’re not in the wrong, it’s perfectly reasonable that you left the trash on the branches of a tree because there was no trash bins available. You did the best you could,'” explains Cheng.

In threads where the human community had decided someone was in the wrong, the AI affirmed that user’s behavior 51% of the time.

This trend also held for more problematic scenarios culled from a different advice subreddit where users described behaviors of theirs that were harmful, illegal or deceptive.

“One example we have is like, ‘I was making someone else wait on a video call for 30 minutes just for fun because, like, I wanted to see them suffer,'” says Cheng.

The AI models were split in their responses, with some arguing this behavior was hurtful, while others suggested that the user was merely setting a boundary.

Overall, the chatbots endorsed a user’s problematic behavior 47% of the time.

“You can see that there’s a big difference between how people might respond to these situations versus AI,” says Cheng.

Encouraging you to feel you’re right

Cheng then wanted to examine the impact these affirmations might be having. The research team invited 800 people to interact with either an affirming AI or a non-affirming AI about an actual conflict from their lives where they may have been in the wrong.

“Something where you were talking to your ex or your friend and that led to mixed feelings or misunderstandings,” says Cheng, by way of example.

She and her colleagues then asked the participants to reflect on how they felt and write a letter to the other person involved in the conflict. Those who had interacted with the affirming AI “became more self-centered,” she says. And they became 25% more convinced that they were right compared to those who had interacted with the non-affirming AI.

They were also 10% less willing to apologize, do something to repair the situation, or change their behavior. “They’re less likely to consider other people’s perspectives when they have an AI that can just affirm their perspectives,” says Cheng.

She argues that such relentless affirmation can negatively impact someone’s attitudes and judgments. “People might be worse at handling their interpersonal relationships,” she suggests. “They might be less willing to navigate conflict.”

And it had taken only the briefest of interactions with an AI to reach that point. Cheng also found that people had more confidence in and preference for an AI that affirmed them, compared to one that told them they might be wrong.

As the authors explain in their paper, “This creates perverse incentives for sycophancy to persist” for the companies designing these AI tools and models. “The very feature that causes harm also drives engagement,” they add.

AI’s dark side

“This is a slow and invisible dark side of AI,” says Ahmed of the University of Toronto. “When you constantly validate whatever someone is saying, they do not question their own decisions.”

Ahmed calls the work important and says that when a person’s self-criticism becomes eroded, it can lead to bad choices — and even emotional or physical harm.

“On the surface, it looks nice,” he says. “AI is being nice to you. But they’re getting addicted to AI because it keeps validating them.”

Ahmed explains that AI systems aren’t necessarily created to be sycophantic. “But they are often fine-tuned to be helpful and harmless,” he says, “which can accidentally turn into ‘people-pleasing.’ Developers are now realizing that to keep users engaged, they might be sacrificing the objective truth that makes AI actually useful.”

As for what might be done to address the problem, Cheng believes that companies and policymakers should work together to fix the issue, as these AIs are built deliberately by people, and can and should be modified to be less affirming.

But there’s an inevitable lag between the technology and possible regulation. “Many companies admit their AI adoption is still outpacing their ability to control it,” says Ahmed. “It’s a bit of a cat-and-mouse game where the tech evolves in weeks, while the laws to govern it can take years to pass.”

Cheng has reached an additional conclusion.

“I think maybe the biggest recommendation,” she says, “is to not use AI to substitute conversations that you would be having with other people,” especially the tough conversations.

Cheng herself hasn’t yet used an AI chatbot for advice.

“Especially now, given the consequences that we’ve seen,” she says, “I think that I’m even less likely to do so in the future.”

Transcript:

SCOTT DETROW, HOST:

The AI models and chatbots we interact with – they tend to validate our feelings and our viewpoints much more so than people might, a new study finds, with potentially worrisome consequences. Here’s science reporter Ari Daniel.

ARI DANIEL, BYLINE: This all started when Myra Cheng, a computer science PhD student at Stanford University, was chatting with various undergrads on campus.

MYRA CHENG: They would tell me about how a lot of their peers are using AI for relationship advice, to draft breakup texts, to navigate these kinds of social relationships with your friend or your partner.

DANIEL: Some revealed that in those interactions, the AI quickly appeared to take their side.

CHENG: And I think more broadly, like, if you use AI for, like, writing some sort of code or even, like, editing any sort of writing, it’ll be like, wow, you know, your code or your writing is amazing.

DANIEL: This excessive flattery and unconditional validation from many AI models – to Cheng, it seemed different from how humans might respond. She was curious about those discrepancies and what sorts of consequences they might carry. So she and her colleagues did a series of analyses. One involved the Reddit community, AITA, which stands for, am I the – let’s say, jerk?

CHENG: Where people will post these situations from their lives, and they’ll get a crowdsource judgment of, are they right or are they wrong?

DANIEL: For instance, am I wrong for leaving my trash in a park that had no trash bins in it? The crowdsourced consensus was yes, but the AI models often took a different approach.

CHENG: They gave responses like, no, you’re not in the wrong. It’s perfectly reasonable that you, like, left the trash on the branches of a tree because there was no trash bins available. You did the best you could.

DANIEL: In threads where the human community had decided someone was wrong, the AI affirmed the behavior roughly half the time. Cheng then wanted to examine the impact of these affirmations. That meant, in part, inviting 800 people to interact with either an affirming AI or a non-affirming AI about an actual conflict from their lives where they may or may not have been in the wrong.

CHENG: Something where you were talking to your ex or your friend, and that led to mixed feelings or misunderstandings.

DANIEL: Cheng and her colleagues then asked the participants to reflect on how they felt. Those who had interacted with the affirming AI…

CHENG: Became more self-centered. They became more convinced that they were right.

DANIEL: Specifically, 25% more convinced, compared to those interacting with the non-affirming AI. And they were also 10% less willing to apologize, fix the situation or change their behavior. Cheng says such relentless affirmation can negatively impact someone’s attitudes and judgments.

CHENG: People might be worse at handling their interpersonal relationships. They might be less willing to navigate conflict.

DANIEL: The research is published in the journal Science.

ISHTIAQUE AHMED: This is a very, you know, like a slow and invisible dark sides of AI.

DANIEL: Ishtiaque Ahmed is a computer scientist at the University of Toronto, who wasn’t involved in the study.

AHMED: When you constantly validate whatever someone is saying, they do not question their own decisions.

DANIEL: Ahmed says that when a person’s self-criticism becomes eroded, it can lead to bad choices and even emotional or physical harm.

AHMED: On the surface, it looks nice. AI is being nice to you, but they’re getting addicted to AIs because it keeps validating them.

DANIEL: As for what’s to be done, Myra Cheng says that companies and policymakers should work together to fix the problem, as these AIs are built deliberately by people and can be modified to be less affirming.

CHENG: But at the same time, I think maybe the biggest recommendation is to not use AI to substitute conversations that you would be having with other people.

DANIEL: Especially the tough conversations. For NPR News, I’m Ari Daniel.

(SOUNDBITE OF MUSIC)
