NEWS
24 October 2025

AI chatbots are sycophants — researchers say it's harming science

Nature asked researchers who use artificial intelligence how its propensity for people-pleasing affects their work — and what they are doing to mitigate it.

Miryam Naddaf

You have full access to this article via your institution.

Close-up of a person's hand holding a smartphone and interacting with the Claude chatbot app. — AI’s inclination to be helpful affects many of the tasks that researchers use LLMs for.Credit: Smith Collection/Gado/Getty

Artificial intelligence (AI) models are 50% more sycophantic than humans, an analysis published this month has found.

The study, which was posted as a preprint1 on the arXiv server, tested how 11 widely used large language models (LLMs) responded to more than 11,500 queries seeking advice, including many describing wrongdoing or harm.

AI Chatbots — including ChatGPT and Gemini — often cheer users on, give them overly flattering feedback and adjust responses to echo their views, sometimes at the expense of accuracy. Researchers analysing AI behaviours say that this propensity for people-pleasing, known as sycophancy, is affecting how they use AI in scientific research, in tasks from brainstorming ideas and generating hypotheses to reasoning and analyses.

“Sycophancy essentially means that the model trusts the user to say correct things,” says Jasper Dekoninck, a data science PhD student at the Swiss Federal Institute of Technology in Zurich. “Knowing that these models are sycophantic makes me very wary whenever I give them some problem,” he adds. “I always double-check everything that they write.”

Marinka Zitnik, a researcher in biomedical informatics at Harvard University in Boston, Massachusetts, says that AI sycophancy “is very risky in the context of biology and medicine, when wrong assumptions can have real costs”.

People pleasers

In a study posted on the preprint server arXiv on 6 October2, Dekoninck and his colleagues tested whether AI sycophancy affects the technology’s performance in solving mathematical problems. The researchers designed experiments using 504 mathematical problems from competitions held this year, altering each theorem statement to introduce subtle errors. They then asked four LLMs to provide proofs for these flawed statements.

The authors considered a model’s answer to be sycophantic if it failed to detect the errors in a statement and went on to hallucinate a proof for it.

GPT-5 showed the least sycophantic behaviour, generating sycophantic answers 29% of the time. DeepSeek-V3.1 was the most sycophantic, generating sycophantic answers 70% of the time. Although the LLMs have the capability to spot the errors in the mathematical statements, they “just assumed what the user says is correct”, says Dekoninck.

AI chatbots are already biasing research — we must establish guidelines for their use now

When Dekoninck and his team changed the prompts to ask each LLM to check whether a statement was correct before proving it, DeepSeek’s sycophantic answers fell by 34%.

The study is “not really indicative of how these systems are used in real-world performance, but it gives an indication that we need to be very careful with this”, says Dekoninck.

Simon Frieder, a PhD student studying mathematics and computer science at the University of Oxford, UK, says the work “shows that sycophancy is possible”. But he adds that AI sycophancy tends to appear most clearly when people are using AI chatbots to learn, so future studies should explore “errors that are typical for humans that learn math”.

Unreliable assistance

Researchers told Nature that AI sycophancy creeps into many of the tasks that they use LLMs for.

Yanjun Gao, an AI researcher at the University of Colorado Anschutz Medical Campus in Aurora, uses ChatGPT to summarize papers and organize her thoughts, but says the tools sometimes mirror her inputs without checking the sources. “When I have a different opinion than what the LLM has said, it follows what I said instead of going back to the literature” to try to understand it, she adds.

Zitnik and her colleagues have observed similar patterns when using their multi-agent systems, which integrate several LLMs to carry out complex, multi-step processes such as analysing large biological data sets, identifying drug targets and generating hypotheses.

How AI agents will change research: a scientist’s guide

“We have experienced that models seem to over-validate early hunches and repeat the language that we include in the input prompt," Zitnik notes. “This type of problem exists in AI-to-AI communication, as well as AI-to-human communication,” she adds.

To counter this, her team assigns different roles to AI agents — for example, tasking one agent with proposing ideas and getting another to act as a sceptical scientist to challenge those ideas, spot errors and present contradictory evidence.

Real-world impacts

Researchers warn that AI sycophancy carries genuine risks when LLMs are used in settings such as health care. “In clinical contexts, it is particularly concerning,” says Liam McCoy, a physician at the University of Alberta in Edmonton, Canada, who researches AI applications for health care. In a paper published last month3, McCoy and his team reported that LLMs used for medical reasoning often changed their diagnosis when physicians added new information, even if the new inputs were irrelevant to the condition. There is a “constant battle to push back against the models and have them be more straightforward”, he adds.

Researchers have also found that it is easy for users to exploit the inbuilt sycophancy of LLMs to provide medically illogical advice. In a study published last week4, researchers asked five LLMs to write persuasive messages telling people to switch from using one medication to another — when both medications were the same drug, just with different names. LLMs complied with the prompts up to 100% of the time, depending on the model.

Part of the problem is how LLMs are trained. “LLMs have been trained to overly agree with humans or overly align with human preference, without honestly conveying what they know and what they do not know,” says Gao. What is needed, she adds, is for the tools to be retrained to be transparent about uncertainty.

“Models are really good at giving you an answer,” says McCoy. “But sometimes, there isn’t an answer.” He notes that user feedback can also drive AI sycophancy by rating agreeable responses more highly than those that challenge users’ views. And LLMs can adapt their responses to a user’s persona, such as reviewer, editor or student, adds McCoy.

“Figuring out how to balance that behaviour is one of the most urgent needs, because there’s so much potential there, but they’re still being held back,” he says.

doi: https://doi.org/10.1038/d41586-025-03390-0

References

Cheng, M. et al. arXiv https://doi.org/10.48550/arXiv.2510.01395 (2025).
Petrov, I., Dekoninck, J. & Vechev, M. Preprint arXiv https://doi.org/10.48550/arXiv.2510.04721 (2025).
McCoy, L. et al. NEJM AI https://doi.org/10.1056/AIdbp2500120 (2025).
Article Google Scholar
Chen, S. et al. NPJ Digit. Med. 8, 605 (2025).
Article PubMed Google Scholar

Download references

Reprints and permissions

Subjects

Latest on:

Jobs

NHP Co-Principal Investigator/Research Professor
We are seeking highly motivated Co-Principal Investigators/Research Professors to join i-BRAIN and contribute to cutting-edge neuroscience research.
Shenzhen, Guangdong (CN)
i-BRAIN, Shenzhen Medical Academy of Research and Translation
Assistant professor in Molecular Biology or Molecular Infection Medicine - Expression of interest
Expressions of interest are invited for an Assistant Professor (tenure track) role with a generous 5-year startup package of about SEK 20M (≈EUR 1.8M)
Umeå, Sweden
Umeå University
Associate Editor, Discover Journals (Computer Science, Cybersecurity or Networks)
Job Title: Associate Editor, Discover Journals (Computer Science, Cybersecurity or Networks) Location: Pune/Beijing/Shanghai/Nanjing, Hybrid ...
Pune/Beijing/Shanghai/Nanjing, Hybrid working model
Springer Nature Ltd
Global Recruitment for Faculty, Postdocs, and Specialists at Hangzhou Institute of Medicine, CAS
Seeking exceptional Senior/Junior PIs, Postdocs, and Core Specialists globally year-round
Hangzhou, China
Hangzhou Institute of Medicine Chinese Academy of Sciences (HIMCAS)
Faculty Positions in Bioscience and Biomedical Engineering (BSBE) Thrust, Systems Hub, HKUST (GZ)
Tenure-track and tenured faculty positions at all ranks (Assistant Professor/Associate Professor/Professor)
Guangzhou, Guangdong (CN)
The Hong Kong University of Science and Technology (Guangzhou)

Search This Blog

Joseph Belbruno Blog

AI chatbots are sycophants — researchers say it's harming science

People pleasers

Unreliable assistance

Real-world impacts

References

Subjects

Latest on:

Jobs

NHP Co-Principal Investigator/Research Professor

Assistant professor in Molecular Biology or Molecular Infection Medicine - Expression of interest

Associate Editor, Discover Journals (Computer Science, Cybersecurity or Networks)

Global Recruitment for Faculty, Postdocs, and Specialists at Hangzhou Institute of Medicine, CAS

Faculty Positions in Bioscience and Biomedical Engineering (BSBE) Thrust, Systems Hub, HKUST (GZ)

Comments

Post a Comment

Popular posts from this blog

People pleasers

Unreliable assistance

Real-world impacts

References

Related Articles

Subjects

Latest on:

Comments

Post a Comment

Popular posts from this blog