systematic evaluation of medical vision-language models
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: arXiv
This paper introduces a systematic benchmark for evaluating sycophancy and visual susceptibility in medical vision-language models, addressing critical patient safety concerns in AI deployment for medical workflows.
Paper Details
Metadata
Abstract
Vision-language models (VLMs) have the potential to transform medical workflows, but their deployment is limited by sycophancy. Despite this serious threat to patient safety, a systematic benchmark remains lacking. This paper addresses the gap by introducing a medical benchmark that applies multiple pressure templates to VLMs in a hierarchical medical visual question answering task. We find that current VLMs are highly susceptible to visual cues, with failure rates showing only weak correlation to model size or overall accuracy. We also discover that perceived authority and user mimicry are powerful triggers, suggesting a bias mechanism independent of visual evidence. To overcome this, we propose Visual Information Purification for Evidence-based Responses (VIPER), a strategy that proactively filters out non-evidence-based social cues, thereby reinforcing evidence-based reasoning. VIPER reduces sycophancy while maintaining interpretability and consistently outperforms baseline methods, laying the foundation for the robust and secure integration of VLMs.
Summary
This paper addresses the critical safety issue of sycophancy in medical vision-language models (VLMs) by introducing a systematic benchmark for evaluating their susceptibility to visual and social cues in medical visual question answering tasks. The authors find that current VLMs are highly vulnerable to non-evidence-based triggers such as perceived authority and user mimicry, with failure rates only weakly correlated with model size or accuracy. To mitigate this risk, they propose VIPER (Visual Information Purification for Evidence-based Responses), a strategy that filters out social cues to reinforce evidence-based reasoning, demonstrating improved robustness while maintaining interpretability.
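The benchmark's central quantity, how often a model abandons an initially correct answer once a social-pressure template is applied, can be sketched as a simple "flip rate" computation. This is an illustrative reconstruction, not the paper's actual code; the function name and record fields are assumptions.

```python
def flip_rate(records):
    """Fraction of initially correct answers that flip under pressure.

    records: list of dicts with keys 'baseline' (answer before the
    pressure template), 'pressured' (answer after), and 'gold'
    (ground truth). Only items the model got right at baseline count.
    """
    eligible = [r for r in records if r["baseline"] == r["gold"]]
    if not eligible:
        return 0.0
    flipped = sum(1 for r in eligible if r["pressured"] != r["gold"])
    return flipped / len(eligible)


# Toy example: two baseline-correct answers, one of which flips
# after an authority-style pressure prompt.
records = [
    {"baseline": "malignant", "pressured": "benign", "gold": "malignant"},
    {"baseline": "malignant", "pressured": "malignant", "gold": "malignant"},
    {"baseline": "benign", "pressured": "benign", "gold": "malignant"},  # excluded
]
print(flip_rate(records))  # 0.5
```

A per-template breakdown of this rate is what would surface findings like "imitation and expert corrections are the strongest triggers."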
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Epistemic Sycophancy | Risk | 60.0 |
Cached Content Preview
[License: arXiv.org perpetual non-exclusive license](https://info.arxiv.org/help/license/index.html#licenses-available)
arXiv:2509.21979v1 \[cs.CV\] 26 Sep 2025
# Benchmarking and Mitigating Psychological Sycophancy in Medical Vision-Language Models
Zikun Guo1,
Xinyue Xu2,
Pei Xiang4,
Shu Yang3,
Xin Han5,
Di Wang†,3,
Lijie Hu†,1
1Mohamed bin Zayed University of Artificial Intelligence
2Hong Kong University of Science and Technology
3King Abdullah University of Science and Technology
4Xidian University
5Zhejiang A&F University
###### Abstract
Vision-language models (VLMs) are increasingly integrated into clinical workflows, but they often exhibit sycophantic behavior, prioritizing alignment with user phrasing, social cues, or perceived authority over evidence-based reasoning. This study evaluates clinical sycophancy in medical visual question answering through a novel, clinically grounded benchmark. We propose a medical sycophancy dataset constructed from PathVQA, SLAKE, and VQA-RAD, stratified by question type, organ system, and imaging modality, and apply psychologically motivated pressure templates covering various forms of sycophancy. In adversarial experiments on a range of VLMs, we find that these models are generally vulnerable, exhibiting significant variation in the rate of sycophantic responses with only weak correlation to model accuracy or size. Imitation and expert-provided corrections were the most effective triggers, suggesting that the models possess a bias mechanism independent of visual evidence. To address this, we propose Visual Information Purification for Evidence-based Response (VIPER), a lightweight mitigation strategy that first filters non-evidentiary content (for example, social pressure) and then generates constrained, evidence-first answers. This framework reduces sycophancy on average, outperforming baselines while maintaining interpretability. Our benchmark, analysis, and mitigation framework lay the groundwork for robust deployment of medical VLMs in real-world clinician interactions, emphasizing the need for evidence-anchored defenses.
## 1 Introduction
Artificial intelligence has become integral to medical imaging and clinical decision support, with vision-language models (VLMs) enabling instruction-following interfaces over visual evidence \[ [24](https://arxiv.org/html/2509.21979v1#bib.bib24 ""), [18](https://arxiv.org/html/2509.21979v1#bib.bib18 "")\]. While aggregate accuracy on public benchmarks continues to improve \[ [17](https://arxiv.org/html/2509.21979v1#bib.bib17 ""), [13](https://arxiv.org/html/2509.21979v1#bib.bib13 ""), [29](https://arxiv.org/html/2509.21979v1#bib.bib29 "")\], safe deployment in healthcare requires reliability under distributional and interactional shift. Such reliability includes calibrated confidence, princ
... (truncated, 98 KB total)