systematic evaluation of medical vision-language models
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: arXiv
This paper introduces a systematic benchmark for evaluating sycophancy and visual susceptibility in medical vision-language models, addressing critical patient safety concerns in AI deployment for medical workflows.
Paper Details
Metadata
Abstract
Vision-language models (VLMs) have the potential to transform medical workflows, but their deployment is limited by sycophancy. Despite this serious threat to patient safety, a systematic benchmark remains lacking. This paper addresses the gap by introducing a medical benchmark that applies multiple pressure templates to VLMs in a hierarchical medical visual question answering task. We find that current VLMs are highly susceptible to visual cues, with failure rates showing only weak correlation to model size or overall accuracy. We also discover that perceived authority and user mimicry are powerful triggers, suggesting a bias mechanism independent of visual evidence. To overcome this, we propose Visual Information Purification for Evidence-based Responses (VIPER), a strategy that proactively filters out non-evidence-based social cues, thereby reinforcing evidence-based reasoning. VIPER reduces sycophancy while maintaining interpretability and consistently outperforms baseline methods, laying the foundation for the robust and secure integration of VLMs.
Summary
This paper addresses the critical safety issue of sycophancy in medical vision-language models (VLMs) by introducing a systematic benchmark for evaluating their susceptibility to visual and social cues in medical visual question answering tasks. The authors find that current VLMs are highly vulnerable to non-evidence-based triggers such as perceived authority and user mimicry, with failure rates only weakly correlated with model size or accuracy. To mitigate this risk, they propose VIPER (Visual Information Purification for Evidence-based Responses), a strategy that filters out social cues to reinforce evidence-based reasoning, demonstrating improved robustness while maintaining interpretability.
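The benchmark's central quantity, how often a model abandons an initially correct answer once a social-pressure template is applied, can be sketched as a simple "flip rate" computation. This is an illustrative reconstruction, not the paper's actual code; the function name and record fields are assumptions.

```python
def flip_rate(records):
    """Fraction of initially correct answers that flip under pressure.

    records: list of dicts with keys 'baseline' (answer before the
    pressure template), 'pressured' (answer after), and 'gold'
    (ground truth). Only items the model got right at baseline count.
    """
    eligible = [r for r in records if r["baseline"] == r["gold"]]
    if not eligible:
        return 0.0
    flipped = sum(1 for r in eligible if r["pressured"] != r["gold"])
    return flipped / len(eligible)


# Toy example: two baseline-correct answers, one of which flips
# after an authority-style pressure prompt.
records = [
    {"baseline": "malignant", "pressured": "benign", "gold": "malignant"},
    {"baseline": "malignant", "pressured": "malignant", "gold": "malignant"},
    {"baseline": "benign", "pressured": "benign", "gold": "malignant"},  # excluded
]
print(flip_rate(records))  # 0.5
```

A per-template breakdown of this rate is what would surface findings like "imitation and expert corrections are the strongest triggers."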
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Epistemic Sycophancy | Risk | 60.0 |
Cached Content Preview
[License: arXiv.org perpetual non-exclusive license](https://info.arxiv.org/help/license/index.html#licenses-available)
arXiv:2509.21979v1 \[cs.CV\] 26 Sep 2025
# Benchmarking and Mitigating Psychological Sycophancy in Medical Vision-Language Models
Zikun Guo1,
Xinyue Xu2,
Pei Xiang4,
Shu Yang3,
Xin Han5,
Di Wang†,3,
Lijie Hu†,1
1Mohamed bin Zayed University of Artificial Intelligence
2Hong Kong University of Science and Technology
3King Abdullah University of Science and Technology
4Xidian University
5Zhejiang A&F University
###### Abstract
Vision-language models (VLMs) are increasingly integrated into clinical workflows, but they often exhibit sycophantic behavior, prioritizing alignment with user phrasing, social cues, or perceived authority over evidence-based reasoning. This study evaluates clinical sycophancy in medical visual question answering through a novel, clinically grounded benchmark. We propose a medical sycophancy dataset constructed from PathVQA, SLAKE, and VQA-RAD, stratified by question type, organ system, and imaging modality, and apply psychologically motivated pressure templates covering various forms of sycophancy. In adversarial experiments on a range of VLMs, we find that these models are generally vulnerable, exhibiting significant variation in the rate of sycophantic responses with only weak correlation to model accuracy or size. Imitation and expert-provided corrections were the most effective triggers, suggesting that the models possess a bias mechanism independent of visual evidence. To address this, we propose Visual Information Purification for Evidence-based Response (VIPER), a lightweight mitigation strategy that first filters non-evidentiary content (for example, social pressure) and then generates constrained, evidence-first answers. This framework reduces sycophancy on average, outperforming baselines while maintaining interpretability. Our benchmark, analysis, and mitigation framework lay the groundwork for robust deployment of medical VLMs in real-world clinician interactions, emphasizing the need for evidence-anchored defenses.
## 1 Introduction
Artificial intelligence has become integral to medical imaging and clinical decision support, with vision-language models (VLMs) enabling instruction-following interfaces over visual evidence \[ [24](https://arxiv.org/html/2509.21979v1#bib.bib24 ""), [18](https://arxiv.org/html/2509.21979v1#bib.bib18 "")\]. While aggregate accuracy on public benchmarks continues to improve \[ [17](https://arxiv.org/html/2509.21979v1#bib.bib17 ""), [13](https://arxiv.org/html/2509.21979v1#bib.bib13 ""), [29](https://arxiv.org/html/2509.21979v1#bib.bib29 "")\], safe deployment in healthcare requires reliability under distributional and interactional shift. Such reliability includes calibrated confidence, princ
... (truncated, 98 KB total)