Prompting Medical Large Vision-Language Models to Diagnose Pathologies by Visual Question Answering
Title: Prompting Medical Large Vision-Language Models to Diagnose Pathologies by Visual Question Answering
Authors: Danfeng Guo, Demetri Terzopoulos
Published: Jul 31 2024
Link: https://arxiv.org/abs/2407.21368
Summary:
- The authors introduce two prompting methods to reduce hallucinations in medical large vision-language models (LVLMs) and improve visual question answering (VQA) performance:
  - Add detailed pathological descriptions to the question queries.
  - Introduce a "weak learner" whose prediction results are appended to the question queries as a reference opinion.
- The proposed methods improve the diagnostic F1 score by up to 0.27 on the MIMIC-CXR-JPG and CheXpert datasets.
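The two prompting strategies can be sketched as simple prompt construction. This is a hypothetical illustration, not the paper's exact prompt wording; the function name, prompt phrasing, and the 0.5 decision threshold are all assumptions.

```python
from typing import Optional

def build_prompt(pathology: str, description: str,
                 weak_learner_prob: Optional[float] = None) -> str:
    """Compose a VQA question that embeds a pathology description and,
    optionally, a weak learner's prediction as a reference opinion.
    Illustrative sketch only; wording is not taken from the paper."""
    prompt = (
        f"Does this chest X-ray show {pathology}? "
        f"Note: {pathology} typically appears as {description}. "
    )
    if weak_learner_prob is not None:
        # Strategy 2: include the weak learner's opinion as extra context.
        opinion = "likely present" if weak_learner_prob >= 0.5 else "likely absent"
        prompt += (
            f"A reference classifier predicts the pathology is {opinion} "
            f"(probability {weak_learner_prob:.2f}). "
        )
    prompt += "Answer yes or no."
    return prompt

# Strategy 1 only: description, no weak-learner opinion.
q1 = build_prompt("cardiomegaly", "an enlarged cardiac silhouette")
# Strategies 1 and 2 combined.
q2 = build_prompt("cardiomegaly", "an enlarged cardiac silhouette", 0.82)
```

The prompt text would then be passed, together with the image, to the LVLM's question-answering interface.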
Summary (Generated by Microsoft Copilot):
- Introduction: The paper discusses the application of Large Vision-Language Models (LVLMs) in medical Visual Question Answering (VQA) tasks, focusing on diagnosing pathologies from medical images.
- Challenges: LVLMs suffer from hallucination problems and struggle with minority pathologies due to imbalanced training data.
- Methods: Two prompting strategies are proposed: providing detailed explanations of pathologies and using a weak learner model to improve VQA performance.
- Novelties: The study introduces cost-effective prompting strategies to reduce hallucination and improve diagnostic accuracy.
- Results: The proposed methods significantly improve the diagnostic F1 score, with the highest increase being 0.27.
- Performance: The strategies enhance recall by approximately 0.07 and reduce false negative predictions.
- Limitations: The strategies are less effective for pathologies with extremely scarce data.
- Discussion: Future research could explore strategies like Retrieval Augmented Generation (RAG) to handle rare categories.