Toward More Reliable Multimodal Systems: Mitigating Hallucinations in Large Vision-Language Models

June 2025

Authors:

Zifu Wan

Abstract:

Recent advances in Large Vision-Language Models (LVLMs) have led to impressive performance across a wide range of multimodal tasks. However, their tendency to produce hallucinated responses, that is, text that is inconsistent with the visual input, poses a significant challenge to their reliability and real-world applicability. In this thesis, we investigate two training-free approaches for mitigating hallucinations during the decoding process. First, we propose Self-Correcting Decoding with Generative Feedback (DeGF), which leverages the inverse nature of text-to-image generation to detect and correct hallucinations. By synthesizing an auxiliary image from the model’s initial textual response, DeGF provides visual self-feedback to verify and revise hallucinated outputs via contrastive or complementary decoding. Second, we introduce ONLY, a highly efficient decoding method that requires only a single query and a lightweight one-layer intervention. By selectively amplifying the textual signal based on a text-to-visual entropy ratio, ONLY improves response reliability while maintaining real-time efficiency with minimal computational overhead. Extensive experiments across multiple hallucination benchmarks demonstrate that both DeGF and ONLY significantly outperform existing methods, offering practical and effective solutions for enhancing the trustworthiness of LVLMs in real-world applications.
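
The abstract names contrastive and complementary decoding but gives no formulas, so the following is only a generic sketch of how two sets of next-token logits are commonly combined under each scheme. It is not the thesis's implementation. The function names, the weight `alpha`, and the toy logits are all hypothetical; `orig` stands for logits conditioned on the input image and `aux` for logits conditioned on an auxiliary image.

```python
import math


def softmax(logits):
    """Convert logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def complementary_logits(orig, aux, alpha=1.0):
    """Add the two views: tokens supported by both get reinforced.

    Assumption: a plain weighted sum; `alpha` is a hypothetical weight.
    """
    return [o + alpha * a for o, a in zip(orig, aux)]


def contrastive_logits(orig, aux, alpha=1.0):
    """Push away from the auxiliary view: tokens favoured only by `aux`
    are penalised.

    Assumption: the common (1 + alpha) * orig - alpha * aux form used in
    contrastive decoding generally, not necessarily in this thesis.
    """
    return [(1 + alpha) * o - alpha * a for o, a in zip(orig, aux)]


# Toy three-token vocabulary (made-up numbers for illustration only).
orig = [2.0, 1.0, 0.0]
aux = [0.0, 1.0, 2.0]

comp = complementary_logits(orig, aux)
cont = contrastive_logits(orig, aux)
comp_probs = softmax(comp)
cont_probs = softmax(cont)
```

With these toy values the two views disagree, so the complementary sum flattens the distribution while the contrastive form sharpens it toward the token that only `orig` prefers. Which scheme to apply at a given step is a design decision this sketch does not model.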

Notes:

@mastersthesis{Wan-2025-147099,
author = {Zifu Wan},
title = {Toward More Reliable Multimodal Systems: Mitigating Hallucinations in Large Vision-Language Models},
year = {2025},
month = {June},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-25-47},
keywords = {Large Vision Language Models, Hallucination Mitigation},
}
Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.