Yin and Yang: Balancing and Answering Binary Visual Questions
Title: Yin and Yang: Balancing and Answering Binary Visual Questions
Authors: Peng Zhang, Yash Goyal, Douglas Summers-Stay, Dhruv Batra, Devi Parikh
Published: Nov 16, 2015
Link: https://arxiv.org/abs/1511.05099
Summary (Generated by Microsoft Copilot):
Introduction:
- The paper addresses binary Visual Question Answering (VQA) on abstract scenes, framing the task as visually verifying whether the concept asked about in the question is actually present in the image.
Challenges:
- Language priors let models achieve superficially strong performance without true visual understanding.
- Dataset biases can hinder progress in multi-modal AI.
Methods:
- Summarize each question as a tuple (primary object, relation, secondary object) capturing the visual concept being asked about.
- Balance the dataset using abstract scenes: for each question, create a complementary scene so that “yes” and “no” answers occur equally often.
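The two method steps above can be sketched in code. This is a toy illustration, not the authors' implementation: the tuple extraction here is a naive heuristic for questions of the form "Is the P R the S?" (the paper uses a learned summarization), and `make_complementary_scene` is a hypothetical callback standing in for the paper's human-created complementary scenes.

```python
def question_to_tuple(question):
    """Heuristically map 'Is the <P> <R> the <S>?' to a (P, R, S) tuple."""
    words = question.rstrip("?").lower().split()
    # Drop the leading auxiliary ('is'/'are') and any articles.
    content = [w for w in words[1:] if w not in {"the", "a", "an"}]
    p = content[0] if content else ""
    r = content[1] if len(content) > 1 else ""
    s = content[2] if len(content) > 2 else ""
    return (p, r, s)

def balance(dataset, make_complementary_scene):
    """For every (scene, question, answer) item, add a complementary
    scene in which the same question receives the opposite answer."""
    balanced = list(dataset)
    for scene, question, answer in dataset:
        flipped = "no" if answer == "yes" else "yes"
        balanced.append(
            (make_complementary_scene(scene, question), question, flipped)
        )
    return balanced
```

For example, `question_to_tuple("Is the dog chasing the ball?")` returns `("dog", "chasing", "ball")`, and `balance` doubles the dataset so every question appears once with each answer.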
Novelties:
- Balanced dataset creation with complementary scenes.
- Tuple extraction for concise visual concept representation.
Results:
- Language-only models perform poorly on the balanced dataset.
- The proposed approach matches state-of-the-art performance on the unbalanced dataset and outperforms it on the balanced dataset.
Performances:
- Significant improvement in visual reasoning and understanding.
- Better performance by attending to relevant image regions.
Limitations:
- Some complementary scenes cannot be created because the clipart library is limited.
- Handling negative questions remains challenging.
Discussion:
- Balancing datasets can improve visual understanding.
- Future work should focus on detailed visual semantics and real images.