Title: Yin and Yang: Balancing and Answering Binary Visual Questions

Authors: Peng Zhang, Yash Goyal, Douglas Summers-Stay, Dhruv Batra, Devi Parikh

Published: Nov 16, 2015

Link: https://arxiv.org/abs/1511.05099

Summary (Generated by Microsoft Copilot):

Introduction:

  • The paper addresses binary Visual Question Answering (VQA) on abstract scenes, framing the task as visual verification of the concepts queried in the question.

Challenges:

  • Language priors can let models score well without genuine visual understanding.
  • Dataset biases can hinder progress in multi-modal AI.

Methods:

  • Convert each question into a tuple that concisely summarizes the visual concept being asked about.
  • Use abstract scenes to balance the dataset so that every question has an equal number of “yes” and “no” answers, obtained by collecting complementary scenes (see the sketch after this list).
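
The tuple-and-verify idea can be illustrated with a small sketch. The data structures and helper names below (Tuple3, extract_tuple, verify, and the set-of-facts scene representation) are assumptions made for illustration, not the authors' code:

```python
# Illustrative sketch only: summarize a binary question as a
# (primary object, relation, secondary object) tuple and answer "yes"
# if that concept is depicted in the abstract scene.
from typing import NamedTuple, Optional, Set, Tuple


class Tuple3(NamedTuple):
    primary: str               # e.g. "dog"
    relation: str              # e.g. "next to"
    secondary: Optional[str]   # e.g. "table"


def extract_tuple(question: str) -> Tuple3:
    """Toy stand-in: the paper extracts the tuple from the question text;
    here we simply return the tuple for the example question below."""
    return Tuple3(primary="dog", relation="next to", secondary="table")


def verify(tup: Tuple3, scene_facts: Set[Tuple[str, str, str]]) -> str:
    """Answer "yes" if the queried concept appears among the scene's facts."""
    present = (tup.primary, tup.relation, tup.secondary) in scene_facts
    return "yes" if present else "no"


# A scene represented (hypothetically) as a set of depicted facts.
scene = {("dog", "next to", "table"), ("boy", "holding", "ball")}
print(verify(extract_tuple("Is the dog next to the table?"), scene))  # -> yes
```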

Novelties:

  • Balanced dataset creation with complementary scenes (a sketch of the pairing idea follows this list).
  • Tuple extraction for concise visual-concept representation.
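
A minimal sketch of the balancing idea, under an assumed data layout (the Example record and the complements lookup are illustrative, not the released dataset format): each kept question is paired with one scene answered “yes” and a complementary scene answered “no”.

```python
# Illustrative sketch only: pair each (question, scene) example with a
# complementary scene whose answer is flipped, so "yes" and "no" are balanced
# per question and language priors alone cannot do well.
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Example:
    question: str
    scene_id: str
    answer: str  # "yes" or "no"


def build_balanced_split(originals: List[Example],
                         complements: Dict[Tuple[str, str], str]) -> List[Example]:
    """Keep an example only if a complementary scene exists (the paper notes
    some scenes cannot be complemented due to the limited clipart library)."""
    balanced: List[Example] = []
    for ex in originals:
        comp_scene = complements.get((ex.question, ex.scene_id))
        if comp_scene is None:
            continue  # no complement available; drop to preserve balance
        flipped = "no" if ex.answer == "yes" else "yes"
        balanced.append(ex)
        balanced.append(Example(ex.question, comp_scene, flipped))
    return balanced


# Toy usage:
orig = [Example("Is the dog next to the table?", "scene_001", "yes")]
comps = {("Is the dog next to the table?", "scene_001"): "scene_001_comp"}
print(build_balanced_split(orig, comps))
```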

Results:

  • Language-only models perform poorly on the balanced dataset.
  • The proposed approach matches state-of-the-art performance on the unbalanced dataset and outperforms it on the balanced dataset.

Performances:

  • Significant improvement in visual reasoning and understanding.
  • Attending to the image regions relevant to the question yields better performance.

Limitations:

  • Some scenes cannot be given complementary counterparts because of the limited clipart library.
  • Handling negative questions remains challenging.

Discussion:

  • Balancing the dataset encourages genuine visual understanding rather than reliance on language priors.
  • Future work should focus on detailed visual semantics and real images.