SAM 2: Segment Anything in Images and Videos

Title: SAM 2: Segment Anything in Images and Videos

Authors: Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle, Chloe Rolland, Laura Gustafson, Eric Mintun, Junting Pan, Kalyan Vasudev Alwala, Nicolas Carion, Chao-Yuan Wu, Ross Girshick, Piotr Dollár, Christoph Feichtenhofer

Published: Aug 1 2024

Link: https://arxiv.org/abs/2408.00714

Summary (Generated by Microsoft Copilot):

Introduction:

SAM 2 (Segment Anything Model 2) is a foundation model for visual segmentation in images and videos, designed to handle the promptable segmentation task.

Challenges:

Video segmentation faces challenges of its own: motion, deformation, occlusion, lighting changes, and the need to process many frames efficiently.

Methods:

SAM 2 uses a transformer architecture with streaming memory for real-time video processing, together with a data engine for collecting a large-scale video segmentation dataset.
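To make the streaming-memory idea concrete, here is a minimal sketch in plain Python. It is not the SAM 2 implementation: the class `StreamingMemory`, the function `process_stream`, and the use of short lists as stand-ins for frame features are all assumptions made for illustration. It only shows the general pattern of processing frames one at a time while attending to a bounded FIFO bank of recent frames.

```python
from collections import deque
import math


class StreamingMemory:
    """Bounded FIFO bank of features from recent frames (hypothetical helper)."""

    def __init__(self, max_frames):
        # deque(maxlen=...) silently drops the oldest entry once the bank is full.
        self.bank = deque(maxlen=max_frames)

    def attend(self, query):
        """Return a softmax-weighted average of stored features, scored against `query`."""
        if not self.bank:
            # Nothing remembered yet: pass the current frame feature through unchanged.
            return list(query)
        scale = math.sqrt(len(query))
        scores = [sum(q * k for q, k in zip(query, mem)) / scale for mem in self.bank]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        return [
            sum(w * mem[i] for w, mem in zip(weights, self.bank)) / total
            for i in range(len(query))
        ]

    def add(self, feature):
        self.bank.append(list(feature))


def process_stream(frames, max_frames=3):
    """Process frame features one at a time, conditioning each on the memory bank."""
    memory = StreamingMemory(max_frames)
    outputs = []
    for feature in frames:
        outputs.append(memory.attend(feature))  # condition on past frames only
        memory.add(feature)                     # then remember the current frame
    return outputs, memory


if __name__ == "__main__":
    toy_frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
    conditioned, memory = process_stream(toy_frames, max_frames=2)
    print(len(conditioned), len(memory.bank))
```

Because the bank has a fixed maximum size, the per-frame cost stays bounded regardless of video length, which is the property that makes a streaming design suitable for real-time processing.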

Novelties:

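In image segmentation, SAM 2 is more accurate and 6x faster than its predecessor SAM; in video segmentation, it achieves better accuracy with 3x fewer interactions than prior approaches.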

Results:

SAM 2 achieves better segmentation accuracy and outperforms prior models on both video and image segmentation benchmarks.

Performances:

SAM 2 performs strongly across a range of tasks, including zero-shot video and image segmentation, with minimal performance differences across demographic groups.

Limitations:

SAM 2 struggles to segment objects across shot changes, in crowded scenes, after long occlusions, and when fast-moving objects have fine details.

Discussion:

SAM 2 is an important milestone in video segmentation, improving speed, accuracy, and the interactive experience. Future work could focus on stronger motion modeling and inter-object communication.