
Abstract

Interactive video object segmentation (iVOS) aims to efficiently obtain high-quality segmentation masks of a target object in a video through user interactions. Most previous state-of-the-art methods tackle iVOS with two independent networks, one for user interaction and one for temporal propagation, which leads to inefficiency at inference time. In this work, we propose a unified framework, named Memory Aggregation Networks (MA-Net), to address the challenging iVOS task more efficiently. MA-Net integrates the interaction and propagation operations into a single network, which significantly improves the efficiency of iVOS under the multi-round interaction scheme. More importantly, we propose a simple yet effective memory aggregation mechanism that records informative knowledge from previous interaction rounds, greatly improving robustness in discovering challenging objects of interest. We conduct extensive experiments on the validation set of the DAVIS Challenge 2018 benchmark. In particular, MA-Net achieves a J@60 score of 76.1% without any bells and whistles, outperforming state-of-the-art methods by more than 2.7%.
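The abstract does not specify how the memory aggregation works, so the following is only a minimal sketch of the general idea under stated assumptions, not MA-Net's actual mechanism. It assumes that each interaction round produces a per-pixel score map in [0, 1] for a frame (for example, similarity of each pixel to the user's scribbles) and that rounds are aggregated with an element-wise max, so evidence from earlier rounds is not overwritten by a later, sparser interaction. The class `InteractionMemory`, the helper `to_mask`, and the 0.5 threshold are hypothetical. The `jaccard` helper computes the region similarity J (intersection over union) that underlies the J@60 score reported above.

```python
from typing import List, Optional

Grid = List[List[float]]
Mask = List[List[int]]


class InteractionMemory:
    """Hypothetical per-frame memory that accumulates evidence across rounds.

    Assumption (not taken from the paper): rounds are combined with an
    element-wise max over per-pixel score maps.
    """

    def __init__(self) -> None:
        self.memory: Optional[Grid] = None
        self.rounds = 0

    def update(self, round_map: Grid) -> Grid:
        """Fold one round's score map into the memory and return the aggregate."""
        if self.memory is None:
            # First round: copy the map so later edits do not alias the input.
            self.memory = [row[:] for row in round_map]
        else:
            if len(round_map) != len(self.memory) or any(
                len(a) != len(b) for a, b in zip(round_map, self.memory)
            ):
                raise ValueError("score map shape must match the memory")
            # Keep the strongest evidence seen so far at every pixel.
            self.memory = [
                [max(m, r) for m, r in zip(mrow, rrow)]
                for mrow, rrow in zip(self.memory, round_map)
            ]
        self.rounds += 1
        return self.memory


def to_mask(scores: Grid, threshold: float = 0.5) -> Mask:
    """Binarize a score map with a hypothetical fixed threshold."""
    return [[1 if s >= threshold else 0 for s in row] for row in scores]


def jaccard(pred: Mask, gt: Mask) -> float:
    """Region similarity J: |pred AND gt| / |pred OR gt| for binary masks."""
    inter = sum(p & g for prow, grow in zip(pred, gt) for p, g in zip(prow, grow))
    union = sum(p | g for prow, grow in zip(pred, gt) for p, g in zip(prow, grow))
    # Convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


if __name__ == "__main__":
    mem = InteractionMemory()
    mem.update([[0.9, 0.2], [0.1, 0.0]])         # round 1: scribble on top-left
    agg = mem.update([[0.3, 0.8], [0.0, 0.1]])   # round 2: scribble on top-right
    print(agg)                                   # [[0.9, 0.8], [0.1, 0.1]]
    pred = to_mask(agg)
    print(jaccard(pred, [[1, 1], [1, 0]]))       # 2 / 3
```

In this toy run, the second round alone would lose the top-left pixel, but the max-aggregated memory keeps it. This illustrates why retaining knowledge from earlier rounds can make multi-round interaction more robust.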
