
Abstract

This paper introduces the problem of multiple object forecasting (MOF), in which the goal is to predict future bounding boxes of tracked objects. In contrast to existing works on object trajectory forecasting, which primarily consider the problem from a bird's-eye perspective, we formulate the problem from an object-level perspective and call for the prediction of full object bounding boxes, rather than trajectories alone. Towards solving this task, we introduce the Citywalks dataset, which consists of over 200k high-resolution video frames. Citywalks comprises footage recorded in 21 cities across 10 European countries in a variety of weather conditions, and contains over 3.5k unique pedestrian trajectories. For evaluation, we adapt existing trajectory forecasting methods for MOF and confirm cross-dataset generalizability on the MOT-17 dataset without fine-tuning. Finally, we present STED, a novel encoder-decoder architecture for MOF. STED combines visual and temporal features to model both object motion and ego-motion, and outperforms existing approaches for MOF. Code & dataset link: https://github.com/olly-styles/Multiple-Object-Forecasting
