Abstract: Video summarization is understanding video which aims to get an abstract view of the original video sequence by the concatenation of keyframes representing the highlights of the video. In this work, we propose an enhanced deep summarization network (EDSN) to summarize videos. We implement a reinforcement learning based framework to train our EDSN, where we design a novel reward function which considers the spatial and temporal features of the original video to be included in the summary. The reward func- tion is formulated using the spatial and temporal scores obtained for each frame of the video using the temporal segment networks. During training, the reward function seeks to generate a summary by including the frames with high temporal and spatial scores, while the EDSN strives for earning higher rewards by learning to produce more diverse summaries. The method is completely unsupervised since no labels are required dur- ing training. Extensive experiments on two benchmark datasets show that the proposed approach achieves state-of-the-art performance.
|Comments:||Presented at BMVC 2019: Workshop on Applications of Egocentric Vision (EgoApp), Cardiff, UK.|
|Cite as:||Paper (PDF): EgoApp2019_4.pdf|