DwNet: Dense warp-based network for pose-guided human video generation

Polina Zablotskaia (University of British Columbia), Aliaksandr Siarohin (University of Trento), Leonid Sigal (University of British Columbia), Bo Zhao (University of British Columbia)

Abstract
Generation of realistic high-resolution videos of human subjects is a challenging and important task in computer vision. In this paper, we focus on human motion transfer: the generation of a video depicting a particular subject, observed in a single image, performing a series of motions exemplified in an auxiliary (driving) video. Our GAN-based architecture, DwNet, leverages a dense intermediate pose-guided representation and a refinement process to warp the required subject appearance, in the form of texture, from a source image into a desired pose. Temporal consistency is maintained by further conditioning the decoding process within the GAN on the previously generated frame; in this way, a video is generated in an iterative and recurrent fashion. We illustrate the efficacy of our approach by showing state-of-the-art quantitative and qualitative performance on two benchmark datasets: TaiChi and Fashion Modeling. The latter was collected by us and will be made publicly available to the community.
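
To make the recurrent generation scheme described in the abstract concrete, the following is a minimal PyTorch sketch of a dense warp-based generation loop. Everything in it is an illustrative assumption rather than the authors' actual architecture: the module names (DwNetSketch, flow_net, decoder), the single-convolution placeholders standing in for the real sub-networks, and the choice of 3-channel dense pose maps (e.g., DensePose-style IUV). It only shows the two mechanisms the abstract names: warping the source texture toward each driving pose, and conditioning the decoder on the previously generated frame.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DwNetSketch(nn.Module):
    """Illustrative sketch only: module names, layer sizes, and pose
    format are assumptions, not the paper's architecture."""

    def __init__(self):
        super().__init__()
        # Placeholder: predicts a dense 2-channel sampling offset from a
        # pose map (a stand-in for pose-guided warp estimation).
        self.flow_net = nn.Conv2d(3, 2, kernel_size=3, padding=1)
        # Placeholder: decodes the warped texture together with the
        # previously generated frame into the next output frame.
        self.decoder = nn.Conv2d(3 + 3, 3, kernel_size=3, padding=1)

    def forward(self, source, driving_poses):
        # source: (B, 3, H, W); driving_poses: (B, T, 3, H, W)
        b, _, h, w = source.shape
        # Identity sampling grid in [-1, 1], shape (B, H, W, 2).
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h, device=source.device),
            torch.linspace(-1, 1, w, device=source.device),
            indexing="ij")
        base_grid = torch.stack([xs, ys], dim=-1).expand(b, h, w, 2)

        prev, frames = source, []  # condition frame 1 on the source image
        for t in range(driving_poses.size(1)):
            pose = driving_poses[:, t]
            offset = self.flow_net(pose).permute(0, 2, 3, 1)  # (B, H, W, 2)
            # Dense warp: resample source texture toward the driving pose.
            warped = F.grid_sample(source, base_grid + offset,
                                   align_corners=False)
            # Recurrence: decode conditioned on the previous output frame.
            frame = torch.tanh(self.decoder(torch.cat([warped, prev], dim=1)))
            frames.append(frame)
            prev = frame
        return torch.stack(frames, dim=1)  # (B, T, 3, H, W)

The design choice mirrored here is the recurrence: feeding each generated frame back into the decoder for the next time step, which the abstract credits with maintaining temporal consistency across the synthesized video.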

DOI
10.5244/C.33.205
https://dx.doi.org/10.5244/C.33.205

Files
Paper (PDF)
Supplementary material (ZIP)

BibTeX
@inproceedings{BMVC2019,
  title={DwNet: Dense warp-based network for pose-guided human video generation},
  author={Polina Zablotskaia and Aliaksandr Siarohin and Leonid Sigal and Bo Zhao},
  year={2019},
  month={September},
  pages={205.1--205.13},
  articleno={205},
  numpages={13},
  booktitle={Proceedings of the British Machine Vision Conference (BMVC)},
  publisher={BMVA Press},
  editor={Kirill Sidorov and Yulia Hicks},
  doi={10.5244/C.33.205},
  url={https://dx.doi.org/10.5244/C.33.205}
}