Stylize Your Video with Artistic Generation and Translation

arXiv 2024

1 Hong Kong University of Science and Technology, 2 Kuaishou Technology
Intern at KwaiVGI, Kuaishou Technology; Corresponding Author


TL;DR: A high-quality video stylization method that enables both stylized video generation and video translation.

Comparison with Stylized Video Generation Methods


Video Style Transfer with Different Styles


Image Style Transfer Result


Stylized Video Generation with Different Prompts


Stylized Video Generation with Different Style Images


Method

We first obtain the patch features and the image embedding of the style image from CLIP. We then select the patches that share the least similarity with the text prompt as texture guidance, and use a global projection module to transform the image embedding into a global style description. The global projection module is trained through contrastive learning on a contrastive dataset constructed by model illusion. The style information is then injected into the model through decoupled cross-attention. The motion adapter and the gray tile ControlNet are used to enhance dynamic quality and to enable content control, respectively.
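
The patch selection and the decoupled cross-attention steps can be illustrated with a minimal NumPy sketch. It is not the released implementation: plain arrays stand in for CLIP outputs, and the function names (`select_texture_patches`, `decoupled_cross_attention`), the use of cosine similarity, the `keep` count, and the `style_scale` weight are all assumptions made for illustration. The attention function follows the common decoupled form, in which text and style conditions are attended to separately and the two results are summed.

```python
import numpy as np


def select_texture_patches(patch_feats, text_emb, keep):
    """Return the `keep` patches least similar to the text prompt.

    patch_feats: (num_patches, dim) array, a stand-in for CLIP patch features.
    text_emb:    (dim,) array, a stand-in for the CLIP text embedding.
    Cosine similarity is an assumption of this sketch.
    """
    p = patch_feats / np.linalg.norm(patch_feats, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb)
    sim = p @ t                               # (num_patches,)
    idx = np.argsort(sim, kind="stable")[:keep]  # lowest similarity first
    return patch_feats[idx], idx


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    """Single-head scaled dot-product attention."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


def decoupled_cross_attention(q, text_k, text_v, style_k, style_v,
                              style_scale=1.0):
    """Attend to text and style tokens separately, then sum the results.

    `style_scale` (hypothetical) weights the style branch.
    """
    text_out = attention(q, text_k, text_v)
    style_out = attention(q, style_k, style_v)
    return text_out + style_scale * style_out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    patches = rng.normal(size=(16, 8))   # 16 patches, 8-dim features
    prompt = rng.normal(size=8)
    texture, kept = select_texture_patches(patches, prompt, keep=4)

    q = rng.normal(size=(5, 8))          # 5 latent query tokens
    text_tokens = rng.normal(size=(3, 8))
    out = decoupled_cross_attention(q, text_tokens, text_tokens,
                                    texture, texture, style_scale=0.8)
    print(kept, out.shape)
```

In this sketch the selected patch features are used directly as style keys and values; a real model would first pass them through learned projections, and would also feed in the global style description produced by the global projection module.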