[Paper Review] - VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time - The Monster Microsoft Unleashed
Abstract: We introduce VASA, a framework for generating lifelike talking faces with appealing visual affective skills (VAS) given a single static image and a speech audio clip. Our premiere model, VASA-1, is capable of not only producing lip movements that are exquisitely synchronized with the audio, but also capturing a large spectrum of facial nuances and natural head motions that contribute to..