Ovi is an audio-video generation model developed by Character AI (in collaboration with Yale University) that produces short, cinematic video clips with built-in synchronized audio from text prompts.
It generates realistic speech, sound effects, ambient audio, and lip-synced visuals in a single pass through its twin-backbone fusion architecture.
It supports multi-person dialogues, audio description tags (<AUDCAP>…<ENDAUDCAP>), and speech tags (<S>…<E>).
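
To make the tag syntax concrete, here is a minimal sketch of assembling a tagged prompt string. Only the tag names come from the description above; the helper names (speech, audio_caption, build_prompt) are hypothetical, and the ordering of scene text, speech lines, and audio description is an assumption rather than a documented requirement of Ovi.

```python
# Hypothetical helpers for composing a tagged prompt string.
# Only the tag names (<S>, <E>, <AUDCAP>, <ENDAUDCAP>) come from the text;
# the prompt layout below is an assumption for illustration.

def speech(text: str) -> str:
    """Wrap one line of spoken dialogue in speech tags."""
    return f"<S>{text.strip()}<E>"


def audio_caption(description: str) -> str:
    """Wrap a description of the soundtrack in audio description tags."""
    return f"<AUDCAP>{description.strip()}<ENDAUDCAP>"


def build_prompt(scene: str, lines: list[str], audio: str) -> str:
    """Join a scene description, spoken lines, and an audio description."""
    parts = [scene.strip()]
    parts += [speech(line) for line in lines]
    parts.append(audio_caption(audio))
    return " ".join(parts)


prompt = build_prompt(
    "Two friends talk in a cafe.",
    ["Did you hear the news?", "Not yet, tell me."],
    "Soft cafe chatter, clinking cups.",
)
print(prompt)
```

Keeping the tagging in small functions means a multi-person dialogue is just a longer list of lines, and the tags cannot be left unbalanced by hand-editing.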
Use cases for Ovi range from creative social media clips, narrative vignettes, and short dialogues to stylised advertising or promotional videos. Because each clip carries both audio and visuals, it delivers a more complete “mini-scene” in just a few seconds.