The AI-generated video model Ying, independently developed by Beijing Zhipu Linghang Technology, a company in Beijing Economic-Technological Development Area (Beijing E-Town), was launched recently.
The video model, developed on the high-performance computing cluster of Beijing E-Town, can generate a high-precision six-second video in just 30 seconds, saving significant time and money in video production.
The base video generation model for Ying is CogVideoX, which integrates the text, time, and space dimensions, drawing on the algorithm design of Sora (a large AI text-to-video model). Through optimization, CogVideoX has improved its inference speed sixfold compared with its predecessor, CogVideo. In subsequent versions, Zhipu will gradually introduce higher-resolution and longer-duration video generation capabilities.
"We are actively exploring more efficient scaling methods at the model level," said Zhang Peng, CEO of Zhipu AI, the controlling shareholder of Beijing Zhipu Linghang Technology.
"With continuous iterations of algorithms and data, I believe the Scaling Law will continue to play a powerful role."
The Ying API has also been launched simultaneously on the large-model open platform bigmodel.cn. Enterprises and developers can experience and use the model's text-to-video and image-to-video capabilities through the API.
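As a rough illustration of how such an API might be called, the sketch below assembles a request body for text-to-video or image-to-video generation. The endpoint base, model name, field names, and the switch between the two modes are all illustrative assumptions, not the documented bigmodel.cn interface; consult the platform's API reference for the actual contract.

```python
# Hypothetical sketch of preparing a video-generation API request.
# All endpoint paths and field names below are assumptions for
# illustration only, not the documented bigmodel.cn interface.
import json
from typing import Optional

API_BASE = "https://open.bigmodel.cn/api"  # platform named in the article


def build_generation_request(prompt: str, image_url: Optional[str] = None) -> dict:
    """Assemble a request body for text-to-video generation, or
    image-to-video when an image URL is supplied (field names assumed)."""
    body = {"model": "cogvideox", "prompt": prompt}
    if image_url is not None:
        body["image_url"] = image_url  # presence of an image switches modes
    return body


# Text-to-video request body:
req = build_generation_request("A lantern festival at night, slow panning shot")
print(json.dumps(req, ensure_ascii=False))
```

In practice the assembled body would be POSTed with an authentication token, and the client would poll for the finished six-second clip, since generation is asynchronous on most such platforms.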
The new DiT model architecture efficiently compresses video information and better integrates text and video content, allowing Ying to stand out in complex instruction following, content coherence, and scene scheduling.