An AI anchor recently appeared in the industry AI open class held by JD Digital Technology (JD Digital), the first time the company's independently developed AI virtual digital human product has been shown to the public. JD Digital reportedly integrates 2D and 3D technologies, combines traditional methods with deep learning, and can synthesize a realistic AI virtual digital human after a few hours of training on only a few minutes of video of the subject.
To convert text to speech, JD Digital has reportedly developed a lightweight adversarial speech synthesis technology. By using deep neural networks to build personalized models from multi-speaker data, it produces synthesized speech with rich timbre and distinctive characteristics, and can even simulate human breathing and pauses, so that the result sounds like a real person. The efficient, lightweight adversarial neural network greatly improves synthesis speed: synthesizing 1 second of audio takes only 0.07 seconds, and the synthesis latency is only one third of the industry level. The technology fully supports real-time speech synthesis across multiple scenarios.
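The speed figure quoted above can be restated as a real-time factor (RTF), the ratio of synthesis time to the duration of the audio produced. The following is a minimal Python sketch of that arithmetic only. The function names are hypothetical and are not part of any JD Digital software, and the 60-second estimate assumes synthesis time scales linearly with audio length, which the article does not state.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return synthesis time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def estimated_synthesis_time(rtf: float, audio_seconds: float) -> float:
    """Estimate synthesis time, assuming it scales linearly with audio length."""
    return rtf * audio_seconds


# Figures reported in the article: 0.07 s of compute per 1 s of audio.
rtf = real_time_factor(0.07, 1.0)

# An RTF below 1.0 means audio is produced faster than it plays back.
speedup = 1.0 / rtf

# Hypothetical example: a 60-second clip under the linear-scaling assumption.
clip_estimate = estimated_synthesis_time(rtf, 60.0)

print(f"RTF: {rtf:.2f}")
print(f"Speed relative to playback: {speedup:.1f}x")
print(f"Estimated time for a 60 s clip: {clip_estimate:.1f} s")
```

An RTF below 1.0 is the usual threshold for calling a synthesizer real-time, which is consistent with the article's real-time claim.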
In the speech-to-video stage, to make the AI anchor's appearance more realistic, the JD Digital AI Laboratory uses a generative adversarial network to reproduce more lifelike expressions, and uses 3D model motion tracking to ensure that the AI anchor has accurate lip shapes, detailed expressions and natural head movements when speaking. To achieve precise lip synchronization, the laboratory uses a large amount of speech data and specially designed robust speech features, so that the anchor can be driven by synthetic speech of different timbres, languages and speaking rates while keeping its lip movements accurate and coherent.
The result is an AI virtual digital human that closely reproduces a real person's appearance. Driven by AI algorithms, the anchor "Xiaoni" needs only text as input to host in real time according to the meaning of the content, with natural and realistic expressions, movements and speech.
Bo Liefeng, chief scientist of the JD Digital AI Laboratory, said: “The launch of AI anchor Xiaoni is a successful real-world application of JD Digital's multimodal AI technology. In addition to its applications in customer service, recruitment and other fields, we also use AI virtual digital human technology to turn static text-and-image content into short videos presented by ‘real people’, so as to meet users' demand for more varied forms of presentation and to increase user stickiness and community activity.”