By 2022, the number of Internet video users in China is expected to reach 900 million, and IDC (International Data Corporation) predicts that by 2025, 80.3% of the world's data will be unstructured data such as video, images, and audio. Efficient processing of audio and video is therefore increasingly important.

The rise of short video and live streaming has opened up new formats. Applications such as live video, intelligent content generation, and video recommendation keep multiplying, bringing huge opportunities to the video cloud business while posing new challenges to cloud service architecture, spanning technologies such as high concurrency, distributed storage, audio and video codecs, and edge computing.

Young architects are often full of questions about architecture design and technology selection. The second season of the "Architect Growth Plan" series, jointly launched by Science and Intel, is tailored to these needs. Ten hot topics, such as gaming, 5G core networks, computing power networks, federated learning, and bioinformatics big data, provide architects with high-quality learning resources and hands-on experience sharing.

To help answer questions from architects in the audio and video industry, the first session of the "Architect Growth Plan" invited Tan Dai, General Manager of Volcano Engine, Cheng Congchao, Senior Principal Engineer of Intel Big Data, and Liu Jiang, Vice President of the Zhiyuan Artificial Intelligence Research Institute, to discuss "The Construction and Evolution of Audio and Video Architecture in the Super Video Era".

Volcano Engine: Audio and Video Architecture in the Super Video Era      

The first lecturer, Tan Dai, drew on Volcano Engine's concrete practice to introduce the evolution of audio and video architecture in the super video era, focusing on the video cloud's most-watched technical directions: edge computing, audio and video codecs, and the intelligent middle platform. The course is divided into three parts:

Part 1: What Is the Super Video Era

In 2020, with the outbreak of the pandemic, every industry began to engage with video, actively or passively. The video cloud has penetrated more "traditional" sectors, and new landing scenarios have emerged in industry, education, and healthcare. Each brings different challenges to the architecture in functionality, performance, and security: the video cloud has entered the super video era.

Part 2: Architecture Design of ByteDance's Audio and Video Business

The edge is closest to the business scenario. Volcano Engine first selects abundant edge resources and operator networks across provinces and cities nationwide, deploying high-quality single-line, multi-line, and BGP nodes by geographic tier. Combined with hardware of various architectures, such as x86 and ARM servers, SmartNICs, and GPUs, this forms a heterogeneous-compute edge base that provides wide-area network access and edge data-processing capability with latencies from 1 ms to 40 ms. On top of this edge infrastructure, a cloud-native edge platform flexibly manages heterogeneous compute and network resources and schedules the edge as a single network.
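The tiered, latency-aware scheduling described above can be sketched as follows. This is a hypothetical illustration, not Volcano Engine's scheduler; the node names, tiers, and latency figures are made up.

```python
# Illustrative sketch of edge "one-network" scheduling: pick the node with
# the lowest RTT that satisfies a workload's latency budget, optionally
# preferring a hardware architecture. All data here is invented.
from dataclasses import dataclass

@dataclass
class EdgeNode:
    name: str
    tier: str        # "county", "city", "province", "region"
    rtt_ms: float    # measured round-trip time to the client
    arch: str        # "x86", "arm", "gpu", ...

def schedule(nodes, latency_budget_ms, preferred_arch=None):
    """Return the lowest-RTT node within budget, preferring a matching
    architecture when one is requested; None if nothing qualifies."""
    candidates = [n for n in nodes if n.rtt_ms <= latency_budget_ms]
    if preferred_arch:
        arch_match = [n for n in candidates if n.arch == preferred_arch]
        candidates = arch_match or candidates
    return min(candidates, key=lambda n: n.rtt_ms) if candidates else None

nodes = [
    EdgeNode("bj-city-01", "city", 8.0, "x86"),
    EdgeNode("bj-county-03", "county", 2.5, "arm"),
    EdgeNode("north-region-01", "region", 35.0, "gpu"),
]

# A latency-critical ingest step with a 5 ms budget lands on the county node.
print(schedule(nodes, 5.0).name)          # bj-county-03
# A transcode job tolerates 40 ms and prefers GPU nodes.
print(schedule(nodes, 40.0, "gpu").name)  # north-region-01
```

The 1 ms-40 ms range from the article maps naturally onto such tiers: the tighter the budget, the closer to the county/city edge the work must land.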

To reduce the pressure that ultra-large-scale, real-time processing places on the central architecture, the computing and storage architecture was built around ROI-based (region-of-interest) video coding, which not only lowers bandwidth costs but also yields significant gains in user metrics such as average viewing duration. ROI ground truth is collected with self-built multi-scene datasets and eye trackers; MobileNet then accelerates temporal modeling on the CPU, with parallel processing on the GPU, achieving greater than 90% accuracy and a marked improvement in video compression performance.
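The core idea of ROI-based coding is to spend more bits where viewers actually look. A minimal sketch of that bit-allocation step, assuming a per-block ROI mask and illustrative QP offsets (this is a generic technique, not ByteDance's implementation):

```python
# Map a detected ROI mask to per-block quantization-parameter (QP) offsets:
# blocks inside the region of interest get a negative delta (more bits,
# higher quality), background blocks a positive delta (fewer bits).
# The 4x4 grid and the -4/+3 offsets are made-up illustrative values.
def qp_offsets(roi_mask, roi_delta=-4, bg_delta=+3):
    """roi_mask: 2D list of 0/1 per coding block; returns per-block QP deltas."""
    return [[roi_delta if blk else bg_delta for blk in row] for row in roi_mask]

mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],   # e.g. a face detected in the center blocks
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
offsets = qp_offsets(mask)
print(offsets[1][1], offsets[0][0])  # -4 3
```

In a real pipeline the mask would come from the ROI model (the MobileNet stage above), and the deltas would feed the encoder's rate control.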

A complete QoS and QoE data system was established to drive optimization with data. The loop runs from data collection, mining, and model training to strategy rollout, then back through feedback from the A/B experiment platform, allowing optimization to be personalized, refined, and cost-effective for different users and scenarios.
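A data-driven loop of this kind needs two generic building blocks: deterministic A/B bucketing and a QoE metric to compare. The sketch below uses standard hash-based bucketing and a toy QoE proxy; neither is ByteDance's actual system, and the metric weights are invented.

```python
# Deterministic A/B group assignment plus a toy QoE proxy, illustrating
# the experiment loop described above. The experiment name, weights, and
# metric are illustrative assumptions, not the real data system.
import hashlib

def ab_group(user_id, experiment="roi_encode_v2", groups=("control", "treat")):
    """Hash user + experiment so assignment is stable across sessions."""
    h = int(hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest(), 16)
    return groups[h % len(groups)]

def qoe(stall_ms, startup_ms, bitrate_kbps):
    # Toy QoE proxy: higher bitrate is good; stalls and slow startup are bad.
    return bitrate_kbps / 1000 - 0.01 * stall_ms - 0.005 * startup_ms

print(ab_group("user-42") in ("control", "treat"))          # True
print(round(qoe(stall_ms=200, startup_ms=400, bitrate_kbps=3000), 2))  # -1.0
```

With stable bucketing, per-group QoE averages can be compared over time, which is what makes the "personalized, refined" optimization measurable.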

Part 3: The Future Evolution of the Video Cloud, and What Video Technology It Needs

Facing the super video era, video cloud technology must better satisfy users' demand for immersive, interactive, high-definition video experiences. Citing the large-scale 8K ultra-high-definition broadcast at this year's Beijing Winter Olympics, the lecturer pointed out that video encoding and transmission remain a huge challenge, demanding top-tier video compression capability. Volcano Engine's device-cloud integrated H.266 encoding solution saves 30%-50% of bitrate, paving the way for ultra-HD video.
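To put the quoted 30%-50% saving in concrete terms, a quick back-of-envelope calculation (the 100 Mbps 8K baseline is an assumption for illustration, not a figure from the talk):

```python
# What a 30%-50% bitrate saving means for a hypothetical 100 Mbps 8K stream.
baseline_mbps = 100.0
for saving in (0.30, 0.50):
    print(f"{saving:.0%} saving -> {baseline_mbps * (1 - saving):.0f} Mbps")
# 30% saving -> 70 Mbps
# 50% saving -> 50 Mbps
```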

Intel Shares:

End-to-End Video Optimization Integrating Software and Hardware

In the first course, Cheng Congchao, Senior Principal Engineer of Intel Big Data, presented "An End-to-End Video Optimization Solution Integrating Software and Hardware".

The talk interpreted Intel's full-stack video cloud optimization across multiple links from input to output, from software to hardware, and across content production, storage, computing, and distribution.

For video encoding and decoding, Intel developed Scalable Video Technology (SVT), a CPU-based codec framework. SVT exploits several layers of parallelism, block-level parallelism within a picture, parallelism between pictures, and parallelism across multiple frames, to make full use of all the cores of a CPU. Within each core, optimization for the AVX-512 SIMD instruction set performs more work per CPU instruction cycle, ultimately delivering a 2-20x improvement in video encoding and decoding performance.
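The frame-level parallelism described above can be illustrated with a process pool that "encodes" independent frames on separate CPU cores. This is a toy analogy, not Intel's SVT API; `encode_frame` is a stand-in for the real per-frame work.

```python
# Toy illustration of picture/frame-level parallelism: independent frames
# are processed concurrently, one per worker process, then reordered.
# encode_frame is a placeholder, not the SVT interface.
from concurrent.futures import ProcessPoolExecutor

def encode_frame(frame_id):
    # Placeholder for per-frame transform/quantize/entropy-code work.
    return frame_id, f"frame-{frame_id}-encoded"

def encode_gop(frame_ids, workers=4):
    # Several frames in flight simultaneously across cores; output order
    # is restored afterwards, as a real encoder's bitstream writer must do.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(encode_frame, frame_ids))
    return [payload for _, payload in sorted(results)]

if __name__ == "__main__":
    print(encode_gop(range(8))[0])  # frame-0-encoded
```

SVT stacks this kind of coarse-grained parallelism (across frames and across blocks within a frame) with the fine-grained SIMD parallelism of AVX-512 inside each core.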

SVT has a highly scalable core architecture. It fully implements SIMD/AVX-512 instruction-set optimization, achieves strong thread and process concurrency on the Intel Xeon CPU platform, and makes full use of multi-core capability to reach the best trade-off among transcoding speed, video quality, and transmission speed.

With software-layer optimizations such as SVT and the interfaces of the underlying XPU infrastructure encapsulated, different underlying processing units can be invoked through oneAPI according to the workload. The system can automatically sense and schedule where data is executed, making full use of cloud, edge, and terminal processing power to maximize the efficiency of encoding, decoding, inference, and rendering.
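The dispatch idea, routing each workload to the processing unit best suited to it, can be sketched as below. The routing rules and device names are illustrative assumptions in the spirit of the XPU concept, not the oneAPI interface.

```python
# Hypothetical workload-to-device dispatcher illustrating the XPU idea:
# route each job class to the unit best suited to it. The mapping here
# is invented for illustration, not oneAPI's actual behavior.
def pick_device(workload):
    kind = workload["kind"]
    if kind == "inference":
        return "GPU"          # e.g. deep-learning inference
    if kind in ("encode", "decode"):
        return "CPU"          # e.g. SVT on Xeon cores
    if kind == "packet_io":
        return "IPU"          # infrastructure processing unit
    return "CPU"              # safe default

jobs = [{"kind": "inference"}, {"kind": "encode"}, {"kind": "packet_io"}]
print([pick_device(j) for j in jobs])  # ['GPU', 'CPU', 'IPU']
```

In a real oneAPI program this decision is expressed through device selection when submitting work to a queue, rather than a hand-written table.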

Expert Dialogue:

How to Balance Software and Hardware Investment in Video Cloud Architecture

Exchange strikes sparks, and dialogue inspires ideas. In this roundtable, Liu Jiang, Vice President of the Zhiyuan Artificial Intelligence Research Institute, joined lecturers Tan Dai and Cheng Congchao for an in-depth discussion of "the construction and evolution of audio and video architecture in the super video era".

Liu Jiang: In the era of live streaming and short video, how does Volcano Engine use technologies such as AI and the cloud to improve the application experience?

Tan Dai: By integrating abundant edge nodes and network resources around the world, both traditional audio and video applications and new edge computing scenarios can get fast responses. Specific to audio and video, efficient codec technology strikes a balance between performance and experience, and an indicator system built on QoS and QoE continuously improves the user experience in a data-driven way.

Liu Jiang: When processor computing power hits a bottleneck, how can audio and video processing efficiency be improved? What solutions does Intel offer for audio and video?

Cheng Congchao: Intel has built many "integrated software and hardware" industry solutions. In a nutshell, they fall into three blocks: hardware, acceleration, and cost reduction. Future cloud computing power will surely be an XPU (diverse processing unit) solution, combining CPU, GPU, and IPU for more flexible, distributed computing power; DPDK and SPDK have greatly optimized network transmission. On the software side, Intel is committed to contributing to the open source community, working both upstream and downstream and striving to enable it; at the industry level, Intel builds industry solutions together with partners.

As an important source of cloud computing power, what new breakthroughs has the XPU made in deep learning inference? How should a video recommendation architecture be built, and how can it support the understanding and distribution of billions of videos? Hardware iterates on a longer cycle than software; as an architect, how should investment in hardware and software resources be balanced to achieve the best cost-performance?

Editor: Huang Fei
