Introduction

CCIX (Cache Coherent Interconnect for Accelerators) is a chip-to-chip interconnect technology that lets two or more devices share data in a cache-coherent manner. CCIX aims to simplify the architecture of heterogeneous systems built from processors with different instruction set architectures (ISAs) or application-specific accelerators, while improving system bandwidth and reducing latency. To this end, a number of companies jointly established an industry standards body, the CCIX Consortium, to promote the adoption of CCIX technology. Its membership has grown steadily since.

For chip interconnects, two metrics are critical: bandwidth and latency. CCIX uses two mechanisms to improve both. The first is cache coherence: hardware automatically keeps the caches of the processor and the accelerator coherent, which improves ease of use and reduces latency. The second is higher raw link bandwidth: the maximum signaling rate is raised to 25 GT/s (gigatransfers per second). The CCIX specification also allows multiple CCIX ports to be combined through port aggregation, providing more bandwidth than a single interface to match the needs of accelerators and memory expansion.

CCIX adopts a layered architecture that extends the basic PCIe architecture. The CCIX protocol specification covers the CCIX protocol layer and the CCIX link layer. These layers define cache coherence, messaging, flow control, and the transport-related portion of CCIX. The CCIX transport specification covers the CCIX and PCIe transaction layers, the PCIe data link layer, and the CCIX physical layer. These layers are responsible for the physical connection between devices, including rate and bandwidth negotiation, packet error detection and retry, and initial packet encoding protocols.

[Figure: CCIX layered architecture]

The CCIX protocol layer is responsible for the coherence protocol, including memory reads and writes. This layer provides a simple mapping of on-chip coherence protocols such as AMBA CHI. The cache states defined at this layer let the hardware determine the state of a piece of memory. For example, the hardware can determine whether the data is unique and unmodified (consistent with memory), or shared and modified (inconsistent with memory).
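The two example states above can be read as combinations of two independent properties: whether other caches hold a copy, and whether the cached data differs from memory. The following is a minimal sketch of that idea; the class and field names are hypothetical and are not taken from the CCIX specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineState:
    """Hypothetical model of a cache line's state (names are illustrative)."""
    unique: bool    # True if no other cache holds a copy of the line
    modified: bool  # True if the cached data differs from memory

    def matches_memory(self) -> bool:
        # An unmodified line is consistent with memory.
        return not self.modified


unique_clean = LineState(unique=True, modified=False)
shared_dirty = LineState(unique=False, modified=True)

print(unique_clean.matches_memory())
print(shared_dirty.matches_memory())
```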

The CCIX link layer is responsible for the transport format of the messages exchanged between the agents defined by the CCIX protocol layer. At present the CCIX link layer is built on PCIe, but thanks to the layered architecture CCIX can be mapped onto other transport layers in the future. This layer is also responsible for port aggregation, which combines multiple ports to increase bandwidth.
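To make port aggregation more concrete, here is a minimal sketch of one way traffic could be spread over several ports: interleaving on the cache-line index. The 64-byte line size, the modulo policy, and the function name are assumptions for illustration only; the actual distribution scheme is defined by the CCIX specification and is not described in this article.

```python
CACHE_LINE_BYTES = 64  # assumed line size, for illustration only


def select_port(address: int, num_ports: int) -> int:
    """Pick an aggregated port by interleaving on the cache-line index."""
    if num_ports <= 0:
        raise ValueError("num_ports must be positive")
    line_index = address // CACHE_LINE_BYTES
    return line_index % num_ports


# Four consecutive cache lines spread over two aggregated ports.
addresses = [0x1000 + i * CACHE_LINE_BYTES for i in range(4)]
ports = [select_port(a, num_ports=2) for a in addresses]
print(ports)
```

With this policy, consecutive cache lines alternate between ports, so a streaming access pattern uses all aggregated links evenly.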

The CCIX and PCIe transaction layers are responsible for processing their respective packets. The PCIe protocol supports virtual channels so that different data streams can pass over the same PCIe link. By placing CCIX and PCIe traffic in separate virtual channels, the two can share one link. CCIX can transmit standard PCIe packets, or optimized CCIX packets in which several fields that CCIX does not need have been removed. When standard PCIe packets are used, existing PCIe switches can be used. Optimized CCIX packets reduce the overhead inherited from PCIe, making the transmitted coherence packets smaller and more efficient.
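The link-sharing idea can be sketched as a demultiplexer that sorts incoming packets into per-channel queues. This is only an illustration: the channel numbers, payload strings, and function name below are assumptions, not values from the PCIe or CCIX specifications.

```python
from collections import defaultdict

# Assumed virtual channel assignment, for illustration only.
PCIE_VC = 0
CCIX_VC = 1


def demux_by_virtual_channel(packets):
    """Group (vc_id, payload) tuples by virtual channel, preserving order."""
    queues = defaultdict(list)
    for vc_id, payload in packets:
        queues[vc_id].append(payload)
    return dict(queues)


# Interleaved traffic arriving on one physical link.
link_traffic = [
    (PCIE_VC, "MemRd"),
    (CCIX_VC, "coherent-read"),
    (PCIE_VC, "MemWr"),
    (CCIX_VC, "snoop"),
]
queues = demux_by_virtual_channel(link_traffic)
print(queues[PCIE_VC])
print(queues[CCIX_VC])
```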

The PCIe data link layer performs all the normal functions of a data link layer. These include CRC error checking, packet acknowledgment and timeout checking, and credit initialization and exchange.
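The error check and retry behavior can be illustrated with a small sketch. It uses `zlib.crc32` as a stand-in for the link CRC and a hypothetical channel that corrupts the first frame; the real PCIe CRC computation, ACK/NAK packets, and replay buffer are more involved and are not modeled here.

```python
import zlib


def frame(payload: bytes) -> bytes:
    """Append a 32-bit CRC (zlib.crc32 as a stand-in for the link CRC)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check(framed: bytes) -> bool:
    """Return True if the trailing CRC matches the payload."""
    payload, crc = framed[:-4], framed[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == crc


def send_with_retry(payload: bytes, channel, max_attempts: int = 3) -> int:
    """Resend until the receiver's CRC check passes; return attempts used."""
    for attempt in range(1, max_attempts + 1):
        received = channel(frame(payload))
        if check(received):  # the receiver would acknowledge here
            return attempt
        # CRC mismatch: the receiver would reject the packet, so resend it.
    raise RuntimeError("link retry limit exceeded")


def make_flaky_channel(corrupt_first_n: int):
    """Hypothetical channel that flips one bit in the first n frames."""
    state = {"sent": 0}

    def channel(data: bytes) -> bytes:
        state["sent"] += 1
        if state["sent"] <= corrupt_first_n:
            return bytes([data[0] ^ 0x01]) + data[1:]
        return data

    return channel


attempts = send_with_retry(b"coherent-read", make_flaky_channel(1))
print(attempts)
```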

The CCIX/PCIe physical layer is based on the PCIe physical layer. CCIX extends it to support 25 GT/s (gigatransfers per second). This faster rate is called Extended Speed Mode (ESM).
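As a rough worked example of what 25 GT/s means in bytes, the sketch below computes per-direction bandwidth before protocol overhead. It assumes a x16 link and the 128b/130b line encoding used by PCIe 3.0 and later; both are assumptions made for this example and are not stated in the article.

```python
def raw_bandwidth_gbytes(gt_per_s: float, lanes: int,
                         encoding_efficiency: float = 128 / 130) -> float:
    """Per-direction bandwidth in GB/s, before protocol overhead."""
    # One transfer carries one bit per lane; divide by 8 to get bytes.
    return gt_per_s * lanes * encoding_efficiency / 8


pcie4_x16 = raw_bandwidth_gbytes(16, 16)     # PCIe 4.0 runs at 16 GT/s
ccix_esm_x16 = raw_bandwidth_gbytes(25, 16)  # CCIX ESM at 25 GT/s

print(f"PCIe 4.0 x16: {pcie4_x16:.1f} GB/s")
print(f"CCIX ESM x16: {ccix_esm_x16:.1f} GB/s")
```

Under these assumptions, the ESM rate gives roughly 1.56 times the raw bandwidth of a 16 GT/s link of the same width.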

With the layered structure covered, let's look at CCIX topologies. CCIX supports a variety of flexible topologies, as shown in the figure below.

[Figure: CCIX topologies]

All CCIX devices have at least one CCIX port. A CCIX port can be associated with a set of physical pins for connecting to another CCIX port to exchange information between two or more different chips.

The agent types defined by CCIX are: Request Agent (RA), Home Agent (HA), Slave Agent (SA), and Error Agent (EA). These agents, along with the ports and links in the system, are collectively called CCIX components. An agent is identified in the protocol by an agent ID.

Request Agent: A request agent reads and writes different addresses in the system. A request agent may cache data for addresses it has accessed. Each CCIX request agent can have one or more processing elements behind it that initiate requests internally; those requests are then issued by the request agent. In essence, the request agent gives accelerators or CCIX-enabled I/O masters an interface to coherent system memory.

Home Agent: The home agent is responsible for managing coherence for a specified range of addresses. When the state of a cache line needs to change, the home agent maintains coherence by issuing snoop operations to the relevant request agents.

Slave Agent: CCIX supports expanding system memory to include memory attached to peripherals. This happens when a home agent is on one chip and some or all of the physical memory associated with it is on another chip. This architectural component (the extended memory) is called a slave agent. A slave agent is never accessed directly by a request agent. Request agents always access a home agent, which in turn accesses the slave agent.

Error Agent: An error agent receives and handles protocol error messages. Protocol error messages are issued by CCIX components.
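The relationship between request, home, and slave agents described above can be sketched as a toy model. It assumes a simple invalidate-on-write policy with write-through to memory, which is a simplification chosen for this example; the class and method names are hypothetical and do not correspond to CCIX message types.

```python
class SlaveAgent:
    """Memory on another chip; reached only through a home agent."""
    def __init__(self):
        self.memory = {}

    def read(self, addr):
        return self.memory.get(addr, 0)

    def write(self, addr, value):
        self.memory[addr] = value


class RequestAgent:
    """Reads and writes addresses and may cache what it has accessed."""
    def __init__(self, name, home):
        self.name, self.home, self.cache = name, home, {}

    def read(self, addr):
        if addr not in self.cache:
            self.cache[addr] = self.home.handle_read(self, addr)
        return self.cache[addr]

    def write(self, addr, value):
        self.home.handle_write(self, addr, value)
        self.cache[addr] = value

    def snoop_invalidate(self, addr):
        self.cache.pop(addr, None)


class HomeAgent:
    """Keeps its addresses coherent by snooping request agents."""
    def __init__(self, slave):
        self.slave = slave
        self.sharers = {}  # addr -> set of request agents holding a copy

    def handle_read(self, requester, addr):
        self.sharers.setdefault(addr, set()).add(requester)
        return self.slave.read(addr)

    def handle_write(self, requester, addr, value):
        # Snoop every other agent that holds a copy so it drops stale data.
        for agent in self.sharers.get(addr, set()) - {requester}:
            agent.snoop_invalidate(addr)
        self.sharers[addr] = {requester}
        self.slave.write(addr, value)  # write-through, for simplicity


home = HomeAgent(SlaveAgent())
cpu = RequestAgent("cpu", home)
accel = RequestAgent("accel", home)

cpu.write(0x40, 7)
print(accel.read(0x40))     # accelerator fetches the line via the home agent
cpu.write(0x40, 9)          # home agent snoops the accelerator's copy
print(0x40 in accel.cache)  # whether the accelerator still holds the line
print(accel.read(0x40))     # accelerator re-fetches the line
```

Note that neither request agent touches the slave agent directly; every access goes through the home agent, matching the access path described above.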

A key benefit of CCIX is its ability to share data between the host and accelerators using driverless data movement. Traditional PCIe accelerators need drivers to write data to and read data from the accelerator, which adds latency and computational overhead. With driverless data movement, CCIX can also extend system memory beyond what is attached to the host device.

With CCIX, each CCIX-enabled device behaves like a node in an existing NUMA-aware operating system. This memory-based approach leverages existing operating system capabilities. In this model, all data structures used for sharing are placed in shared memory that both the processor and the accelerator can access. This data-sharing model eliminates accelerator-specific control and management drivers, so accelerator resources can be used by long-running tasks dispatched by a central scheduler. This scheduler can be part of the operating system scheduler, or it can cooperate with it.

This concludes the brief introduction to the CCIX specification. Later articles will analyze the specification step by step. (To be continued.)

Editor: Huang Fei
