Compared with traditional ASICs, FPGAs and structured ASICs offer high design-reuse flexibility, fast time-to-market, good performance, and low cost. FPGAs and dedicated IP blocks can be used on existing commercial AdvancedTCA platforms to develop scalable fabric interface controllers (FICs), speeding product development and making line card solutions robust and cost-effective.
Today’s communications and computing system manufacturers are designing next-generation platforms based on modular system architectures to shorten development cycles, reduce capital expenditures for new equipment, and minimize operational expenses when adding new features and services. Modular platforms enable equipment manufacturers to design multiple types of systems on a common set of building blocks, thereby remaining competitive by achieving economies of scale.
Figure 1: Functional diagram of an SPI4.2 to ASI switch interface controller. On the left is the interface from SPI4.2 to the NPU, and on the right is the connection from ASI to the switch fabric.
The definition of an industry-standard set of AdvancedTCA backplane interfaces gives system integrators greater flexibility and interoperability in interconnecting their switch interface cards and line cards. The AdvancedTCA network interface adopts an open interface protocol and provides interoperable circuit boards through sub-specifications PICMG 3.1-3.5. These sub-specifications support Ethernet, Fibre Channel, Infiniband, PCI Express, StarFabric, Advanced Switch Interconnect (ASI), and Serial RapidIO. The move to the AdvancedTCA specification by some large OEMs marks a shift from custom platforms based on proprietary interconnects to open, standards-based COTS platforms.
PCI Express and ASI
System scalability and modularity require common interconnects that support seamless integration of chips and subsystems across multiple applications. As backplane performance increases from 40Gbps to 160 or even 320Gbps, the interface between the switch fabric and the source of the data flow must be carefully designed to avoid bottlenecks. Switch interfaces must efficiently transport data streams from 2.5Gbps to over 10Gbps with good signal integrity while supporting critical fabric requirements such as data throughput, flow control, and per-flow queuing.
Figure 2: TLP with ASI header, optional PI0 and PI1 headers, and a PI2 header.
PCI Express and ASI are two standard switching fabric technologies that have the potential to dramatically expand the market for standard, state-of-the-art switching equipment and switch interface devices. PCI Express has economies of scale for manufacturing, technical support, and product development spanning ecosystems from computing to communications. The benefits of PCI Express's move to serial interconnects are physical and performance scalability, improved reliability, full-duplex transmission, and simpler, less expensive connectors and cabling.
ASI enhances PCI Express by defining compatible extensions that address requirements such as peer-to-peer communications, QoS, multicast, and multiprotocol encapsulation. PCI Express and ASI are complementary protocols, and many systems use both to meet design requirements that neither could satisfy alone. As new framers, network processing units (NPUs), and switch fabrics adopt ASI, it becomes necessary to bridge ASI to other interface specifications such as SPI3, SPI4.2, and CSIX. This bridging function can be integrated easily into the switch interface controller.
The functions of an SPI4.2 to ASI controller (Figure 1) include:
1. Bidirectional bridging between ASI and SPI4.2, scalable from 2.5Gbps to 20Gbps (x1, x4 or x8);
2. Assemble and disassemble ASI Transaction Layer Packets (TLPs) for endpoints and bridges;
3. Support 1 to 64,000 connection queues (CQ);
4. Support up to 16 channels on SPI4.2;
5. Programmable channel mapping to SPI4.2;
6. Supports one bypassable, three ordered and one multicast virtual channel (VC);
7. Programmable maximum packet length is 64 to 80 bytes;
8. Credit-based flow control at the link layer;
Figure 3: Example of a PI2 package. The original SPI4.2 burst data stream is converted into the ASI TLP by removing the SPI4.2 Protocol Control Word (PCW) and adding the ASI header, optional PI0 and PI1 headers, and PI2 header.
9. CRC generation and error checking;
10. Handling of consecutive back-to-back end-of-packet (EOP) indications;
11. DIP4 parity generation and check;
12. Status channel framing, DIP2 generation and verification;
13. Status channel loss-of-synchronization generation and detection;
14. Training sequence generation and detection;
15. Fully synchronous design (800Mbps);
16. Compliant with the OIF SPI-4 Phase 2 specification;
17. Compliant with the ASI-SIG ASI Core Architecture Specification, Revision 1.0.
In the SPI4.2 to ASI direction, incoming SPI4.2 packets are segmented as necessary and mapped to VC FIFO buffers based on traffic type (unicast or multicast) and class. The user programs the SPI4-to-VC mapping table with the channel mapping information for the SPI4.2 interface, and packets arriving on the interface are delivered to the buffers the table indicates. The ASI scheduler reads the queues and sends TLPs to the switch fabric.
The fill level of each SPI4.2 channel FIFO buffer is converted to an "empty / under-full / full" status and sent to the peer SPI4.2 transmitter via the receive status channel (RSTAT). Packets received on the SPI4.2 interface are transferred to the corresponding VC FIFO buffers when there is room.
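As a rough illustration, the fill-level-to-status conversion described above can be sketched in Python. The two-bit encodings shown follow the common SPI-4.2 convention (STARVING, HUNGRY, SATISFIED); the threshold parameters and their values are illustrative assumptions, not taken from any specific device.

```python
# SPI4.2 two-bit FIFO status encodings (common convention; confirm
# against the OIF SPI-4 Phase 2 specification for a real design).
STARVING, HUNGRY, SATISFIED = 0b00, 0b01, 0b10

def fifo_status(fill_bytes, almost_full, almost_empty):
    """Map a per-channel FIFO fill level to an SPI4.2 status value.

    Thresholds are hypothetical per-channel watermarks:
      fill >= almost_full  -> "full":       transmitter must stop
      fill <= almost_empty -> "empty":      transmitter may send MaxBurst1
      otherwise            -> "under-full": transmitter may send MaxBurst2
    """
    if fill_bytes >= almost_full:
        return SATISFIED
    if fill_bytes <= almost_empty:
        return STARVING
    return HUNGRY
```

In hardware this comparison runs per channel every status-calendar slot; the sketch only shows the threshold logic.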
SPI4.2 supports up to 16 channels (channels 0 to 15), which are mapped onto the VCs. Below is an example channel assignment from SPI4.2 to VCs:
1. SPI4.2 channels 0 to 7 are mapped as 8 bypassable virtual channels (BVC);
2. SPI4.2 channels 8 to 11 are mapped as 4 ordered virtual channels (OVC);
3. SPI4.2 channels 12 to 15 are mapped as 4 Multicast Virtual Channels (MVC).
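The example assignment above can be expressed as a small lookup table; the tuple format and function name here are illustrative, not from any actual device driver.

```python
# Hypothetical sketch of the SPI4-to-VC mapping table for the example
# channel assignment: channels 0-7 -> bypassable VCs (BVC),
# 8-11 -> ordered VCs (OVC), 12-15 -> multicast VCs (MVC).
SPI4_TO_VC = {}
for ch in range(8):
    SPI4_TO_VC[ch] = ("BVC", ch)
for ch in range(8, 12):
    SPI4_TO_VC[ch] = ("OVC", ch - 8)
for ch in range(12, 16):
    SPI4_TO_VC[ch] = ("MVC", ch - 12)

def route_burst(channel):
    """Return the (VC type, VC index) FIFO that a burst on the given
    SPI4.2 channel feeds."""
    return SPI4_TO_VC[channel]
```

In the actual controller this table is a user-programmable RAM consulted per burst; the dictionary simply models its contents.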
ASI to SPI4.2 output packet stream
In the ASI to SPI4.2 direction, a programmable address mapping table (Figure 2) maps the ASI TLP and traffic class output from the switch fabric on a given VC to one of 16 SPI4.2 channels. The user programs the VC-to-SPI4 table with the channel mapping information from the VCs to the SPI4.2 interface. The data multiplexing (MUX) schedule table RAM (the VCS4 table RAM), which has 16 locations, holds the schedule for reading data from the VC FIFO buffers and transferring it to the SPI4.2 interface.
The VCS4 data MUX and address mapping module reads data from the VC FIFO channels in the order specified by the VCS4 table RAM. The SPI4.2 source module segments and reassembles packets as necessary, adds SPI4.2 payload control words, and sends the result to the NPU over the SPI4.2 interface. The SPI4.2 source module also performs credit management and scheduling based on flow-control information received from the peer SPI4.2 sink.
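The 16-entry read schedule can be sketched as follows; the table contents and the list-based FIFO model are assumptions for illustration only.

```python
# Hypothetical model of the 16-location VCS4 schedule table: the data
# MUX visits VC FIFOs in table order, skipping any FIFO that is empty.
VCS4_TABLE = [0, 1, 2, 3] * 4   # 16 entries naming VC FIFO indices

def read_schedule(vc_fifos, table=VCS4_TABLE):
    """Yield (vc, word) pairs by reading one word from each scheduled
    VC FIFO in table order; empty FIFOs are skipped for this pass."""
    for vc in table:
        if vc_fifos[vc]:
            yield vc, vc_fifos[vc].pop(0)
```

Programming the same VC index into multiple table locations gives that VC a proportionally larger share of the SPI4.2 bandwidth.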
ASI provides several Protocol Interfaces (PIs) that provide optional functionality or adapt various protocols to the ASI infrastructure.
Protocol interface description
PI0 encapsulation is used for spanning tree packets and multicast routing. A subsequent-PI field of 0 indicates a spanning-tree packet, while a non-zero value indicates multicast routing; multicast group addressing is carried in the multicast group index field.
PI1 passes the connection queue identification information to the downstream peer switching unit or endpoint. When congestion occurs, the downstream peer switching unit may send a PI5 congestion management message identifying the upstream peer switching unit’s offending connection queue.
PI2 provides segmentation and reassembly (SAR) services and encapsulation. The PI2 header carries Start of Packet (SOP) and End of Packet (EOP) information that facilitates packet delineation. In addition, PI2 encapsulation specifies optional pre-pad (PPD) and end-pad (EPD) bytes that align the payload data within the PI2 container.
If the SPI4.2 burst length equals the ASI TLP payload length (Figure 3), PI2 encapsulation can be used to delineate packets and map the data stream. In this case, the received SPI4.2 bursts are already segmented to the payload length supported by the ASI interface, so from a packet-delineation point of view PI2 only needs to indicate SOP and EOP.
For intermediate data bursts, the PI2 SAR code is "middle". Note that since non-EOP SPI4.2 bursts must be a multiple of 16 bytes, intermediate SPI4.2 payloads are always 32-bit aligned, matching the ASI payload.
Figure 4: In the example of PI2 segmentation, the SPI4.2 packet is divided into three ASI TLPs, the SPI4.2 protocol control word is removed, and for each TLP, the ASI header is added with optional PI0 and PI1 headers and PI2 header.
For the final burst, the PI2 SAR code is set to "end" when all bytes in the last TLP word are valid, or "end with pad" otherwise, indicating the number of valid bytes in the last word.
PI2 SAR is used to fragment and reassemble SPI4.2 packets if the SPI4.2 burst packet length exceeds the ASI TLP payload length. Received SPI4.2 burst packets are segmented in the bridge to the length of the payload supported by the ASI interface (Figure 4).
As with encapsulation, the PI2 SAR codes for the three TLPs are set to "initial", "middle", and "end" (or "end with pad"), respectively. For reassembly, ASI fragments from each association are reassembled into complete packets. Once a complete packet is obtained, it is mapped to an SPI4.2 channel and output as burst packets. Burst packets from different SPI4.2 channels can be interleaved.
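The SAR-code assignment during segmentation can be sketched as below. The code names ("SOP", "MOP", "EOP") are illustrative labels for the initial/middle/end cases, not the exact field encodings of the ASI specification.

```python
# Hedged sketch: split one SPI4.2 packet into ASI TLP payloads and
# label each with an illustrative PI2 SAR code.
def segment(packet, max_payload):
    """Return a list of (sar_code, chunk) pairs for one packet."""
    chunks = [packet[i:i + max_payload]
              for i in range(0, len(packet), max_payload)]
    tlps = []
    for i, chunk in enumerate(chunks):
        if len(chunks) == 1:
            code = "SOP+EOP"            # whole packet fits in one TLP
        elif i == 0:
            code = "SOP"                # "initial"
        elif i == len(chunks) - 1:
            code = "EOP"                # "end" / "end with pad"
        else:
            code = "MOP"                # "middle"
        tlps.append((code, chunk))
    return tlps
```

Reassembly is the inverse: the receiver accumulates payloads per association until it sees the "end" code, then forwards the complete packet.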
Map traffic type, class and destination port
The switch interface must transmit several important attributes along with the data: traffic type (unicast or multicast), class, destination port, and congestion management. These parameters are all supported directly in ASI. In SPI4.2, however, this information is carried in the SPI4.2 channel number or in a proprietary header within the SPI4.2 payload.
SPI4.2 uses three levels of congestion indication (empty, under-full, full) for credit-based flow control. The transmitter refreshes its credits using the preset maximum burst values (MaxBurst1 and MaxBurst2) corresponding to the empty and under-full states.
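A simplified model of this credit refresh is sketched below. Treating "full" as granting zero new credit is a deliberate simplification, and the MaxBurst values (in 16-byte blocks) are arbitrary examples.

```python
# Assumed, simplified model of SPI4.2 credit-based flow control:
# each status update reloads the transmitter's credit counter.
MAXBURST1, MAXBURST2 = 16, 8   # credits in 16-byte blocks (illustrative)

class TxCredit:
    def __init__(self):
        self.credits = 0

    def on_status(self, status):
        """Refresh credits on a status-channel update for this channel."""
        if status == "empty":
            self.credits = MAXBURST1
        elif status == "under-full":
            self.credits = MAXBURST2
        else:                     # "full": stop sending to this channel
            self.credits = 0

    def can_send(self, blocks):
        return blocks <= self.credits
```

Real implementations also decrement credits as bursts are sent and account for status-channel latency; the sketch shows only the refresh rule.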
Figure 5: Dual network processors and full-duplex line cards with dedicated FICs in a typical single 10Gbps port.
ASI offers several flow-control options: credit-based flow control per VC, token buckets for source rate control, and status-based flow control per class or per flow queue.
Congestion management within bridges is an integral part of bridge architecture and buffering mechanisms. Bridging can use two basic architectures, either flow-through with little or no buffering, or single or two levels of buffering per interface.
In a flow-through architecture, flow control information is generated and applied externally to the bridge. This approach simplifies the bridge design, but it increases the latency between the source and the flow-controlled destination port and may therefore require additional buffering resources.
In a buffered architecture, the bridge itself obeys flow control information, thus requiring internal buffering. The internal bridge buffer can be shared by both interfaces (single stage), or each interface can be equipped with its own associated buffer, called two-stage buffering.
The ingress network processor's receive port is configured as SPI4 for the physical device interface, while its transmit port is configured as SPI4.2 for the switch interface and connected to the proprietary FIC (Figure 5). The FIC supports a full-duplex SPI4.2 interface and up to 24 full-duplex PCI Express SERDES (serializer/deserializer) links at 2.5Gbps each; a 10Gbps full-duplex link port requires four SERDES links. Unused SERDES links can be powered down through device configuration register settings. In this 10Gbps example, the NPU configures the "Configuration and Status" registers inside the EP1SGX40 via the PCI local bus interface.
Proprietary FIC Reference Design
The proprietary FIC reference design platform is designed and verified using Intel’s IXDP2401 advanced development platform. The AdvancedTCA rack interconnects two IXMB2401 network processor carrier cards connected to the AdvancedTCA high-speed switching interface. The carrier card is a PICMG3.x compatible board designed with an IXP2400 processor. The carrier card adopts a standard component structure and contains 4 daughter card slots and an optional switch interface daughter card slot for connection to the switch interface pins in Area 2 of the AdvancedTCA backplane.
A proprietary, FPGA-based switch interface mezzanine card is designed to plug into the carrier card's optional slot, providing a reconfigurable FIC and optional traffic-management development board. The FIC interconnects the processor with the AdvancedTCA switch fabric. Using reprogrammable devices containing PCI Express- and XAUI-compatible multi-channel transceivers provides a scalable development platform for rapid design and verification of AdvancedTCA FIC designs from 2.5Gbps to 10Gbps (Figure 6).
In its main operating mode, the reference design receives 32-bit SPI3 or 16-bit SPI4.2 data from the ingress port of the processor, transmits the data stream to the AdvancedTCA backplane through the FPGA's integrated transceivers, and sends the backplane data stream back to the egress port of the processor through the 32-bit SPI3 or 16-bit SPI4.2 interface.
The integrated transceivers are configured via the processor's SlowPort interface. The reference design supports several other modes of operation, including SPI4.2 interface loopback, ASI interface loopback, traffic management, and switch fabric packet generation and monitoring.
FPGA and structured ASIC FIC
Using proprietary multi-FPGA and structured ASIC technology, scalable PCI Express and ASI bridges and endpoints can be developed. High-density, high-performance FPGAs with built-in PCI Express-compatible transceivers can provide: 1. a total solution with scalable 2.5Gbps links; 2. dynamic phase alignment (DPA) for interfaces running at up to 1Gbps per channel; 3. multiple package and density options up to 40,000 logic cells.
Figure 6: Functional block diagram.
An FPGA combined with a stand-alone PCI Express-compatible SERDES, such as PMC-Sierra's PM8358 QuadPHY 10GX device, can be used where cost matters more than performance and extended functionality, providing flexible, low-cost 1x, 2x, and 4x (lane) solutions. The combination of a high-density, high-performance FPGA with a stand-alone PCI Express-compatible SERDES can also be ported to a dedicated structured ASIC to provide the highest density, fastest performance, and widest range of applications required.