Communication in the 5G and IoT era demands high speed, low latency and high bandwidth. Building out 5G and IoT connectivity requires a large number of base stations, which in turn brings a large number of power, signal and optical-fiber connectors and cables. As a result, many high-speed digital interfaces are coming into view. The most common today are PCIe, SAS and SATA, along with XAUI (the 10 Gigabit Ethernet attachment unit interface) and Thunderbolt, which Intel and Apple developed jointly as a general-purpose high-speed interface that combines PCIe and DisplayPort in a single port. Today we want to share how a high-speed digital IO is designed and implemented! (If you are curious how PCIe, SAS and SATA compare, and which will lead storage interfaces, a quick search on Baidu will turn up plenty of discussion.)

High-speed digital interfaces are usually serial. In fact, being serial is precisely what lets them run so fast: the data on the chip's wide internal parallel bus is converted into a single serial stream for transmission. A 40G QSFP link, for example, is actually built from four 10G lanes. But why do it this way? Signal integrity (SI) for high-speed digital signals on a PCB is complex and hard to design for, so why not simply send the chip's internal parallel data out directly? Let's find out together!
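To put the pin savings into rough numbers, the sketch below compares a 40 Gbps link built from differential serial lanes with a single-ended parallel bus of the same throughput. The lane rate of 10.3125 GBd with 64b/66b coding (10 Gbps of payload per lane, as in 40G Ethernet), the 625 MHz parallel clock and the single forwarded clock pin are illustrative assumptions, and the function names are hypothetical. Power and ground return pins are ignored.

```python
import math

def serial_pins(throughput_gbps, lane_gbaud, payload_bits, block_bits):
    """Pins per direction for a differential serial link (2 pins per lane)."""
    # Payload rate per lane after line-coding overhead, e.g. 64b/66b.
    per_lane_gbps = lane_gbaud * payload_bits / block_bits
    # Round before ceil to guard against floating-point noise.
    lanes = math.ceil(round(throughput_gbps / per_lane_gbps, 9))
    return lanes * 2  # each lane is one differential pair (P and N)

def parallel_pins(throughput_gbps, bus_clock_ghz, clock_pins=1):
    """Pins per direction for a single-ended parallel bus, one bit per pin per clock."""
    data_pins = math.ceil(round(throughput_gbps / bus_clock_ghz, 9))
    return data_pins + clock_pins

print(serial_pins(40, 10.3125, 64, 66))  # 4 lanes -> 8
print(parallel_pins(40, 0.625))          # 64 data pins + 1 clock -> 65
```

Under these assumptions the serial link needs 8 signal pins per direction where the parallel bus needs 65, which is the packaging argument made next.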

The answer starts with Moore's law. As chips integrate more and more logic, the number of pins on the package does not grow to match, so it is impossible to bring every internal IO out to its own pin. Anyone familiar with packaging knows that more pins mean a more expensive package, which the market's price/performance expectations will not tolerate. Fewer pins are also good for signal integrity and power integrity (SI/PI). Imagine dozens of parallel IOs toggling at the same instant: the resulting simultaneous switching output (SSO) noise makes the power supply unstable, and the whole board radiates strong EMI. When dozens of single-ended parallel IOs are replaced by one serial pair, the SSO problem shrinks dramatically. High-speed serial interfaces are generally differential, and because the currents in the two legs of a differential pair are equal and opposite, the supply current drawn by the driving IC stays nearly constant.

So the main motivation for high-speed serial interfaces is cost. At the same time, every high-speed interface has to solve board-level synchronization. When IO is slow, the delay introduced by board routing can almost be ignored; when IO is very fast, that routing delay can span several clock cycles.


As shown in the figure, by the time the low-speed IO has seen a single rising edge, the high-speed IO has already toggled through several cycles.
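Both board-level arguments in this section, routing delay and SSO, can be put into rough numbers. The sketch assumes a propagation delay of about 6.7 ps/mm (typical of FR-4 stripline with a dielectric constant near 4) and uses a toy SSO model that simply counts how many single-ended outputs switch from 0 to 1 on the same clock edge. Function names and trace lengths are hypothetical.

```python
PS_PER_MM = 6.7  # assumed propagation delay: ~6.7 ps/mm for FR-4 stripline (er ~ 4)

def bits_in_flight(trace_mm, bit_rate_gbps, ps_per_mm=PS_PER_MM):
    """How many unit intervals (bit periods) the trace delay spans."""
    delay_ps = trace_mm * ps_per_mm
    ui_ps = 1000.0 / bit_rate_gbps  # one bit period in picoseconds
    return delay_ps / ui_ps

def rising_edges_per_cycle(words, width):
    """Toy SSO model: count outputs switching 0->1 together on each clock."""
    mask = (1 << width) - 1
    return [bin(~prev & cur & mask).count("1") for prev, cur in zip(words, words[1:])]

# A 150 mm trace at 100 Mbps vs. 10 Gbps
print(round(bits_in_flight(150, 0.1), 2))  # ~0.1 UI: delay is negligible
print(round(bits_in_flight(150, 10), 2))   # ~10 UI: about ten bits on the wire at once

# 32 single-ended outputs going from all-0 to all-1 switch at the same instant
print(rising_edges_per_cycle([0x00000000, 0xFFFFFFFF, 0x0000FFFF], 32))  # [32, 0]
```

In the same toy model, a current-steering differential driver just moves a constant tail current from one leg to the other, so its supply current does not step at all.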

At these speeds we can no longer use the traditional system-level (common-clock) synchronization, in which the whole board runs from one shared CLK: the CLK travels a different distance to each IC, so the ICs cannot stay synchronized. The only exception would be a perfectly symmetric board with exactly equal-length clock routing to every IC, and a board built to that constraint would hardly be worth selling!
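A simplified common-clock timing budget shows why a shared board clock runs out of headroom: the clock period has to cover the driver's clock-to-output delay, the data flight time, the receiver's setup time and any clock skew between the two ICs. The function name and all numbers below are hypothetical, and hold time and jitter are ignored.

```python
def common_clock_fmax_mhz(t_co_ns, t_flight_ns, t_setup_ns, t_skew_ns):
    """Maximum clock rate for a common-clock (system-synchronous) interface.

    Simplified worst case: data launched on one clock edge must cross the
    board and meet setup before the next edge, and skew between the clock
    arriving at the driver and at the receiver eats directly into that window.
    """
    min_period_ns = t_co_ns + t_flight_ns + t_setup_ns + t_skew_ns
    return 1000.0 / min_period_ns

# Hypothetical values: 2 ns clock-to-out, 1 ns flight, 1 ns setup, 0.5 ns skew
print(round(common_clock_fmax_mhz(2.0, 1.0, 1.0, 0.5), 1))  # 222.2 (MHz)
```

Flight time and skew are fixed by the board in nanoseconds, so as the clock gets faster they consume an ever larger share of the period until no valid window is left.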

Another synchronization method is source-synchronous: the driving IC sends CLK and data to the receiver together. If the clock and data traces are length-matched, clock and data arrive aligned; common DDR memory works this way. The drawback is that once the transfer rate reaches the Gbps range, the length-matching requirement becomes very strict, which limits routing flexibility. This is why DDR3 and DDR4 chips add training (read and write leveling) to compensate for skew and leave more margin for our routing. In addition, once the receiver captures data with the CLK from the driver, it still has to move that data into its own chip's clock domain. Suppose a 32-bit DDR interface with one clock (strobe) per 8 data bits: the receiver gets four different clocks and must move all of them into the chip's own clock domain, so there are five different clock domains inside the chip, which makes the chip design more troublesome.
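How strict the length matching gets can be estimated by converting a skew budget into millimeters of trace. The sketch assumes a propagation delay of about 6.7 ps/mm (typical of FR-4 stripline) and a hypothetical budget of 10% of a unit interval for routing skew between data and strobe; real budgets come from the memory and controller datasheets. The function name is hypothetical.

```python
def max_mismatch_mm(data_rate_mtps, skew_budget_ui=0.10, ps_per_mm=6.7):
    """Largest trace-length mismatch that fits inside a routing-skew budget.

    data_rate_mtps: transfers per second per pin in MT/s (DDR4-3200 -> 3200)
    skew_budget_ui: fraction of one unit interval allotted to routing skew (assumed)
    ps_per_mm:      assumed propagation delay of the PCB trace
    """
    ui_ps = 1e6 / data_rate_mtps  # unit interval in picoseconds
    return skew_budget_ui * ui_ps / ps_per_mm

for name, rate in [("DDR-400", 400), ("DDR3-1600", 1600), ("DDR4-3200", 3200)]:
    print(f"{name}: ~{max_mismatch_mm(rate):.1f} mm")
# DDR-400: ~37.3 mm, DDR3-1600: ~9.3 mm, DDR4-3200: ~4.7 mm
```

The allowed mismatch shrinks in direct proportion to the bit period, which is why faster DDR generations lean on training instead of routing alone.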

Neither approach is good enough. A truly good scheme should be simple and effective. Since sending CLK alongside data is so inflexible, can we send only the data and extract the clock information from it at the receiver? Then there is no need to match data to CLK, and no extra clock domains to cross!

The answer is yes. Every block in a high-speed digital IO serves this one goal: correctly recovering the data and CLK and handing them over to the chip's internal parallel bus. The chip area these blocks add is trivial compared with the cost of packaging dozens of extra pins. The inventors of high-speed serial interfaces solved the problem with ingenuity rather than money, which makes it hard work for the rest of us to keep up!

In short, the structure of a high-speed digital interface can be simplified as shown in the following figure:


The transmitter first needs a parallel-to-serial converter, which can be built from a chain of D flip-flops: with the parallel CLK as the load enable, the data on the parallel bus is moved bit by bit through the serial chain of D flip-flops. The internal parallel CLK is then phase-shifted (in 90-degree steps, for example) to create four clocks at 0, 90, 180 and 270 degrees. Together these four phases provide four evenly spaced edges per parallel-clock period, effectively a clock at four times the original rate, and this clock registers the data in the output D flip-flops before transmission. The receiver works the same way in reverse, converting serial back to parallel. Recovering the clock from the data is done with a PLL, and besides the clock, the control information also has to be separated from the data!
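The parallel-to-serial path can be modeled in a few lines: a shift register is parallel-loaded once per word clock and shifted out one bit per serial clock, and the receiver reverses the process. The LSB-first bit order and the function names are assumptions for illustration, since each protocol defines its own bit order. The `phase_edges` helper shows how four clocks spaced 90 degrees apart give four evenly spaced edges per period, i.e. an effective 4x bit clock.

```python
def phase_edges(period_ps, n_phases=4):
    """Rising-edge times (ps) of n clocks phase-shifted by 360/n degrees."""
    return [period_ps * k // n_phases for k in range(n_phases)]

def serialize(words, width):
    """Parallel-in, serial-out shift register, LSB first."""
    bits = []
    for word in words:
        shift_reg = word                # parallel load on the word clock
        for _ in range(width):          # one shift per serial bit clock
            bits.append(shift_reg & 1)  # bit presented on the serial output
            shift_reg >>= 1
    return bits

def deserialize(bits, width):
    """Serial-in, parallel-out: regroup the bit stream into width-bit words."""
    words = []
    for start in range(0, len(bits) - width + 1, width):
        word = 0
        for k, bit in enumerate(bits[start:start + width]):
            word |= bit << k            # the LSB arrived first
        words.append(word)
    return words

print(phase_edges(400))                      # [0, 100, 200, 300]: four bit slots per 400 ps
tx = serialize([0xA5, 0x3C], 8)
print(tx)
print([hex(w) for w in deserialize(tx, 8)])  # ['0xa5', '0x3c']
```

Note that `deserialize` assumes it already knows where each word starts; finding that boundary is one of the jobs the protocol layer takes on.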

The PLL locks to the transitions of the signal arriving on the RX pin, which toggles at most once per bit (its fastest rate); the recovered clock then samples the RX stream to produce the internal RX data!
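A PLL-based CDR is an analog feedback loop that is hard to show in a few lines, so the sketch below swaps in a simpler, related technique: an oversampling CDR. It samples the line at four times the bit rate (an assumption), finds which sample phase the data transitions usually fall on, and picks the phase halfway between transitions, the center of the eye, to recover the bits. Like the PLL approach, it derives timing purely from the data's own transitions. The function name is hypothetical.

```python
from collections import Counter

def recover_bits(samples, osr=4):
    """Oversampling CDR: choose the sample phase farthest from the data edges.

    samples: line samples taken at osr times the bit rate, with unknown phase.
    """
    # Histogram of where transitions land within each bit period.
    edge_hist = Counter(i % osr for i in range(1, len(samples))
                        if samples[i] != samples[i - 1])
    if not edge_hist:
        raise ValueError("no transitions: the clock cannot be recovered")
    edge_phase = edge_hist.most_common(1)[0][0]
    # Sample half a bit period after the most common edge position (eye center).
    center = (edge_phase + osr // 2) % osr
    return samples[center::osr]

bits = [1, 0, 0, 1, 1, 0, 1, 0]
line = [b for b in bits for _ in range(4)]  # 4x oversampled waveform
line = [line[0]] + line                      # unknown phase: shifted by one sample
print(recover_bits(line))                    # [1, 0, 0, 1, 1, 0, 1, 0]
```

The `ValueError` branch is the failure mode behind the run-length rules discussed below: a stream with no transitions carries no timing information.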


Now that we have both data and clock, it's time for what we call the protocol to do its work. The protocol defines certain bit combinations as flags; when the receiver sees one of these flags, it can tell whether what follows is data, control or idle!
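As one concrete example of such flags, 64b/66b coding (used by 10GBASE-R Ethernet) prefixes every 64-bit block with a 2-bit sync header: `01` marks a data block and `10` a control block (idle is carried inside control blocks), while `00` and `11` are invalid. Because every valid header contains a transition, the receiver can also find block boundaries by searching for the bit offset at which headers are consistently valid. The sketch below classifies blocks and performs a much simplified version of that search on a string of '0'/'1' characters; the real 10GBASE-R block-lock state machine is more involved, and the function names, block counts and random payload are hypothetical.

```python
import random

SYNC_DATA, SYNC_CTRL = "01", "10"
BLOCK_BITS = 66

def classify(block):
    """Classify one 66-bit block by its 2-bit sync header."""
    header = block[:2]
    if header == SYNC_DATA:
        return "data"
    if header == SYNC_CTRL:
        return "control"  # a block-type field then says idle, start, terminate, ...
    return "invalid"

def find_block_lock(stream, n_blocks):
    """Return the first bit offset at which n_blocks consecutive headers are valid."""
    for offset in range(BLOCK_BITS):
        blocks = [stream[offset + i * BLOCK_BITS: offset + (i + 1) * BLOCK_BITS]
                  for i in range(n_blocks)]
        if all(len(b) == BLOCK_BITS and classify(b) != "invalid" for b in blocks):
            return offset
    return None

rng = random.Random(0)

def make_block(header):
    """Hypothetical block: a sync header followed by 64 random payload bits."""
    return header + "".join(rng.choice("01") for _ in range(64))

blocks = [make_block(rng.choice([SYNC_DATA, SYNC_CTRL])) for _ in range(40)]
stream = "1101001" + "".join(blocks)  # 7 junk bits before the first block
lock = find_block_lock(stream, n_blocks=32)
print(lock)                           # 7
print([classify(stream[lock + i * BLOCK_BITS: lock + (i + 1) * BLOCK_BITS]) for i in range(4)])
```

Requiring many consecutive valid headers is what makes a false lock unlikely: at a wrong offset each header is valid only about half the time.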

Why distinguish idle at all? Couldn't we just send a string of zeros? Not really. For the receiver to recover the CLK reliably, the transmitted data must avoid long runs of 0s or 1s. This is where the most common line codes come in, such as 8b/10b encoding and 64b/66b encoding with scrambling, which are also defined in the protocol specifications. For protocols published by an industry association, these specifications can be downloaded from the corresponding association's website.
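The scrambling half of 64b/66b can be sketched as the self-synchronizing scrambler with polynomial 1 + x^39 + x^58 that 10GBASE-R applies to the 64-bit payload: each output bit is the input bit XORed with the scrambled bits sent 39 and 58 positions earlier. The sketch shows that a long run of zeros, which would starve the CDR of transitions, comes out transition-rich, and that the descrambler synchronizes by itself after 58 bits even when it starts from the wrong state. The seed value and function names are assumptions for illustration.

```python
from itertools import groupby

TAPS = (39, 58)  # scrambler polynomial 1 + x^39 + x^58

def scramble(bits, state):
    """Self-synchronizing scrambler. state[0] is the most recent output bit."""
    s = list(state)
    out = []
    for b in bits:
        o = b ^ s[TAPS[0] - 1] ^ s[TAPS[1] - 1]
        out.append(o)
        s = [o] + s[:-1]  # shift the transmitted bit into the history
    return out

def descramble(bits, state):
    """Inverse operation; its history holds received bits only."""
    s = list(state)
    out = []
    for r in bits:
        out.append(r ^ s[TAPS[0] - 1] ^ s[TAPS[1] - 1])
        s = [r] + s[:-1]
    return out

def longest_run(bits):
    """Length of the longest run of identical bits."""
    return max(len(list(g)) for _, g in groupby(bits))

SEED = [1, 0] * 29                # hypothetical nonzero 58-bit starting state
payload = [0] * 512               # worst case for clock recovery: all zeros
line = scramble(payload, SEED)

print(longest_run(payload), longest_run(line))           # 512 vs. a run no longer than 58
print(descramble(line, SEED) == payload)                 # True
print(descramble(line, [0] * 58)[58:] == payload[58:])   # True: self-synchronized
```

8b/10b takes the other approach: instead of scrambling, its code table bounds the run length to at most 5 identical bits by construction, at the cost of 25% overhead versus about 3% for 64b/66b.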
