The best processing solution is often provided by a combination of RISC, CISC, and graphics processors with FPGAs, by FPGAs alone, or by FPGAs with hard processor cores as part of their fabric. However, many designers are unfamiliar with what FPGAs can do, how they have evolved, and how to use them. This article, Part 1 of a five-part series, discusses the fundamentals of FPGAs and introduces example solutions from major vendors. Parts 2 through 5 focus on the FPGA device families and design tools offered by Lattice Semiconductor, Microchip, Altera, and Xilinx.
Designers are constantly looking for ways to architect their systems to provide the optimal computing solution that addresses all application requirements. In many cases, that optimal solution requires the use of field-programmable gate arrays (FPGAs). Unfortunately, many designers are unfamiliar with the capabilities of these devices and how to incorporate them into their designs.
This article briefly describes design scenarios that can benefit from using FPGAs. Then, after explaining the basic working principles, it introduces some interesting FPGA solutions and development kits.
Why use an FPGA?
Computing applications vary widely, and the best way to meet their requirements varies with them. Options include off-the-shelf microprocessors (MPUs) and microcontrollers (MCUs), off-the-shelf graphics processing units (GPUs), FPGAs, and custom system-on-chip (SoC) devices. Deciding which to use requires a careful examination of the application requirements and considerations.
For example, when working on cutting-edge technologies such as 5G base stations, designers need to consider that the underlying standards and protocols are still evolving. This means that designers must be able to respond quickly and efficiently to specification changes that are beyond their control.
Likewise, they need the flexibility to respond to future changes in standards and protocols after the system has been deployed in the field. They must also be able to respond to unexpected bugs in system functionality or holes in system security, and to modify existing functions or add new ones in order to extend the service life of the system.
Although the highest performance is usually provided by a custom SoC, this route is expensive and time-consuming. In addition, any algorithm implemented in the fabric of the chip is essentially “frozen in silicon”. Given the considerations above, this inherent inflexibility becomes a problem. An alternative route is needed that strikes the best balance between high performance and flexibility. That route is often provided by FPGAs, by a combination of microprocessors or microcontrollers and FPGAs, or by FPGAs with hard processor cores as part of their fabric.
What is an FPGA?
This is a tricky question to answer, because FPGAs are different things to different people. In addition, there are many types of FPGA, each with a different combination of capabilities and functions.
The programmable fabric is the core of any FPGA (that is, the defining aspect of “FPGA-dom”) and takes the form of an array of programmable logic blocks (PLBs) (Figure 1a). Each logic block is a collection of elements, including lookup tables (LUTs), multiplexers, and registers, all of which can be configured (programmed) to perform operations as needed (Figure 2).
Figure 1: the simplest FPGAs contain only the programmable fabric and configurable GPIOs (a). Different architectures augment this basic fabric with other components: SRAM blocks, PLLs, and clock managers (b); DSP blocks and SerDes interfaces (c); and hard processor cores and peripherals (d). (image source: Max Maxfield)
Figure 2: each PLB is a collection of multiple elements, including lookup tables, multiplexers, and registers, all of which can be configured (programmed) to perform operations as needed. (image source: Max Maxfield)
Many FPGAs use 4-input LUTs, each of which can be configured to implement any 4-input logic function. To better support the wide data paths used in some applications, some FPGAs offer 6-, 7-, or even 8-input LUTs. The output of the LUT is connected directly to one of the logic block outputs and to one of the multiplexer inputs. The other multiplexer input is connected directly to a logic block input (E). The multiplexer can be configured to select either of these inputs.
The output of the multiplexer feeds the register input. Each register can be configured as an edge-triggered flip-flop or a level-sensitive latch (although using asynchronous logic in the form of latches inside an FPGA is not recommended). The clock (or enable) of each register can be configured as active-high or active-low. Likewise, the active level of the set/reset input is configurable.
These logic blocks can be thought of as “islands of programmable logic” floating in a “sea of programmable interconnect”. The interconnect can be configured to connect any output of any logic block to any input of other logic blocks. Similarly, the primary inputs of the FPGA can be connected to the inputs of any logic block, and the outputs of any logic block can be used to drive the primary outputs of the device.
The primary general-purpose inputs/outputs (GPIOs) are organized in banks, each of which can be configured to support a different interface standard, such as LVCMOS, LVDS, LVTTL, HSTL, or SSTL. In addition, the input impedance and the output slew rate are configurable.
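The way a 4-input LUT can realize any 4-input logic function can be illustrated with a small software model. The following is a minimal sketch, not a description of any vendor's hardware: it assumes the LUT is a 16-entry truth table held in 16 configuration bits, with the four inputs forming the address. The function names `make_lut4` and `lut4_read` are hypothetical and chosen only for this illustration.

```python
def make_lut4(func):
    """Build the 16-bit configuration word for a 4-input LUT.

    Bit i of the result holds the output of func for the input
    combination whose binary encoding is i (a is the least significant bit).
    """
    config = 0
    for index in range(16):
        a = (index >> 0) & 1
        b = (index >> 1) & 1
        c = (index >> 2) & 1
        d = (index >> 3) & 1
        if func(a, b, c, d):
            config |= 1 << index
    return config


def lut4_read(config, a, b, c, d):
    """Look up the LUT output: the inputs simply address one stored bit."""
    index = a | (b << 1) | (c << 2) | (d << 3)
    return (config >> index) & 1


# Two different functions realized by the same structure (illustrative only).
and4_config = make_lut4(lambda a, b, c, d: a & b & c & d)
xor4_config = make_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)
```

Under these assumptions, changing the logic function means changing only the 16 stored bits; the lookup mechanism itself stays the same.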
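The path from LUT to multiplexer to register can be sketched in software as follows. This is a simplified model under stated assumptions: one 4-input LUT, one 2:1 multiplexer that selects either the LUT output or input E, and one edge-triggered register with no set/reset or clock-polarity options. The class name `LogicBlock` and its attributes are hypothetical.

```python
class LogicBlock:
    """Toy model of one logic block: 4-input LUT -> 2:1 mux -> register."""

    def __init__(self, lut_config, mux_selects_lut=True):
        self.lut_config = lut_config            # 16 configuration bits for the LUT
        self.mux_selects_lut = mux_selects_lut  # configuration bit for the mux
        self.q = 0                              # current register contents

    def lut_out(self, a, b, c, d):
        """Combinational LUT output: the inputs address one configuration bit."""
        index = a | (b << 1) | (c << 2) | (d << 3)
        return (self.lut_config >> index) & 1

    def clock_edge(self, a, b, c, d, e):
        """Model one active clock edge: the register captures the mux output."""
        mux_out = self.lut_out(a, b, c, d) if self.mux_selects_lut else e
        self.q = mux_out
        return self.q


# A block configured as a registered 4-input AND (config word 0x8000).
and_block = LogicBlock(0x8000)
# The same block type configured so the register captures input E instead.
bypass_block = LogicBlock(0x8000, mux_selects_lut=False)
```

In this sketch the two blocks are structurally identical; only the configuration values passed to the constructor differ, which mirrors how configuration determines the behavior of each block.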
The FPGA architecture can be extended to include blocks of SRAM, known as block RAM (BRAM), phase-locked loops (PLLs), and clock managers (Figure 1b). Digital signal processing (DSP) blocks, also called DSP slices, can be added as well. Each contains a configurable multiplier and a configurable adder, allowing it to perform multiply-accumulate (MAC) operations (Figure 1c).
High-speed SerDes blocks are another common FPGA feature; they support gigabit serial interfaces. It is important to note that not all FPGAs support all of these features. Different FPGAs provide different feature sets for different markets and applications.
The programmable fabric in an FPGA can be used to implement any required logic function or set of functions, up to and including one or more processor cores. Cores implemented in the programmable fabric are called “soft cores”. In contrast, some FPGAs (commonly referred to as SoC FPGAs) contain one or more “hard core” processors that are implemented directly in silicon (Figure 1d). These hard processor cores may include floating point units (FPUs) and L1/L2 caches.
Similarly, peripheral interface functions (such as CAN, I2C, SPI, UART, and USB) can be implemented as soft cores in the programmable fabric, but many FPGAs implement them as hard cores in silicon. The processor cores, interface functions, and programmable fabric typically communicate over high-speed buses such as AMBA and AXI.
The first FPGAs, introduced by Xilinx in 1985, contained only an 8 x 8 array of programmable logic blocks (with no RAM blocks, DSP blocks, and so on). By contrast, today’s high-end FPGAs can contain hundreds of thousands of logic blocks, thousands of DSP blocks, and megabits (Mb) of RAM. In total, they may comprise billions of transistors, corresponding to tens of millions of equivalent gates (for example, 2-input NAND gates).
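A multiply-accumulate operation computes acc + a × b, and repeating it over two sequences yields a dot product, the core of many filtering and signal processing tasks. The sketch below models only the arithmetic, using Python's unbounded integers; real DSP blocks operate on fixed-width operands, which this example does not attempt to model. The function names `mac` and `dot_product` are hypothetical.

```python
def mac(acc, a, b):
    """One multiply-accumulate step: multiply a by b and add to the accumulator."""
    return acc + a * b


def dot_product(xs, ys):
    """Accumulate the products of paired values, one MAC step per pair."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = mac(acc, x, y)
    return acc


# 1*4 + 2*5 + 3*6 = 32
result = dot_product([1, 2, 3], [4, 5, 6])
```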
Alternative configuration technologies
The function of each logic block and the routing of the interconnect are determined by configuration cells, which can be visualized as 0/1 (off/on) switches. These cells are also used to configure the GPIO interface standard, input impedance, output slew rate, and so on. Depending on the FPGA, the configuration cells are implemented using one of three technologies:
Antifuse: these configuration cells are one-time programmable (OTP), which means that once the device has been programmed, the programming cannot be undone. Such devices tend to be confined to space and high-security applications. Because they sell in small volumes, they are priced high, which makes them an expensive design option.
Flash: like antifuse-based configuration cells, flash-based cells are nonvolatile. Unlike antifuse cells, flash cells can be reprogrammed as required. Flash configuration cells are radiation tolerant, which makes these devices suitable for space applications (albeit with modifications to the upper metallization layers and the packaging).
SRAM: with this approach, the configuration data is stored in an external memory and loaded into the FPGA each time it powers up (or on command, in the case of dynamic configuration).
The advantages of FPGAs whose configuration cells are based on antifuse or flash are that they are “instant on” and consume little power. A disadvantage is that these technologies require additional processing steps on top of the underlying CMOS process used to create the rest of the chip.
The advantages of SRAM-based FPGAs are that their configuration cells are fabricated in the same CMOS process as the rest of the chip and that they offer higher performance, because SRAM-based devices are typically one or two process generations ahead of antifuse and flash devices. The main disadvantages are that SRAM configuration cells consume more power than antifuse and flash cells (at the same technology node) and that they are susceptible to radiation-induced single event upsets (SEUs).
For a long time, this susceptibility made SRAM-based FPGAs unsuitable for aerospace applications. More recently, special mitigation strategies have been adopted, with the result that SRAM-based FPGAs now appear alongside flash-based FPGAs in systems such as the Mars rover Curiosity.
Flexibility with FPGAs
FPGAs are used in a wide range of applications. They are particularly suited to intelligent interfacing, motor control, algorithmic acceleration and high-performance computing (HPC), image and video processing, machine vision, artificial intelligence (AI), machine learning (ML), deep learning (DL), radar, beamforming, base stations, and communications.
A simple example is to provide intelligent interfaces between other devices using different interface standards or communication protocols. Consider an existing system in which an application processor is connected to camera sensors and display devices using legacy interfaces (Figure 3a).
Figure 3: FPGAs can be used to provide intelligent interfaces between other devices using different interface standards or communication protocols, thereby extending the life of existing designs based on legacy devices. (image source: Max Maxfield)
As another example, consider computationally intensive tasks such as the signal processing performed in radar systems or the beamforming performed in communications base stations. Conventional processors based on von Neumann or Harvard architectures are well suited to some tasks, but less so to tasks that require the same sequence of operations to be performed over and over. This is because a single processor core running a single thread can execute only one instruction at a time (Figure 4a).
Figure 4: a microprocessor can execute only one instruction at a time (sequentially). By contrast, multiple functional blocks in an FPGA can execute at the same time (concurrently). In addition, an FPGA can implement suitable algorithms in a massively parallel fashion. (image source: Max Maxfield)
By contrast, an FPGA can execute multiple functions at the same time and can pipeline a series of operations, thereby achieving much greater throughput. Similarly, rather than performing the same operation sequentially as a processor would (say, 1000 additions on 1000 pairs of data values), the FPGA can instantiate 1000 adders in its programmable fabric and perform all of the calculations in a massively parallel fashion in a single clock cycle (Figure 4b).
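The throughput argument can be made concrete with a rough cycle-count model. This is a back-of-the-envelope sketch under idealized assumptions: every operation takes exactly one clock cycle, there are no memory or routing bottlenecks, and a pipeline accepts one new item per cycle. The function names `cycles_needed` and `pipelined_cycles` are hypothetical.

```python
import math


def cycles_needed(num_operations, num_units):
    """Cycles to finish independent one-cycle operations on num_units parallel units."""
    return math.ceil(num_operations / num_units)


def pipelined_cycles(num_items, num_stages):
    """Cycles for num_items to pass through a pipeline of num_stages one-cycle stages.

    The first item needs num_stages cycles; each later item finishes one cycle
    after the previous one.
    """
    if num_items == 0:
        return 0
    return num_stages + num_items - 1


sequential = cycles_needed(1000, 1)     # one adder, as in a single-threaded core
parallel = cycles_needed(1000, 1000)    # 1000 adders instantiated in the fabric
pipelined = pipelined_cycles(1000, 4)   # 4-stage pipeline, chosen arbitrarily
```

In this idealized model, the sequential case takes 1000 cycles and the fully parallel case takes one, matching the example in the text. Real designs fall somewhere in between, depending on how many resources the device provides and how quickly data can be supplied.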
Which manufacturers make FPGAs?
This is an evolving picture. The two main manufacturers of high-end devices offering the highest capacity and performance are Intel (which acquired Altera) and Xilinx.
Intel and Xilinx offer a wide range of products, from low-end FPGAs to high-end SoC FPGAs. Another vendor that focuses almost exclusively on FPGAs is Lattice Semiconductor, which targets mid-range and low-end applications. Last but not least, Microchip Technology (through its acquisitions of Actel, Atmel, and Microsemi) now offers a variety of small to medium-sized FPGAs and low-end SoC FPGAs.
Because there are so many product families, each offering different resources, performance, capacity, and packaging styles, selecting the best device for the task at hand can be difficult. Examples include devices from Intel, Lattice Semiconductor, and Xilinx.
How to design with FPGAs?
Traditionally, engineers design with FPGAs by capturing the design intent in a hardware description language such as Verilog or VHDL. These descriptions are first simulated to verify that they perform as required, and are then passed to a synthesis tool that generates the configuration file used to configure (program) the FPGA.
Each FPGA vendor either has its own internally developed toolchain or offers a customized version of a tool from a specialist vendor. Either way, the tools can be obtained from the FPGA vendor’s website. In addition, free or low-cost versions of the full-featured tool suites may be available.
To make FPGAs more accessible to software developers, some FPGA vendors now offer high-level synthesis (HLS) tools. These tools take an algorithmic description of the desired behavior, captured at a high level of abstraction in C, C++, or OpenCL, and generate the input that is fed to the lower-level synthesis engine.
For designers who want to get started, many development and evaluation boards are available, each offering different functions and features. Here are three examples: the DFR0600 development kit from DFRobot, which features a Xilinx Zynq-7000 SoC FPGA; the DE10-Nano from Terasic Inc., which features an Intel Cyclone V SoC FPGA; and the ICE40HX1K-STICK-EVN evaluation board, which features a low-power iCE40 FPGA from Lattice Semiconductor.
Designers who plan to use FPGA-based PCIe daughter cards to accelerate applications running on x86 motherboards can look at products such as the Alveo PCIe daughter cards, which are also offered by Xilinx.
The best design solution is often provided by FPGAs, by a combination of processors and FPGAs, or by FPGAs with hard processor cores as part of their fabric.
FPGAs have evolved rapidly over the years. They can address design requirements in terms of flexibility, processing speed, and power consumption, which makes them suitable for a wide range of applications.
Reviewed and edited by: Fu Qianjiang