On-chip memory resources in FPGAs are implemented in two ways: fine-grained and coarse-grained. In the fine-grained approach, each basic logic unit can be configured as a small memory, and several small memories can be merged to form a larger one. This requires no additional logic but has low storage density, so it suits applications with limited storage requirements. In the coarse-grained approach, a large-capacity memory module is embedded in the FPGA chip as a dedicated storage unit. Compared with the fine-grained approach, it offers high storage density and suits data processing and other applications that require a large amount of on-chip storage. As FPGAs are applied more widely and the demand for large-capacity storage grows, embedded memory modules have become a very important resource in FPGA chips, and they are more flexibly configurable than ordinary memory.
The memory module designed in this paper is part of our FPGA chip, and its function, structure and layout serve the entire chip. It is a synchronous 18Kb dual-port memory in a 0.13 micron CMOS process that can be configured as ROM or SRAM. Each port supports 6 data widths and 3 write modes, and the polarity of the control signals is selectable. Each output port can be independently set to 0 or set to 1. In applications, multiple memory modules can be combined for depth or width expansion, and they can also be used as FIFOs or large look-up tables.
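As an illustration of depth expansion, the sketch below splits a flat address across several identical blocks. It is a hypothetical behavioral model, not the authors' circuit: the 512-word block depth, the function name split_address, and the choice of using the upper address bits as the block select are all assumptions made for this example.

```python
BLOCK_DEPTH = 512  # assumed depth of one block (the 512x36 configuration)


def split_address(addr, n_blocks):
    """Map a flat address to (block index, address inside that block).

    Hypothetical helper: the upper address bits pick the block and the
    lower bits address a word inside it.
    """
    if not 0 <= addr < BLOCK_DEPTH * n_blocks:
        raise ValueError("address out of range")
    return addr // BLOCK_DEPTH, addr % BLOCK_DEPTH


# Two blocks stacked in depth behave like one 1024-word memory.
blocks = [[0] * BLOCK_DEPTH for _ in range(2)]
block, local = split_address(700, len(blocks))
blocks[block][local] = 0x2A  # word 700 lands in block 1 at local address 188
```

Width expansion would instead send the same address to every block and concatenate their data ports.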
2. Design of memory module
2.1 Hierarchical structure
From the perspective of the FPGA chip, the circuit is divided into a logic layer and a configuration layer, as shown in Figure 1. The logic layer is a static memory with two independent ports, A and B. en is the chip select signal, we is the read/write control signal, and ssr is the synchronous preset control signal. The configuration layer provides the configuration signals that select the operating mode of the memory module in the logic layer. Each configuration signal corresponds to a 6-transistor configuration cell in the configuration layer, which is assigned a value during the FPGA initialization phase and then drives the logic layer.
2.2 Storage unit
The memory cell adopts the 8-transistor dual-port structure shown in Figure 2(a), and each port corresponds to one word line and one pair of bit lines. When a word line is pulled high, the corresponding two NMOS transistors turn on, and data is written or read through the bit lines. When the module is used as a ROM, the memory cells must be initialized, so a data path from the configuration layer to the memory cells has to be provided. Our implementation is shown in Figure 2(b): word line and bit line selectors are added to port A. awl_lgc and abl_lgc are the word line and bit line of port A in the logic layer, and cfgwl and cfgbl are the word line and bit line from the configuration layer. When the mode selection signal modesel is low, the word line and bit line of the configuration layer pass through, and the memory is initialized. Otherwise, the word line and bit line of the logic layer pass through, and the memory operates as an ordinary static memory.
Figure 1 Memory Module Hierarchy
Figure 2 Memory cell design
Figure 3 Schematic diagram of three write modes
The memory block has three write modes, corresponding to three behaviors of the output port during a write operation (Figure 3). Read_First means that the old data in the memory cell is read out before the new data is written, that is, a read followed by a write is completed within one clock cycle. Write_First means that the data being written also appears at the output. No_Change means that the output port keeps its previous state during the write operation. Read_First is the default mode, and the key to its implementation lies in the precharge circuit described below.
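The three write modes can be summarized with a small behavioral model. This is a sketch for illustration only and not the authors' circuit: the class and method names are hypothetical, and en and we are assumed to be active high (the text states that control polarity is configurable).

```python
from enum import Enum


class WriteMode(Enum):
    READ_FIRST = 1
    WRITE_FIRST = 2
    NO_CHANGE = 3


class Port:
    """Hypothetical behavioral model of one synchronous memory port."""

    def __init__(self, depth, mode=WriteMode.READ_FIRST):
        self.mem = [0] * depth
        self.mode = mode
        self.dout = 0  # value held at the output port

    def clock(self, en, we, addr, din=0):
        """Apply one rising clock edge and return the output value."""
        if not en:                      # port not selected: nothing changes
            return self.dout
        if we:                          # write cycle
            old = self.mem[addr]
            self.mem[addr] = din
            if self.mode is WriteMode.READ_FIRST:
                self.dout = old         # old data appears at the output
            elif self.mode is WriteMode.WRITE_FIRST:
                self.dout = din         # written data appears at the output
            # NO_CHANGE: the output keeps its previous value
        else:                           # read cycle
            self.dout = self.mem[addr]
        return self.dout
```

For example, with a cell holding 0xAA, a Read_First write of 0x55 returns 0xAA and leaves 0x55 stored, whereas the same write in Write_First mode returns 0x55.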
Because both read and write operations must pass through the bit lines, reading and then writing within one clock cycle requires that the read and the write be performed in two different time windows [2]. For this purpose, the precharge circuit shown in Figure 4(a) was designed. dw and dwn are the data to be written, generated from the input signal by logic; dr and drn are the data read out, which are sent to the output after passing through circuits such as the sense amplifier. yi is the precharge control signal, rdctl is the read control signal, and wtctl is the write control signal; their timing relationship is shown in Figure 4(b). When yi is low, both bit lines are pulled to VDD, and both the read and write paths are closed. The high-level window [2] of yi is about 0.5ns. The read operation occupies the first half of this window (rdctl is pulled low), and the write operation occupies the second half (wtctl is pulled low). In this way, the write path is closed while reading and the read path is closed while writing, so the old data is read first and the new data is written afterwards.
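The sequence of phases on one bit-line pair can be sketched as follows. This is a simplified logic-level model under stated assumptions, not a circuit simulation: the names bl, bln and read_first_cycle are hypothetical, the window is split exactly in half as the text describes, and the duration of the precharge phase is left unspecified because the source does not give it.

```python
VDD = 1


def read_first_cycle(cell, dw, window_ns=0.5):
    """Model one Read_First write cycle on a single bit-line pair.

    Returns (dr, new_cell, trace), where each trace entry is
    (phase, duration_ns, bl, bln). Logic levels only.
    """
    trace = []

    # Precharge: yi low, both bit lines at VDD, read and write paths closed.
    bl, bln = VDD, VDD
    trace.append(("precharge", None, bl, bln))

    # First half of the yi-high window: read path open, write path closed.
    bl, bln = cell, 1 - cell
    dr = bl                      # old cell content reaches the read data line
    trace.append(("read", window_ns / 2, bl, bln))

    # Second half of the window: write path open, read path closed.
    bl, bln = dw, 1 - dw
    new_cell = bl                # cell now stores the written bit
    trace.append(("write", window_ns / 2, bl, bln))

    return dr, new_cell, trace


dr, new_cell, trace = read_first_cycle(cell=1, dw=0)
```

With a stored 1 and a written 0, the model returns the old value on dr and leaves the cell holding the new value, which is the Read_First behavior described above.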
Figure 4 Precharge circuit
Figure 5 Data width selection circuit
2.4 Data width selection circuit
The circuit shown in Figure 5 lets the memory module select among 6 data widths (16K×1, 8K×2, 4K×4, 2K×9, 1K×18, 512×36). Each multiplexer (MUX) is controlled by an address signal and so performs part of the address decoding. When writing, the bus selection array picks the required bits from the 36 input data lines, and the multiplexers' address decoding routes them to the corresponding memory cells. When reading, each multiplexer either outputs the required data or stays in a high-impedance state, and the data then pass through the bus selection array to the corresponding output ports. All data width modes share the same 36 input ports and 36 output ports, but different modes occupy different ports. In 512×36 mode all ports are used, whereas modes narrower than 36 bits leave some ports unused. To avoid the circuit instability that floating ports could cause, these unused ports are automatically tied to VDD or GND during operation.
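The arithmetic behind the six configurations can be checked with a short script. It only restates the depth and width figures listed above; the function name describe and the dictionary keys are hypothetical, and the unused port count simply assumes 36 data ports per direction as stated in the text.

```python
# (depth, width) for the six configurations named in the text.
CONFIGS = [
    (16 * 1024, 1),
    (8 * 1024, 2),
    (4 * 1024, 4),
    (2 * 1024, 9),
    (1024, 18),
    (512, 36),
]

TOTAL_PORTS = 36  # data ports per direction


def describe(depth, width):
    """Return address width, addressed capacity and unused data ports."""
    return {
        "addr_bits": depth.bit_length() - 1,   # depth is a power of two
        "capacity_bits": depth * width,
        "unused_ports": TOTAL_PORTS - width,
    }


for depth, width in CONFIGS:
    info = describe(depth, width)
    print(f"{depth}x{width}: {info['addr_bits']} address bits, "
          f"{info['capacity_bits']} bits, {info['unused_ports']} unused ports")
```

By this arithmetic, the three narrow modes address 16,384 bits, while the 9, 18 and 36 bit modes address the full 18,432 bits of the block.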
3. Verification methods and results
Because the memory module has many signals and many combinations of operating modes, we verify it by co-simulation with the behavior-level simulation tool ModelSim and the transistor-level simulation tool HSIM. Behavior-level simulation makes it easy to check whether the circuit's function is correct and speeds up verification, especially when the circuit under verification is extended to the memory array or even the entire FPGA chip. Transistor-level simulation provides detailed and accurate timing parameters, such as rise/fall times and delays. Figure 6 shows the ModelSim waveforms of port A in the Read_First and Write_First modes, from which the characteristics of the two modes and the differences between them can be clearly seen.
Figure 6 Simulation waveform
Access time is an important indicator of memory performance. We select the memory cell located at the top of the bit line structure as the critical path for read and write operations, and measure the delay from the rising edge of the clock to valid data at the output. Because signals travel along paths of different lengths in different data width modes, their delays also differ. The simulation results show that the shortest delay is 1.75ns in 512×36 mode and the longest is 2.7ns in 16K×1 mode, which is consistent with this expectation.
4. Layout implementation
We completed the core part of the module layout as a full-custom design, shown in Figure 7, and call it the memory core. The memory cell array is divided into two parts, separated by the decoder and control circuits in the middle. The figure shows the relative positions of the main functional blocks. Following the floorplan of the whole FPGA chip, the metal layers are allocated as follows: all logic circuits use metal layers 1 to 4, and metal layers 5 and 6 are dedicated to the word lines and bit lines of the configuration layer.
The memory module will eventually be used in a series of FPGA chips. To join seamlessly with the surrounding channel modules, each chip structure imposes specific requirements on the port positions of the memory module. The memory core described above therefore has to be routed and packaged according to the chip parameters, with the signals that connect to the surrounding modules led out to the corresponding positions. To improve efficiency, we use Astro, Synopsys’ automatic place and route tool, for this routing work. Scripts read the necessary parameters from the file that records the chip structure and generate the files required by Astro. When automatic routing is finished, a complete layout suited to the specific chip is obtained, and the entire process is fully automated. The dark part of Figure 8 is a layout generated by automatic routing, surrounded by schematic diagrams of the FPGA channel modules (cbx, cby, sb).
Figure 7 Memory core layout
Figure 8 A complete layout after packaging
This paper presents the design and implementation of an embedded memory module for FPGAs in a 0.13-micron CMOS technology. The module has two independent ports, can be configured as ROM or SRAM, and supports 6 data widths and 3 write modes. Behavior-level and transistor-level co-simulation is used to verify the function and performance of the circuit. The full-custom memory core is packaged by an automatic place and route tool to obtain a complete layout suited to a specific chip.
The author’s innovations: the word line and bit line selection circuit designed in this paper lets the configuration layer initialize the memory cells; the precharge circuit implements reading first and then writing within one clock cycle; and packaging a full-custom layout with an automatic place and route tool offers a useful reference for the design of embedded modules.