To maintain acceptable performance at the user level, designers of solid-state drive (SSD) storage systems must develop complex architectures and algorithms that work around the inherent limitations of NAND flash memory. These workarounds have produced fast, reliable memory solutions that have powered storage systems for decades, but they will not hold up much longer. Here is why.

Besides running counter to the industry trend toward smaller and simpler technologies, this complex system of workarounds hurts overall performance and cost and creates major system bottlenecks. As technology nodes continue to shrink, these problems are expected to get worse.

For example, below 25 nm the endurance and reliability of NAND flash memory degrade so severely that current workarounds become almost useless (Figure 1). Figures like these have set off an industry-wide race to develop more capable non-volatile memory (NVM) solutions that meet the capacity, performance, power, and reliability requirements of next-generation electronic devices by simplifying how the memory operates.

So what holds NAND flash memory back? Its inherent design constraints. This article discusses the challenges NAND flash memory faces as manufacturers try to scale it down, especially in solid-state drives, and covers the emerging memory technologies that will change the NVM market.

Figure 1 Trends in NAND endurance and bit error rate (BER): as technology nodes shrink, NAND endurance cycles decrease and BER increases. Endurance measures how many program/erase cycles a memory cell can withstand before it becomes error-prone and unusable. BER measures the rate of bit errors in a memory array.

NAND flash memory design constraints

In current flash-based SSDs, memory access is managed by a high-end memory controller chip connected to DRAM buffers and multiple raw NAND flash components. Although most technically proficient readers know the limitations of NAND flash technology, a closer look at the existing workarounds shows how they affect SSDs and the system as a whole. These characteristics are summarized in Table 1.

Table 1 Summary of NAND features and storage system related solutions.

Block Erasure

NAND flash memory can be erased only in whole blocks and programmed only in whole pages. Data in a fully programmed block cannot be modified at any granularity (byte, page, or block) without erasing the entire block. This design constraint adds complexity through the following workarounds: data replication, a logical-to-physical (L2P) mapping table, buffering, and garbage collection.

Data replication: to modify data, the NAND system controller must first read the data into a temporary memory location (such as DRAM), then merge it with the modified data where necessary, and finally write the result to a new page (Figure 2).

L2P mapping: each time this procedure runs, the controller must update and maintain the L2P mapping table. The table records the locations of the original and modified data and directs host accesses and the data management processes. The larger the capacity of the storage device, the larger these tables must be, so most controllers need external DRAM to hold them.

Garbage collection: revisions leave behind obsolete pages, also known as stale data, which cannot be individually erased or overwritten. Instead, they are reclaimed by another controller-initiated process called garbage collection. Figure 2 illustrates the data revision process followed by garbage collection.

Figure 2 Garbage collection process: 24 page writes occur to rewrite 8 pages, so the write amplification (WA) is 3, three times the ideal value of 1.
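The interplay of out-of-place writes, the L2P table, and garbage collection can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the layout used in Figure 2 or in any real controller: the class name TinyFTL, the 4-block by 4-page geometry, and the greedy "fewest valid pages" victim choice are all hypothetical choices made for illustration.

```python
class TinyFTL:
    """Toy flash translation layer (hypothetical): out-of-place writes, L2P table, greedy GC."""

    def __init__(self, num_blocks=4, pages_per_block=4):
        self.pages_per_block = pages_per_block
        # Each page slot is None (erased) or a (logical_page, data) tuple.
        self.blocks = [[None] * pages_per_block for _ in range(num_blocks)]
        self.l2p = {}                              # logical page -> (block, page)
        self.free_blocks = list(range(1, num_blocks))
        self.active, self.next_page = 0, 0         # block currently being filled
        self.host_writes = self.nand_writes = self.erases = 0

    def _is_valid(self, block, page):
        # A physical page is valid only while the L2P table still points at it.
        slot = self.blocks[block][page]
        return slot is not None and self.l2p.get(slot[0]) == (block, page)

    def _program(self, lpn, data):
        if self.next_page == self.pages_per_block:  # active block is full
            if not self.free_blocks:
                raise RuntimeError("no erased block left; run garbage_collect() first")
            self.active, self.next_page = self.free_blocks.pop(0), 0
        self.blocks[self.active][self.next_page] = (lpn, data)
        self.l2p[lpn] = (self.active, self.next_page)  # any older copy becomes stale
        self.next_page += 1
        self.nand_writes += 1

    def write(self, lpn, data):
        """Host write: always lands on a fresh page, never overwrites in place."""
        self.host_writes += 1
        self._program(lpn, data)

    def read(self, lpn):
        block, page = self.l2p[lpn]
        return self.blocks[block][page][1]

    def garbage_collect(self):
        """Relocate the valid pages of the emptiest closed block, then erase it."""
        closed = [b for b in range(len(self.blocks))
                  if b != self.active and b not in self.free_blocks]
        victim = min(closed, key=lambda b: sum(
            self._is_valid(b, p) for p in range(self.pages_per_block)))
        for page in range(self.pages_per_block):
            if self._is_valid(victim, page):
                lpn, data = self.blocks[victim][page]
                self._program(lpn, data)           # extra NAND write, no host write
        self.blocks[victim] = [None] * self.pages_per_block
        self.erases += 1
        self.free_blocks.append(victim)
        return victim


ftl = TinyFTL()
for lpn in range(8):            # initial data fills two blocks
    ftl.write(lpn, f"v0-{lpn}")
for lpn in range(3):            # the host revises three pages
    ftl.write(lpn, f"v1-{lpn}")
ftl.garbage_collect()           # reclaims the block that now holds stale pages
print(ftl.nand_writes, ftl.host_writes, ftl.erases)
```

In this small run the relocation performed during garbage collection is the only source of extra NAND writes; a drive that is nearly full has to relocate far more valid pages per reclaimed block, which is where higher write amplification comes from.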

Impact on write amplification

It is important to note that the controller in this example performs 24 page writes to complete the expected rewrite of 8 pages.

Write amplification (WA) measures the efficiency of the controller as the number of writes the controller issues to the NAND for each write the host intended. A WA of 1 represents ideal efficiency: one write to the NAND device per host write. Most systems typically have a WA between 3 and 4. A higher WA directly affects the reliability and performance of the storage device, because it increases the number of writes to the device, so cells reach their maximum cycle count sooner. This matters especially on smaller technology nodes, where the maximum cycle count of a memory cell drops below 3000 (see Figure 1).

In the example shown in Figure 2, the write amplification is obtained by dividing the total number of page writes (24) by the number of pages the host intended to rewrite (8). In this case, therefore, WA is 3.
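The arithmetic can be written out as a short Python sketch. The first function reproduces the 24-to-8 example; the second is an added simplifying assumption (that every host rewrite costs WA cell writes on average), used only to show why a higher WA uses up the cycle budget sooner. Both function names are hypothetical.

```python
def write_amplification(nand_page_writes: int, host_page_writes: int) -> float:
    """WA = pages actually written to NAND / pages the host asked to write."""
    if host_page_writes <= 0:
        raise ValueError("host_page_writes must be positive")
    return nand_page_writes / host_page_writes


def host_rewrites_per_cell(max_pe_cycles: int, wa: float) -> float:
    """Rough host-visible rewrite budget, assuming each host write costs `wa` cell writes."""
    return max_pe_cycles / wa


wa = write_amplification(24, 8)
print(wa)                                 # 3.0
print(host_rewrites_per_cell(3000, wa))   # 1000.0
```

Under that assumption, a cell rated for 3000 cycles behind a controller with a WA of 3 absorbs only about 1000 host-level rewrites.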

Low program/erase cycles (low endurance)

An inherent characteristic of NAND flash memory is the low endurance of its cells, defined as the maximum number of program/erase cycles a cell can undergo before becoming unreliable. NAND flash memory can still sustain system life through wear leveling and bad block management, but both add controller overhead, performance overhead, and cost.

Wear leveling: the wear-leveling algorithm keeps the number of program/erase cycles as uniform as possible across blocks, independently of the host operating system and file system. Without wear leveling, some blocks accumulate far more cycles than others, as shown in Figure 3, shortening the life of the storage system. The system controller must support this mandatory process to extend product life, which adds computation and management overhead to the controller.

Figure 3 Wear leveling: a wear-leveling algorithm is implemented to maximize the endurance and service life of the storage system.
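The effect of wear leveling can be illustrated with a deliberately simple Python sketch. It assumes a model in which every rewrite of hot data costs exactly one erase of whichever block is chosen, and it picks the least-erased block each time. Real controllers also migrate cold data and use more elaborate policies, and the name simulate_erases is hypothetical.

```python
def simulate_erases(num_blocks: int, rewrites: int, wear_leveling: bool) -> list:
    """Return per-block erase counts after `rewrites` rewrites of hot data."""
    erase_counts = [0] * num_blocks
    for _ in range(rewrites):
        if wear_leveling:
            # Pick the block with the fewest erase cycles so far.
            block = min(range(num_blocks), key=erase_counts.__getitem__)
        else:
            # Naive placement: the hot data always returns to the same block.
            block = 0
        erase_counts[block] += 1
    return erase_counts


print(simulate_erases(4, 100, wear_leveling=False))  # [100, 0, 0, 0]
print(simulate_erases(4, 100, wear_leveling=True))   # [25, 25, 25, 25]
```

Without leveling, one block absorbs every cycle and wears out first; with leveling, the same workload is spread evenly, at the cost of the bookkeeping the controller has to do.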

Bad block management: once a block reaches its maximum cycle count, its data may be corrupted by charge leakage from floating-gate-to-floating-gate coupling or by read/write disturb. Bad block management performs write verification to find failed sectors. If it finds errors, it maps out the affected blocks so that no more data is stored in them, effectively retiring them. On smaller nodes, where the maximum cycle count has dropped at an alarming rate, blocks must be retired sooner, and valuable memory space is consumed to store and track the mapping of these blocks.
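A minimal Python sketch of the write-verify-and-retire loop follows. Everything in it is an assumption made for illustration: FakeNand stands in for a device whose worn-out blocks corrupt data, BadBlockManager and its spare-block pool are hypothetical, and real controllers verify at page or sector granularity with ECC rather than by comparing whole buffers.

```python
class FakeNand:
    """Hypothetical device: blocks listed in `worn_out` corrupt whatever is programmed."""

    def __init__(self, worn_out=()):
        self.worn_out = set(worn_out)
        self.store = {}

    def program(self, block, data: bytes):
        if block in self.worn_out:
            data = bytes(b ^ 0xFF for b in data)  # simulate corrupted cells
        self.store[block] = data

    def read(self, block) -> bytes:
        return self.store[block]


class BadBlockManager:
    """Write-verify each block; retire failures and remap them to spare blocks."""

    def __init__(self, device, spare_blocks):
        self.device = device
        self.spares = list(spare_blocks)
        self.remap = {}    # original block -> replacement block (the tracking table)
        self.bad = set()   # retired physical blocks

    def write(self, block, data: bytes):
        target = self.remap.get(block, block)
        while True:
            self.device.program(target, data)
            if self.device.read(target) == data:  # write verification passed
                return target
            self.bad.add(target)                  # retire the failing block
            if not self.spares:
                raise RuntimeError("out of spare blocks")
            target = self.spares.pop(0)
            self.remap[block] = target

    def read(self, block) -> bytes:
        return self.device.read(self.remap.get(block, block))


dev = FakeNand(worn_out={2})
bbm = BadBlockManager(dev, spare_blocks=[100, 101])
print(bbm.write(1, b"ok"))    # healthy block, written where requested
print(bbm.write(2, b"abc"))   # worn-out block, redirected to a spare
print(bbm.remap, bbm.bad)
```

The remap table and the spare pool are exactly the kind of space overhead described above: the more blocks fail, the more capacity goes to replacing and tracking them.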

High bit error rate (BER) from low-retention NAND flash memory

NAND flash memory is prone to bit errors, a tendency measured by BER. To detect and correct errors, NAND flash memory uses error correction codes (ECC). On smaller nodes, however, ECC complexity must grow along with the rising BER. Figure 4 shows that 20 nm NAND flash memory needs more than 40 bits of ECC correction per 1 KB of data, which exponentially increases the complexity of the ECC algorithm, the space needed to store ECC words, and the controller overhead needed to handle that complexity.

Figure 4 ECC and BER: 20 nm flash memory requires more than 40 bits of ECC correction per 1 KB of data.
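To get a feel for what 40-bit correction per 1 KB costs in storage, the following Python sketch estimates the parity needed by a binary BCH code. It rests on assumptions not stated in the article: that the code is a binary BCH code, that 1 KB means 1024 bytes, and that the standard upper bound of m × t parity bits applies for a code over GF(2^m) correcting t errors. It is a rough estimate of parity storage only and is not meant to reproduce any array-level overhead figure.

```python
def bch_parity_bits(data_bits: int, t: int) -> int:
    """Upper bound (m * t) on parity bits for a binary BCH code correcting t bit errors.

    m is the smallest field degree whose code length 2**m - 1 can hold
    the data bits plus the parity bits.
    """
    m = 1
    while data_bits + m * t > 2**m - 1:
        m += 1
    return m * t


data_bits = 1024 * 8                        # 1 KB of data, assumed to be 1024 bytes
parity = bch_parity_bits(data_bits, t=40)   # 40-bit correction
print(parity, parity // 8)                  # parity bits and bytes
print(f"{parity / data_bits:.1%}")          # parity relative to the data size
```

Because the bound grows linearly in t and m grows with the codeword size, every increase in required correction strength translates directly into more spare area per page.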

Larger ECC: in a 20 nm NAND flash memory array, the silicon area used to store ECC parity bits increases the total array size by 10%. This growth matters for next-generation storage systems because it shows that NAND has limited ability to deliver higher memory density on smaller technology nodes.

More powerful ECC: in addition, as NAND flash capacity grows, its reliability continues to decline. Traditional ECC, such as the widely used BCH code, is increasingly unsuitable for SSDs. Effectively improving SSD reliability requires a more powerful ECC, such as an LDPC code.

However, compared with BCH, an LDPC implementation requires more powerful and complex processing and more transistors, as shown in Figure 5. Hard-decision LDPC decoding already achieves a significant coding gain over traditional BCH codes, and soft-decision decoding improves the error correction strength considerably further. But reading and processing soft information from NAND can make the read response time of the storage system unpredictable, an unwanted side effect in enterprise applications.

Figure 5 Gate count: low-density parity-check (LDPC) codes require more transistor gates than conventional BCH codes.

Slow page read

Another inherent problem with NAND flash-based storage systems is slow page reads, on the order of 50 μs. This latency is too high for enterprise storage systems and real-time embedded memory applications, which require read access times below 100 ns. So far, NAND flash offers no solution. The read current of a NAND memory cell is very low, less than 300 nA, and the current memory architecture cannot provide fast random reads.

NAND flash impact on storage controllers

Because of these design complexities, NAND-based storage controllers are not only larger but must also devote more memory to behind-the-scenes work at the expense of actual “working memory,” or end-user memory. The reasons are as follows:

ECC: ECC blocks must be much larger because NAND has a higher BER on smaller technology nodes.

Buffer: the buffer must be enlarged to hold the L2P table and to support the data replication process.

DRAM: external DRAM is usually used to hold the large L2P tables.

Multi-core central processing unit (CPU): most high-performance storage controllers use multi-core CPUs to run the garbage collection and wear-leveling algorithms and to manage the L2P tables and the NAND devices across multiple channels.

Increased CPU bandwidth: much of the CPU bandwidth goes to periodically saving the tables to NAND as protection against unexpected power loss. These tables must be fully restored after a power failure; otherwise data will be lost.

Compression engine: the compression engine is used to reduce write amplification by reducing the actual host data written to NAND.

Emerging technology: resistive RAM

Given these worsening trends and the obstacles NAND flash memory faces, storage system manufacturers recognize the importance of solving storage system problems with new technologies that are free of the scalability problems and design constraints of flash memory.

After years of intensive research and development, resistive RAM (RRAM) is widely regarded as one of the most promising candidates.

Table 2 Comparison between current technologies and emerging technologies, including the high-performance RRAM type called a-Si RRAM.

How RRAM works

A typical device consists of two metal electrodes sandwiching a thin dielectric layer that serves as the ion transport and storage medium (Figure 6).

The exact mechanism differs significantly between the materials used, but what all RRAM devices have in common is that an electric field or heat causes ion movement and local structural changes in the storage medium, which in turn produce a measurable change in device resistance.

Figure 6 Typical RRAM cell in a crossbar architecture

Figure 7 Working principle: in the switching medium, nanoparticles form a conduction path between the top and bottom electrodes.

Although several types of RRAM technologies are under development (see sidebar), the most common challenges faced by RRAM technologies are temperature sensitivity and CMOS incompatibility.

One type is a-Si RRAM, which uses a common amorphous thin film, amorphous silicon (a-Si), as the host material in which filaments form. The conductive “filaments” produced during resistance switching consist of discrete metal particles rather than the continuous metal plugs found in other RRAM approaches. These properties bring many performance advantages and are expected to eliminate many of the problems flash faces. Crossbar, Inc., a California-based company, has successfully developed demonstration products using this technology (Figure 8).

Figure 8 Integrated RRAM device product from Crossbar

Table 3 Comparison of common RRAM types

Key features of a-Si RRAM technology

What makes a-Si RRAM such a promising candidate? Where NAND flash cannot keep up with shrinking technology nodes, a-Si RRAM can. Its inherent simplicity and compatibility make a-Si RRAM well suited to next-generation technologies.

Scalability: a-Si RRAM can scale down to the 5 nm node, which will let it keep pace with storage system development for decades to come.

3D stackable and MLC capability: the very large R_off/R_on ratio (1000) provides a large sensing margin and supports multi-level cell (MLC) operation. Compared with NAND-based memory technology, combining stackable memory with MLC cells can increase memory density and reduce cost per bit.

Endurance: a-Si RRAM has high endurance (10e10), and its cycling characteristics are significantly better than those of NAND. This greatly reduces the need for wear leveling and lowers the ECC requirements on the host controller, improving overall system performance and power consumption.

Retention: Crossbar, Inc. has run a 10-year retention test at 85 °C on a-Si RRAM, which it passed. The technology is expected to offer excellent retention and BER compared with NAND.

High speed: because the cell current of a-Si RRAM cells is several orders of magnitude higher than that of NAND, the memory array provides faster page reading. Fast page reading enables faster random access, which is very suitable for enterprise storage memory and real-time memory systems.

Byte and page alterability: data can be changed at byte or page granularity, which greatly improves performance and reliability by eliminating the overhead of write amplification and garbage collection.
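The benefit of byte-level updates can be made concrete with a small Python sketch. It compares how many bytes must be programmed for a small update on a page-programmed memory versus a byte-alterable one. The 16 KiB page size, the 64-byte update, the assumption that the update starts on a page boundary, and both function names are hypothetical values chosen for illustration, not figures from the article.

```python
import math


def nand_bytes_programmed(update_bytes: int, page_bytes: int) -> int:
    """Page-programmed memory: every touched page is rewritten in full (to a new location)."""
    pages = math.ceil(update_bytes / page_bytes)
    return pages * page_bytes


def byte_alterable_bytes_written(update_bytes: int) -> int:
    """Byte-alterable memory: only the changed bytes are written, in place."""
    return update_bytes


update, page = 64, 16 * 1024   # assumed: 64-byte update, 16 KiB page
nand = nand_bytes_programmed(update, page)
rram = byte_alterable_bytes_written(update)
print(nand, rram, nand // rram)
```

The sketch counts only the immediate programming cost; on NAND the stale page left behind must later be reclaimed by garbage collection as well, which a memory that updates in place avoids altogether.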

Breakthrough RRAM based storage solutions

An RRAM-based SSD controller is free of many of the burdens that NAND flash places on the storage system.

In RRAM-based storage controllers, the CPU does not need to manage L2P tables or handle the same level of garbage collection and wear leveling. With these reduced requirements, RRAM-based storage controllers will be simpler and cheaper than NAND-based storage controllers.

Table 4 below compares NAND and RRAM technologies at the cell level and the product level. These characteristics show that a-Si RRAM-based storage systems will deliver excellent performance and reliability in emerging applications that require high performance, low power, or high endurance together with high capacity, high speed, and low cost.

Table 4 Comparison of NAND based and RRAM based performance profiles at the unit and product levels.

A word about design and CMOS compatibility

What sets a-Si RRAM apart from other emerging technologies is how easily it integrates. Unlike the materials used in many new technologies, the amorphous silicon films used in a-Si RRAM are well characterized and robust, and are already in use in CMOS wafer fabs. For example, the memory developed by Crossbar, Inc. can be a standalone array or can be embedded in the back end on top of CMOS, forming multiple 3D stacked layers.

Currently, RRAM can be manufactured in the back-end-of-line (BEOL) process because most RRAM cells operate independently of transistors. In a typical process flow, the base wafer (including address and sensing circuits) is manufactured in a CMOS foundry, and the RRAM memory is then fabricated either in the same fab or in a separate BEOL memory fab. Crossbar, Inc. has run a number of tests to confirm that its products are CMOS compatible, and has demonstrated its memory arrays in a variety of integration scenarios using different toolsets.

Integration: integrating a-Si RRAM involves patterning and subtractive etching processes. The process flow has repeated blocks that implement the stacking of storage elements. a-Si RRAM integration uses standard process steps and tools common to most wafer fabs.

Over the past decades, NAND has imposed demanding tasks on the system controllers in solid-state storage devices. These management tasks increase system complexity, power consumption, transistor gate count, and overall storage system development cost.

The breakthrough features of a-Si RRAM technology, such as Crossbar's memory, give storage devices high-performance specifications and flexible capabilities, such as the ability to rewrite a storage location without erasing a block. Simpler devices make for simpler storage systems and significantly reduce system controller overhead, helping enable emerging technologies for generations to come.

By Hagop Nazarian and Sylvain Dubois
