Triple-level cell (TLC), multi-level cell (MLC), single-level cell (SLC), pseudo-single-level cell (pSLC): there are many different flash technologies. Developers need to understand the basic mechanics of storage and the role of the storage layer in order to know which flash products are best for a specific application. Only then will they know which questions to ask the supplier.

Several aspects of an application can determine the choice of memory modules for embedded industrial systems. These include read and write speeds, endurance (the lifespan of the flash medium), retention (the lifespan of the stored data), data security in the event of a power failure, temperature and vibration resistance, long-term product availability, and more. The aging of NAND chips is a flash-specific effect that also plays an important role in this decision.

The cells of a NAND flash device survive only a limited number of block erase cycles. High-energy electrons (hot electrons), accelerated by the programming voltage, become trapped in the tunnel oxide layer that isolates the floating gate. Over time, this shifts the threshold voltage until the cell can no longer be read (Figure 1).

Figure 1. An aging cell: electrons accumulate in the tunnel oxide layer, gradually shifting the threshold voltage. Cracks in the tunnel oxide create leakage paths through which charge escapes. Read errors increase until the entire block is marked as a "bad block".

Flash memory aging – when will it end?

There is also a second aging effect: the formation of conductive paths through the oxide layer. These cause the cell to gradually lose its charge, and with it the stored bit.

High temperatures greatly magnify this effect. Studies using 25 nm MLC NAND devices show that after five years at 55 °C, retention drops to approximately 75%. At 85 °C, a comparatively moderate increase in temperature, retention drops below 10%.

Furthermore, the effect grows stronger as the cell approaches its maximum number of program/erase (P/E) cycles, with huge implications for retention. For example, a low-cost MLC NAND flash device rated for 10 years of retention when new may retain data for only about a year after reaching 3,000 P/E cycles.

Low-cost TLC NAND flash chips face even tighter charge-state and threshold-voltage margins, since writing three bits per cell requires eight distinguishable charge levels. Degradation is correspondingly more pronounced in these designs: the rated retention drops from one year to three months after only 500 P/E cycles.
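To make these numbers concrete, the quoted retention figures can be turned into a rough estimator. This is a toy sketch only: the log-linear interpolation between the fresh and end-of-life values is an assumption made purely for illustration, and real degradation curves are device- and vendor-specific.

```python
import math

# (max rated P/E cycles, retention in years when new, retention at the limit)
# Values taken from the figures quoted above.
QUOTED = {
    "MLC": (3000, 10.0, 1.0),   # 10 years new -> ~1 year at 3,000 cycles
    "TLC": (500, 1.0, 0.25),    # 1 year new  -> ~3 months at 500 cycles
}

def estimated_retention_years(tech: str, pe_cycles: int) -> float:
    """Interpolate retention between the fresh and end-of-life figures.

    Assumes retention decays exponentially with wear (log-linear), which
    is an illustrative assumption, not a measured device model.
    """
    limit, fresh, worn = QUOTED[tech]
    frac = min(pe_cycles / limit, 1.0)
    return math.exp(math.log(fresh) + frac * (math.log(worn) - math.log(fresh)))

for tech, (limit, _, _) in QUOTED.items():
    for cycles in (0, 250, 500, 1500, 3000):
        if cycles <= limit:
            print(f"{tech} after {cycles:>4} P/E cycles: "
                  f"~{estimated_retention_years(tech, cycles):.2f} years retention")
```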

More expensive SLC devices experience the same degradation, but only after roughly 100,000 P/E cycles. This significantly higher P/E cycle tolerance is a major reason why SLC remains the flash technology of choice for industrial applications despite its higher cost.

Cost Compromise: pSLC

The pSLC process was introduced to balance cost against the insight that reducing the number of distinct charge levels makes data storage on NAND chips more robust. Instead of SLC, pSLC uses a more cost-effective MLC chip but stores only the first, "strong" bit of each cell, with striking results: pSLC mode is significantly faster than standard MLC operation and raises the number of P/E cycles before degradation from 3,000 to 20,000. Under otherwise identical conditions, endurance increases by a factor of 6.7, while the cost per bit of storage only doubles (Figure 2).

Figure 2. Endurance comparison of SLC, pSLC, and MLC NAND flash technologies.
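A quick plausibility check of these figures (a sketch only; the doubled cost per bit simply reflects that pSLC uses one of the MLC cell's two bits):

```python
# Figures from the comparison above.
mlc_pe_cycles = 3_000
pslc_pe_cycles = 20_000

endurance_gain = pslc_pe_cycles / mlc_pe_cycles  # ~6.7x more P/E cycles
cost_per_bit_factor = 2                          # half the bits per cell

print(f"P/E cycle gain: {endurance_gain:.1f}x "
      f"at ~{cost_per_bit_factor}x the cost per bit")
```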

A note on endurance specs: the workload is decisive

Developers need to know exactly what the manufacturer's specifications represent when choosing a storage device. Two metrics are commonly used to specify SSD endurance: terabytes written (TBW) and drive writes per day (DWPD). TBW states how much data can be written in total over the lifetime of the device, while DWPD states how many times the drive's full capacity can be written per day throughout the warranty period.
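The two metrics are related through capacity and warranty period. A minimal sketch of the standard conversion, assuming a five-year warranty and decimal units (both of which vary by vendor):

```python
def dwpd_from_tbw(tbw: float, capacity_gb: float, warranty_years: float = 5) -> float:
    """Convert a TBW rating into drive writes per day."""
    capacity_tb = capacity_gb / 1000  # SSD specs use decimal units
    return tbw / (capacity_tb * 365 * warranty_years)

# Example: the 480 GB drive discussed below, using its sequential-write rating.
print(f"{dwpd_from_tbw(1360, 480):.2f} DWPD")  # ~1.55 DWPD
```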

The challenge with manufacturers' sometimes exhaustive specifications is that developers cannot always tell whether they are meaningful for the application at hand. Specification values depend heavily on the type of workload used during testing. For example, one of Swissbit's 480 GB SSDs is rated at 1,360 TBW, 912 TBW, or 140 TBW, depending on the measurement procedure. Sequential writes yield the best value of 1,360 TBW, while a client workload and an enterprise workload account for the second and third values, respectively. The client workload models the behavior of PC users, who generate primarily sequential accesses, while the enterprise workload simulates a multi-user server environment in which 80% of the data is accessed randomly.

Such endurance testing is based on guidelines developed by the JEDEC standardization organization and helps make products and manufacturers comparable. However, the underlying workload is usually not stated in data sheets, and many manufacturers are happy to advertise impressive endurance values based on sequential writes, which occur in only a few applications. As the example above shows, the endurance values of a flash solution can easily differ by a factor of ten between sequential writes and an enterprise workload. Buyers must proceed with caution.
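In practice, the useful question is how long a device lasts under the application's own write volume. A small sketch using the TBW ratings from the example above; the 100 GB/day write volume is an invented placeholder:

```python
def lifetime_years(tbw: float, gb_written_per_day: float) -> float:
    """Estimate device lifetime from a TBW rating and a daily write volume."""
    return (tbw * 1000) / (gb_written_per_day * 365)  # TBW in TB -> GB

daily_write_gb = 100  # assumed application write volume, for illustration
for profile, tbw in [("sequential", 1360), ("client", 912), ("enterprise", 140)]:
    print(f"{profile:>10}: ~{lifetime_years(tbw, daily_write_gb):.1f} years")
```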

Flash under stress

Erasing accelerates the aging of memory cells, and every write requires a prior block erase. This can lead to the deceptive conclusion that in read-only applications, such as boot media, the data is safe over the long term thanks to extended retention times. Unfortunately, that is a misunderstanding: other effects can cause read errors and, indirectly, wear out NAND cells.

During each write, cells adjacent to the cell being programmed are stressed; they receive a slight voltage increase known as "program disturb". Reading causes similar stress, called "read disturb", in which pages adjacent to the page being read accumulate charge. Over time, the electrical potential stored in these cells rises until read errors occur, and the errors disappear again once the block is erased. Because reading uses lower voltages than writing, the effect is weaker, but bit errors still occur. These are compensated by error-correcting codes (ECC) and cleared by erasing the affected blocks.

However, developers must take into account that this effect is particularly pronounced in applications that repeatedly read the same data. Even in storage media used only for reading, blocks must therefore be erased and pages rewritten from time to time as part of error correction. As a result, the medium still ages.
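A minimal sketch of how firmware might implement read disturb management, refreshing a block once its read counter exceeds a threshold. The limit and function names are invented for illustration; real thresholds are vendor-specific:

```python
READ_DISTURB_LIMIT = 100_000  # assumed reads per block before a refresh

read_counts: dict[int, int] = {}  # block id -> reads since last erase

def refresh_block(block: int) -> None:
    # Copy valid pages elsewhere, erase the block, reset its counter.
    print(f"refreshing block {block} to clear accumulated disturb errors")
    read_counts[block] = 0

def on_read(block: int) -> None:
    read_counts[block] = read_counts.get(block, 0) + 1
    if read_counts[block] >= READ_DISTURB_LIMIT:
        refresh_block(block)

# The same block read over and over, as with boot media:
for _ in range(READ_DISTURB_LIMIT):
    on_read(block=7)
```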

Internal concerns

These "internal concerns" of a flash device also contribute to gradual aging: erase, write, and read operations are triggered not only by the actual application but also by numerous controller and firmware processes. What happens here is often overlooked, yet it too affects performance factors such as speed and endurance.

In addition to error correction, one of these internal mechanisms is wear leveling. When a cell fails, the complete block must be marked as a "bad block". For endurance, it is important to delay such failures as long as possible. This is achieved through wear leveling, the even distribution of usage across physical memory addresses. Another internal mechanism is garbage collection, the consolidation and re-copying of partially freed blocks.
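A minimal sketch of the wear-leveling idea, choosing the least-worn block for the next write; the erase counts are invented example values:

```python
# Erase count per physical block (example values).
erase_counts = [120, 80, 200, 95]

def pick_block_for_write() -> int:
    """Choose the least-worn block so wear spreads evenly,
    delaying the first bad block as long as possible."""
    return min(range(len(erase_counts)), key=lambda b: erase_counts[b])

block = pick_block_for_write()
erase_counts[block] += 1  # erasing before programming adds a P/E cycle
print(f"writing to block {block}")  # -> block 1, the least-worn one
```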

These processes supplement the mechanism that makes data storage possible in the first place: the mapping between logical and physical addresses. The efficiency of a flash media controller is measured by the ratio of the data actually written to the flash to the user data delivered by the host. This ratio is expressed as the write amplification factor (WAF).

Keeping the WAF low is one of the keys to a longer flash lifetime. Workload factors influence the WAF, such as sequential versus random access, the size of the data chunks relative to the page size, and the block size itself. The firmware therefore also helps determine whether a flash medium is suitable for a given application.
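The WAF itself is a simple ratio; everything interesting lies in what drives the numerator. A sketch with invented example numbers:

```python
def waf(flash_bytes_written: int, host_bytes_written: int) -> float:
    """Write amplification factor: internal flash writes per host write.
    A WAF of 1.0 would be ideal; random small writes push it higher."""
    return flash_bytes_written / host_bytes_written

# Example: the host sent 1 GB, but garbage collection and block erases
# caused 2.5 GB of internal flash writes (numbers invented for illustration).
print(f"WAF = {waf(int(2.5e9), int(1e9)):.1f}")
```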

How Manufacturers Can Improve Efficiency

For a better understanding, a brief look at how flash memory works: the pages of a block must be programmed sequentially, but only complete blocks can be erased. In the standard process, the mapping between logical and physical addresses is block-based. This works very well for sequential data, since the pages of a block are written one after the other. Continuously recorded video data is an ideal application for block-based mapping.

Random data is different. Here, pages are written across many different blocks, so internal reorganization may require erasing a complete block for a single page. The WAF rises and endurance falls. Page-based mapping is therefore better suited to non-sequential data: the firmware ensures that data from different sources is stored sequentially in the pages of a block. This reduces the number of erases, which benefits endurance, and improves write performance. However, page-based mapping enlarges the allocation table of the flash translation layer (FTL). Manufacturers compensate for this by integrating DRAM, so the benefits of page-based mapping do not come for free.
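A toy sketch of page-based mapping: logical pages from a random write pattern are packed sequentially into the currently open block, so physical writes stay sequential. Class and constant names are invented, and real FTLs are vastly more complex:

```python
PAGES_PER_BLOCK = 4  # tiny blocks, for illustration only

class PageMappedFTL:
    def __init__(self) -> None:
        self.mapping: dict[int, tuple[int, int]] = {}  # logical page -> (block, page)
        self.block, self.next_page = 0, 0

    def write(self, logical_page: int) -> None:
        # Append to the open block regardless of the logical address:
        # physical writes stay sequential even for random host data.
        self.mapping[logical_page] = (self.block, self.next_page)
        self.next_page += 1
        if self.next_page == PAGES_PER_BLOCK:  # block full: open a new one
            self.block, self.next_page = self.block + 1, 0

ftl = PageMappedFTL()
for lp in (7, 42, 3, 99, 12):  # a "random" host write pattern
    ftl.write(lp)
print(ftl.mapping)  # scattered logical pages, contiguous physical pages
```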

Generous over-provisioning as a quality feature

Page-based mapping is also beneficial when a heavily filled data medium drives the WAF up: the more data stored on the flash medium, the more bits the firmware has to shuffle back and forth. Manufacturers counter this by over-provisioning, reserving an area of flash memory exclusively for internal activities. By convention, this is 7% of the total capacity, which corresponds to the difference between the decimal and binary definitions of a gigabyte.
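The customary 7% falls out of that unit difference directly:

```python
# A "gigabyte" of raw NAND is binary (2**30 bytes), while the advertised
# capacity is decimal (10**9 bytes); the gap is the conventional spare area.
spare_fraction = (2**30 - 10**9) / 10**9
print(f"{spare_fraction:.1%}")  # ~7.4%, the customary over-provisioning share
```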

Setting aside 12% for over-provisioning instead of 7% has a surprising effect. In an endurance comparison (TBW under enterprise workload) of two SSDs with identical hardware, the 240 GB Swissbit X-60 durabit model, with 12% of its capacity reserved for over-provisioning, achieves almost double the endurance of the 256 GB model. Factoring in the effect of DRAM on endurance as well, the 240 GB durabit version reaches ten times the endurance of the 256 GB standard version. (Note: as with operating MLC in pSLC mode, sacrificing some usable capacity through over-provisioning yields a significant positive endurance effect.)
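Assuming both models carry 256 GiB of raw NAND (an assumption; vendors also define over-provisioning differently, here as the spare share of raw capacity), the quoted percentages can be reproduced:

```python
RAW = 256 * 2**30  # assumed bytes of physical NAND in both models

for usable_gb in (256, 240):
    usable = usable_gb * 10**9          # advertised capacity, decimal GB
    op = (RAW - usable) / RAW           # spare area relative to raw capacity
    print(f"{usable_gb} GB model: ~{op:.1%} over-provisioning")
# -> ~6.9% for the 256 GB model, ~12.7% for the 240 GB durabit version
```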

Data maintenance

Error correction and wear leveling are mechanisms also found in general-purpose flash products. For high-quality industrial SSDs and flash memory cards, manufacturers go a step further to prevent data loss and system failure: a combination of mechanisms such as ECC monitoring, read disturb management, and automatic read refresh ensures that all stored data is monitored and refreshed on demand, so system failures can be prevented before they occur. (Note: data integrity should be ensured without involving the host application. This allows the process to run autonomously within the memory card, rather than only when bit errors have already accumulated by the time the host issues a read request, as is usually the case.)

Advanced data care management therefore searches for potential errors independently of application requests (Figure 3). To this end, all written pages, including the firmware and the FTL's allocation tables, are read in the background and refreshed as needed. This preventive error correction has multiple triggers, including a defined number of power-on events, the number of P/E cycles, the amount of data read, read retries, and elevated temperature.

Figure 3. Data care management counteracts the gradual loss of data: all written blocks are read in the background and, if too many bit errors are found, corrected, copied, and rewritten.
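A sketch of how such a background scan might be gated on the triggers listed above; all names and thresholds here are invented placeholders, not any vendor's actual firmware logic:

```python
def scan_due(power_ons: int, pe_cycles: int, read_tb: float,
             read_retries: int, temp_c: float) -> bool:
    """Decide whether a preventive background scan should run.
    Thresholds are illustrative placeholders only."""
    return (power_ons >= 1000 or pe_cycles >= 500 or read_tb >= 10.0
            or read_retries > 0 or temp_c >= 70.0)

if scan_due(power_ons=1200, pe_cycles=80, read_tb=2.5,
            read_retries=0, temp_c=41.0):
    print("background scan: read all pages, refresh those with bit errors")
```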

Know what to choose

Understanding these different aspects of flash technology is key to choosing the best storage solution for industrial applications. Criteria such as power-failure protection mechanisms, particularly robust construction, and specifications for extended temperature ranges should of course also be considered.

The long-term availability of modules qualified for a specific application is also important. That is why one type of flash memory, 3D NAND, does not appear here at all: the technology is still too new to guarantee long-term availability, and its innovation cycles and design changes are still too short-lived for industrial product life cycles.

Ultimately, the endurance and data retention behavior of NAND flash devices is critical when choosing an industrial storage technology. Optimizing these values is a key task for manufacturers of industrial flash products, and customers should scrutinize these numbers before purchasing.
