Over the next few years, tens of billions of Industrial Internet of Things (IIoT) devices will be connected to each other and generate vast amounts of data collected from sensors and applications. A significant portion of IIoT data will eventually be stored, processed, and even analyzed at the edge, requiring storage devices there to respond faster with high data integrity.

A major challenge for IIoT edge computing is the harsh environments these systems will inevitably encounter, especially high temperatures. Unfortunately, a common misconception holds that simply using off-the-shelf industrial-grade NAND components is enough for storage systems serving IIoT devices to operate reliably at the extreme temperatures that mission-critical systems often face. In practice, this approach may result in unacceptable NAND flash performance and fault tolerance, as explained below.

Effects of NAND Characteristics, Die Shrinkage, and Extreme Temperatures

During manufacturing, shrinking the lithographic node (a “die shrink”) tends to increase the number of defective dies, resulting in inconsistent quality across NAND flash modules and ICs. Smaller cells also store fewer electrons per cell, so losing only a few electrons is enough to flip a bit; the resulting rise in bit errors reduces data retention and endurance.

Extreme temperatures can further degrade NAND flash memory and change how electrons move within the memory cells of modules and ICs, leading to data retention issues and even data loss. For example, the raw bit error rate (RBER) and the early life failure rate (ELFR) both rise when electrons leak through, or become trapped in, the tunnel oxide of a memory cell. During program/erase (P/E) cycles, high temperatures can speed the movement of electrons into and out of the cell gate and make P/E operations easier, but at the same time more charge becomes trapped in the tunnel oxide. Over time, the release of this trapped charge can shift the cell's threshold voltage (Vt), resulting in bit flips and retention failures.
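Temperature acceleration of retention loss is commonly modeled with the Arrhenius equation, which is also the usual basis for accelerated retention bakes. The sketch below illustrates that standard model only; it is not presented as ATP's test method. The 1.1 eV activation energy is an assumed, commonly quoted value for charge-loss mechanisms, and the function names are hypothetical.

```python
import math

# Boltzmann constant in eV/K.
BOLTZMANN_EV_PER_K = 8.617333262e-5


def arrhenius_af(t_use_c: float, t_stress_c: float, ea_ev: float = 1.1) -> float:
    """Arrhenius acceleration factor of t_stress_c relative to t_use_c.

    AF = exp((Ea / k) * (1 / T_use - 1 / T_stress)), with T in kelvin.
    ea_ev = 1.1 eV is an ASSUMED activation energy; real devices may differ.
    """
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_stress_k))


def equivalent_use_hours(stress_hours: float, t_use_c: float, t_stress_c: float,
                         ea_ev: float = 1.1) -> float:
    """Field hours at t_use_c represented by stress_hours spent at t_stress_c."""
    return stress_hours * arrhenius_af(t_use_c, t_stress_c, ea_ev)


if __name__ == "__main__":
    # A hypothetical 10-hour bake at 125 °C, viewed from three field temperatures.
    for t_use in (40, 55, 85):
        hours = equivalent_use_hours(10, t_use, 125)
        print(f"10 h at 125 °C ~ {hours:,.0f} h at {t_use} °C")
```

The same bake corresponds to far fewer field hours when the field temperature is 85 °C than when it is 40 °C. In other words, under this model each hour of hot operation uses up much more of the retention budget, which is the effect described above.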

At the other extreme, at low temperatures the cell gate may end up holding less charge, and increased tunnel oxide degradation could lead to dielectric leakage, even though data retention itself improves.

The most dependable way to prevent such failures in NAND flash devices is to put them through a rigorous reliability testing program.

IC-Level Testing and Product-Level Reliability Demonstration Testing for Improved Reliability

NAND flash IC testing can be used to verify how error-correcting code (ECC) and temperature affect the P/E endurance, data retention, and longevity of NAND flash devices. For example, different ECC strengths, expressed as the number of correctable bits per 1 KB of data, can be tested across temperature ranges in a reliability demonstration test (RDT) to determine how much ECC is needed for a given operating environment.
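One way to reason about how much ECC is enough is to treat raw bit errors in a 1 KB codeword as independent and find the smallest correction strength t whose probability of seeing more than t errors stays below a target. The sketch below assumes that binomial model, ignores the extra parity bits, and uses a hypothetical per-codeword failure target; real ECC sizing also has to account for error clustering and for RBER growth with P/E cycling and temperature.

```python
import math
from typing import Optional


def codeword_failure_prob(rber: float, n_bits: int, t: int) -> float:
    """P(more than t bit errors in an n_bits codeword), assuming independent errors.

    The binomial tail is computed in log space so that large binomial
    coefficients do not overflow a float.
    """
    if rber <= 0.0:
        return 0.0
    if rber >= 1.0:
        return 1.0 if t < n_bits else 0.0
    log_p = math.log(rber)
    log_q = math.log1p(-rber)
    log_n_fact = math.lgamma(n_bits + 1)
    mean = n_bits * rber
    total = 0.0
    for i in range(t + 1, n_bits + 1):
        log_term = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n_bits - i + 1)
                    + i * log_p + (n_bits - i) * log_q)
        term = math.exp(log_term)
        total += term
        # Beyond the mean the terms do not grow, so stop once they are negligible.
        if i > mean and term <= total * 1e-17:
            break
    return min(total, 1.0)


def min_ecc_strength(rber: float, n_bits: int = 8192, target: float = 1e-15,
                     t_max: int = 120) -> Optional[int]:
    """Smallest correctable-bit count t meeting the (hypothetical) failure target."""
    for t in range(t_max + 1):
        if codeword_failure_prob(rber, n_bits, t) <= target:
            return t
    return None  # No t up to t_max is sufficient at this RBER.


if __name__ == "__main__":
    # Hypothetical RBER values, e.g. a fresh part versus a hot, heavily cycled one.
    for rber in (1e-5, 1e-4, 1e-3):
        print(f"RBER {rber:.0e}: t = {min_ecc_strength(rber)} bits per 1 KB")
```

Running the search at the RBER measured at each test temperature shows how the required ECC strength climbs as conditions worsen, which is the kind of comparison an RDT across temperature ranges is meant to support.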

For product-level testing, the same RDT process can be applied through burn-in testing, which checks read/write quality at temperatures from -40 °C to +85 °C while the entire drive, including firmware, the user area, and other memory spaces, is evaluated block by block. Weak blocks identified this way can be filtered out and replaced with spare blocks to improve the overall endurance of the NAND device throughout its life cycle, and additional verification testing can confirm signal integrity across the SATA interface.
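The weak-block screening step can be sketched as follows. Everything in it is hypothetical: the BurnInResult record, the rule that a block counts as weak if its worst-case error count at any test temperature exceeds a fixed fraction (margin) of the ECC strength, and the simple one-to-one remapping to spare blocks that passed the same screen. Production firmware uses its own criteria and mapping tables.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BurnInResult:
    """Hypothetical per-block burn-in record."""
    block: int
    # Worst-case raw bit errors per 1 KB observed at each test temperature (°C).
    errors_by_temp: dict[int, int]


def weak_blocks(results: list[BurnInResult], ecc_strength: int,
                margin: float = 0.5) -> list[int]:
    """Blocks whose worst error count at any temperature exceeds margin * ecc_strength."""
    limit = ecc_strength * margin
    return sorted(r.block for r in results if max(r.errors_by_temp.values()) > limit)


def plan_remap(user_results: list[BurnInResult], spare_results: list[BurnInResult],
               ecc_strength: int, margin: float = 0.5) -> dict[int, int]:
    """Map each weak user block to a spare block that passed the same screen."""
    weak_user = weak_blocks(user_results, ecc_strength, margin)
    weak_spare = set(weak_blocks(spare_results, ecc_strength, margin))
    good_spares = sorted(r.block for r in spare_results if r.block not in weak_spare)
    if len(weak_user) > len(good_spares):
        raise RuntimeError("not enough good spare blocks: drive fails screening")
    return dict(zip(weak_user, good_spares))


if __name__ == "__main__":
    user = [
        BurnInResult(0, {-40: 3, 25: 2, 85: 9}),
        BurnInResult(1, {-40: 1, 25: 1, 85: 2}),
        BurnInResult(2, {-40: 12, 25: 4, 85: 5}),
    ]
    spares = [
        BurnInResult(100, {-40: 1, 25: 1, 85: 1}),
        BurnInResult(101, {-40: 1, 25: 1, 85: 30}),
        BurnInResult(102, {-40: 0, 25: 0, 85: 1}),
    ]
    print(plan_remap(user, spares, ecc_strength=16))  # {0: 100, 2: 102}
```

Screening against the worst temperature rather than the average catches blocks like block 2 above, which looks healthy at 25 °C and 85 °C but fails at -40 °C. Keeping the limit well below the full ECC strength leaves headroom for the error growth expected over the drive's life.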

ATP’s ITemp MLC NAND flash solutions have adopted this type of validation to support high product reliability and long-term product lifecycle requirements at harsh temperatures.

Conclusion

To achieve the reliability that IIoT applications require, generic test methods for NAND IC components are not sufficient. Advanced RDTs at both high and low temperatures increase reliability, extend product life, and reduce total cost of ownership. Are your storage solutions up to the task in harsh environments?

Reviewing Editor: Guo Ting
