Most embedded systems today run their firmware from on-chip flash or SRAM. However, these memories are costly in both price and power consumption, and they limit program size to the amount of memory implemented on-chip. Recent advances in protocol and memory design enable firmware to run directly from off-chip non-volatile memory (NVM) devices in an execute-in-place (XiP) fashion, an approach that could alleviate these challenges.

This article describes the advantages and disadvantages of traditional ways of storing embedded firmware, then discusses XiP solutions in depth, and finally offers recommendations on how to choose the best XiP architecture.

Embedded flash memory has long been the mainstay of microcontrollers (MCUs). These MCUs augment the on-chip volatile memory array with one or more NVM arrays that store firmware and other constants. Building the NVM cell requires adding a number of fabrication steps to the base CMOS process. In the past, the cost of these extra manufacturing steps was small, but as CMOS technology has advanced, converting a standard CMOS process to an NVM-enabled process has become increasingly complex and expensive. In fact, while the state-of-the-art CMOS processes in volume production today are under 10nm, CMOS with embedded flash is many generations behind at 40nm.

The result is that MCU vendors could build products on process technologies that are faster, cheaper and lower-power, but the need for embedded NVM prevents them from doing so. Even if they choose an older process generation that offers embedded NVM as an option, the price gap between the flash-enabled process and the non-flash-enabled process can be more than 40 percent. Also, committing the MCU to a specific NVM size may work for one application but be the wrong capacity for another.

Despite all these challenges, embedded NVM will remain the mainstay of MCUs for a long time to come. For smaller designs that can be implemented in less advanced CMOS processes, embedded flash will still be the most efficient solution. But higher-performance, lower-power MCUs require alternative solutions.

An alternative to embedded flash is an on-chip SRAM array backed by an external serial flash device. At startup, the contents of the external flash memory are copied to the on-chip SRAM, and the MCU starts executing from the SRAM. The biggest advantage of this solution is that the SRAM can be fabricated in a state-of-the-art CMOS process without any process modifications. However, this solution requires two copies of the firmware: one in external flash and one in SRAM. Even in advanced process nodes, large on-chip SRAM arrays are quite expensive. Also, because of SRAM leakage current, the arrays must be shut down when the system enters power-down mode, which forces a power-hungry, time-consuming copy operation every time the MCU wakes up. Finally, as with embedded flash, committing the MCU to a specific SRAM size may work for one application but be the wrong capacity for another.

MCU vendors are looking for new memory architectures to meet the performance and power requirements of emerging smart IoT edge devices. XiP is becoming the solution of choice for high-performance, low-power systems. With XiP, the MCU can be implemented in a standard CMOS process, and only the external flash array requires a special NVM process. The MCU adds an instruction cache that holds frequently used code segments. Whenever the processor cannot find the required instruction in the cache (a cache miss), the MCU initiates an access to external flash memory to bring the missing instruction into the cache. With the introduction of the new JEDEC xSPI protocol (JESD251), the flash interface can run at up to 200MHz with an 8-bit-wide data path operating at double data rate (DDR).
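As a rough illustration of what those interface figures imply, the sketch below computes the peak data-phase throughput of a serial flash bus from its clock, width, and data rate. The function name and the 32-byte cache-line size are assumptions made for this example, and the calculation ignores command, address, and dummy cycles, so real fetch latency will be higher.

```python
def peak_bytes_per_second(clock_hz: int, bus_bits: int, ddr: bool) -> int:
    """Peak data-phase throughput of a serial flash bus (no command overhead)."""
    transfers_per_clock = 2 if ddr else 1  # DDR moves data on both clock edges
    return clock_hz * transfers_per_clock * bus_bits // 8


# Figures from the text: 200 MHz clock, 8-bit data path, DDR.
xspi_peak = peak_bytes_per_second(200_000_000, bus_bits=8, ddr=True)

# Assumed 32-byte cache line (hypothetical); data phase only, in nanoseconds.
CACHE_LINE_BYTES = 32
line_fill_data_ns = CACHE_LINE_BYTES * 1_000_000_000 // xspi_peak

print(xspi_peak)          # 400000000 bytes/s, i.e. 400 MB/s
print(line_fill_data_ns)  # 80
```

Under these assumptions, the data phase of a single cache-line refill takes about 80 ns, which is why the fixed per-access overhead of the flash device matters as much as the raw bus bandwidth.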

Several MCUs from NXP and ST can use external flash memory for execute-in-place (XiP) operation. ST's STM32L4+ and STM32L5 are mid-range MCUs based on Arm Cortex-M4 and Cortex-M33 cores, respectively, while NXP's high-end i.MX RT1050 and other members of the RT10xx series implement a 600MHz dual-issue Cortex-M7 with a 32KB instruction cache. NXP also supports XiP in its M4-based mid-range Kinetis K8x products and in the recently launched Arm Cortex-M33-based RT600 MCU. The ST MCUs and the Kinetis K8x combine on-chip flash with support for XiP. The RT600 and RT10xx products are designed without on-chip flash, allowing them to achieve very aggressive price points.

When choosing an external flash device for XiP, the first question to ask is which parts of the firmware will run in XiP mode. Some designers opt for a hybrid approach, keeping performance-critical parts of the program on-chip (in ROM, flash, or SRAM) and using external flash in XiP mode to expand the system.

The questions to ask are:

(a) Will all or part of the program execute from on-chip ROM, flash memory, or SRAM? If so, from which of these memory types?

(b) Will all or part of the program execute directly from external flash in XiP mode?

If the answer to (a) is SRAM, then flash memory external to the SoC is required to load the program at boot. Designers can choose from Adesto Phoenix (standard flash), Fusion (battery-optimized flash), or EcoXiP (XiP-optimized octal flash). For cost-sensitive applications where throughput is not critical, Phoenix should be chosen. Fusion is best suited to applications with very tight power constraints. Adesto's EcoXiP should be considered in this case only if the customer also requires high performance in XiP mode, or requires the throughput of an octal flash device for very fast boot or frequent data read operations. This is the case with AI inference engines, for example.

Assuming the answer to (b) is yes (at least some firmware requires XiP), the next step is to determine how much performance is required. Thanks to its high-speed octal DDR interface, EcoXiP's throughput is approximately 4 times that of standard flash devices. In addition, its wrap-and-continue command further increases the achievable throughput. There are several questions to ask:

(a) Will the SoC contain an instruction cache? (Without an instruction cache, XiP performance will be very low, but the advantage of EcoXiP over Quad devices will be more significant.)

(b) What frequency will the CPU run at, and what is the frequency of the SPI bus?

(c) What level of performance is required when running in XiP mode?

(d) Does the device require software updates in the field (often called over-the-air (OTA) updates)?

The answer to (c) is crucial. At low frequencies and with low demands on XiP performance, executing directly from a standard Quad SPI flash device is quite feasible. However, even with a very low miss rate in the instruction cache, executing from EcoXiP will provide about 50% better CPU performance than running from a Quad device.
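To see why a low miss rate can still have a large effect, consider a simple back-of-the-envelope model in which each cache miss stalls the CPU for the time needed to fetch one cache line from external flash. Everything in the sketch below is an assumption chosen for illustration, not a measurement: the function names, the 100MHz bus clock, the 32-byte line, the 16 bus clocks of command/address/dummy overhead, the base CPI of 1.0, and the miss rate of 0.5% per instruction. Only the 600MHz CPU clock is taken from the MCU examples earlier in the article.

```python
def miss_penalty_cpu_cycles(cpu_hz, bus_hz, bus_bits, ddr,
                            line_bytes, overhead_bus_clocks):
    """CPU cycles stalled while one cache line is fetched from external flash."""
    bytes_per_bus_clock = bus_bits * (2 if ddr else 1) / 8
    data_clocks = line_bytes / bytes_per_bus_clock
    # Convert total bus clocks (fixed overhead + data phase) to CPU cycles.
    return (overhead_bus_clocks + data_clocks) * cpu_hz / bus_hz


def effective_cpi(base_cpi, misses_per_instruction, penalty_cycles):
    """Average cycles per instruction, including cache-miss stalls."""
    return base_cpi + misses_per_instruction * penalty_cycles


# Hypothetical system parameters (assumptions, see lead-in).
CPU_HZ, BUS_HZ = 600_000_000, 100_000_000
LINE_BYTES, OVERHEAD_CLOCKS = 32, 16
BASE_CPI, MISSES_PER_INSTR = 1.0, 0.005

quad_penalty = miss_penalty_cpu_cycles(CPU_HZ, BUS_HZ, 4, False,
                                       LINE_BYTES, OVERHEAD_CLOCKS)
octal_penalty = miss_penalty_cpu_cycles(CPU_HZ, BUS_HZ, 8, True,
                                        LINE_BYTES, OVERHEAD_CLOCKS)

quad_cpi = effective_cpi(BASE_CPI, MISSES_PER_INSTR, quad_penalty)
octal_cpi = effective_cpi(BASE_CPI, MISSES_PER_INSTR, octal_penalty)

print(f"quad SDR miss penalty:  {quad_penalty:.0f} CPU cycles")
print(f"octal DDR miss penalty: {octal_penalty:.0f} CPU cycles")
print(f"relative CPU performance (octal vs quad): {quad_cpi / octal_cpi:.2f}x")
```

In this model the fixed per-access overhead does not shrink with bus width, so a fourfold gain in raw throughput yields a smaller gain in miss penalty, and the overall speedup depends strongly on the miss rate and on the ratio of CPU clock to bus clock. The exact figure for a real design has to come from profiling the actual firmware.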

Note that if the answer to (d) is yes, EcoXiP's read-while-write capability will make OTA updates easier when the SoC is built without additional code storage memory (relying only on XiP). There are other solutions for OTA updates that do not require read-while-write, but they are all SRAM-intensive and require complex firmware.

Reviewing Editor: Guo Ting
