If your MCU application needs to process digital audio, consider a multithreaded approach. Using a multithreaded design approach enables designers to reuse parts of their designs in a straightforward manner.

Multi-core and multi-threading are effective ways to design real-time systems. Using these techniques, a system is designed as a collection of tasks that operate independently and communicate with each other when needed. Breaking a design down from large monolithic blocks of code into more manageable tasks can greatly simplify the system and speed up product development. It also makes the real-time properties of the entire system easier to reason about: designers only need to worry about the fidelity of each task's implementation, asking questions such as "Is the network protocol implemented correctly?"

In this article, we will discuss how to use multi-threaded or multi-core design methodologies to design real-time systems that operate on data streams, such as digital audio systems. We illustrate the design approach using several digital audio systems, including asynchronous USB Audio 2, Ethernet AVB, and a digital dock for MP3 players. We will briefly discuss the concepts of digital audio, multi-core and multi-threading before showing how to effectively use multi-core and multi-threading to design the required buffering and clocking schemes.

Digital audio

Digital audio has replaced analog audio in many consumer markets for two reasons. First, most audio sources are digital. Whether delivered in a lossy compressed format (MP3) or in an uncompressed format (CD), digital standards have replaced traditional analog standards such as tape and vinyl. Second, digital audio is easier to handle than analog audio. Data can be transferred over existing standards such as IP or USB without loss, and the hardware design does not require any "magic" to keep the noise floor down. As far as the digital path is concerned, the noise floor is constant and unaffected by the TDMA noise that mobile phones can cause.

Digital audio systems operate on streams of samples. Each sample represents the amplitude of one or more audio channels at a point in time, with the time between samples governed by the sample rate. The CD standard has two channels (left and right) and uses a sample rate of 44.1 kHz. Common audio standards use 2, 6 (5.1), or 8 (7.1) channels, and sample rates of 44.1 kHz, 48 kHz, or multiples thereof. We use 48 kHz as a running example, but this is by no means the only standard in use.

Multicore and multithreading

In a multithreaded design approach, a system is represented as a collection of concurrent tasks. Using concurrent tasks, rather than a single sequential program, has several advantages:

Multitasking is a great way to support separation of concerns, one of the most important principles of software engineering. Separation of concerns means that different parts of the design can be individually designed, implemented, tested, and verified. Once the interactions between tasks are specified, teams or individuals can complete their own tasks independently.

Concurrent tasks provide a natural framework for specifying what the system should do. For example, a digital audio system plays audio samples received over a network interface. In other words, the system performs two tasks at the same time: receiving data from the network interface and playing samples on its audio interface. Expressing these two activities as a single sequential task is confusing.

A system represented as a collection of concurrent tasks can be implemented as a collection of threads on one or more multithreaded cores (see Figure 1). We assume that threads are scheduled at the instruction level, as is the case on XMOS XCore processors, since this enables concurrent tasks to run in real time. Note that this is not the same as multithreading on Linux, where threads are scheduled on a single processor with context switching. That may make threads appear concurrent to a human, but not to a set of real-time devices.

Concurrent tasks are logically designed to communicate via message passing. When two tasks are implemented as two threads, they communicate by sending data and control tokens over channels. Within a core, channel communication is performed by the core itself; when threads are on different cores, it passes through switches (see Figure 2).

Multithreaded designs have been used by embedded system designers for decades. To implement embedded systems, designers have traditionally used multiple microcontrollers. Inside a music player, for example, you might find three microcontrollers controlling the flash, the DAC, and the MP3 decoder chip.

Figure 1: Threads, channels, cores, switches, and links. Concurrent threads communicate through channels within a core, between cores on a chip, or between cores on different chips.

We believe that modern multithreaded environments can replace this design strategy. A single multithreaded chip can replace multiple MCUs and provide an integrated communication model between tasks. Rather than implementing custom communication between tasks on separate MCUs, the system is implemented as a set of threads communicating over channels.

Using a multithreaded design approach enables designers to reuse parts of their designs in a straightforward manner. In traditional software engineering, functions and modules are combined to perform complex tasks. However, this approach is not necessarily suitable for a real-time environment, since executing two functions in sequence may break the real-time requirements of either one.

In an ideal multithreaded environment, the composition of real-time tasks is trivial, as it is only a case of adding a thread (or core) for each new real-time task. In practice, the designer places a limit on the number of cores (for economic reasons, for example), and must therefore decide which tasks will constitute concurrent threads, and which tasks will be integrated into the functionality of a single thread as a collection.

Multithreaded digital audio

A digital audio system is easily split into multiple threads, including a network stack thread, a clock recovery thread, an audio transmission thread, and optional threads for DSP, device upgrades, and driver authentication. A network protocol stack can be as complex as an Ethernet/IP stack with multiple concurrent tasks, or as simple as an S/PDIF receiver.

Figure 2: Physical incarnation of a three-core system with 24 concurrent threads. The top device has two cores and the bottom device has one core.

We assume that threads in the system communicate by sending data samples through channels. In this design approach, it doesn't matter whether the threads execute on a single-core or multi-core system, since multiple cores just add scalability to the design. We assume that the computational requirements of each thread can be established statically and are not data dependent, which is often the case for uncompressed audio.

We'll focus on two parts of the design: buffering between threads (and its impact on performance) and clock recovery. Once these design decisions have been made, implementing the internals of each thread follows normal software engineering principles and is as hard or as easy as one might expect. Buffering and clock recovery are interesting because they both have a qualitative impact on the user experience (stable, low-latency audio) and are easy to reason about in a multithreaded programming environment.


In digital systems, data samples do not necessarily arrive at the moment they must be delivered, which is why digital audio requires buffering. For example, consider a USB 2.0 speaker with a sample rate of 48 kHz. The USB layer transmits bursts of six samples in every 125 µs window. There is no guarantee of when within a window those six samples arrive, so a buffer of at least 12 samples is required to guarantee that samples can be streamed to the speakers in real time.

The design challenge is to provide the right amount of buffering. In an analog system, buffering is not an issue: signals are delivered on time. In digital systems based on non-real-time OS designs, programmers typically resort to fairly large buffers (250 or 1,000 samples) to cope with uncertainty in scheduling policies. However, large buffers are expensive in terms of memory, add latency, and are hard to prove large enough to guarantee click-free delivery.

Multithreaded designs provide a good framework to reason about buffering informally and formally and avoid unnecessarily large buffers. For example, consider the aforementioned USB speaker with the addition of an ambient noise correction system. The system will include the following threads:

A thread that receives samples over USB.

A pipeline of 10 or so threads that filter the sample stream, each with a different set of coefficients.

A thread that uses I2S to pipe the filtered output samples to the stereo codec.

A thread that reads samples from a codec connected to a microphone that samples ambient noise.

A thread that subsamples the ambient noise down to an 8 kHz sample rate.

A thread that builds the spectral characteristics of ambient noise.

A thread that changes the filter coefficients based on the computed spectral properties.

All threads run at some multiple of the 48 kHz base cycle. For example, each filter thread filters one sample per 48 kHz cycle; the delivery thread delivers one sample per cycle. Each thread also has a defined window on which it operates, and a defined step by which this window advances. For example, if a filter thread were implemented as a biquad, it would operate on a window of three samples, advancing one sample per cycle. The spectral thread might operate on a 256-sample window (to perform an FFT, or Fast Fourier Transform), advancing by 64 samples at a time.

It is now possible to gather all the parts of the system that run on the same cycle and connect them together as synchronous sections. Buffers are not required within these synchronous sections, except that a single buffer is needed wherever threads run as a pipeline. Buffers are required between the different synchronous sections. In our example, we end up with three sections:

A section that receives samples from USB, filters them, and transmits them, all at 48 kHz.

A section that samples ambient noise at 48 kHz and transmits it at 8 kHz.

A section that establishes the spectral characteristics and changes the filter settings at 125 Hz.

These three sections are shown in Figure 3. The first section, which receives samples from USB, requires a buffer of 12 stereo samples.

Figure 3: Threads grouped together according to frequency.

The delivery thread needs to buffer one stereo sample, and running 10 filter threads as a pipeline requires 11 buffers. This means the total delay from receiver to codec comprises 24 sample times, or 500 µs; an extra sample can be added to absorb jitter in the clock recovery algorithm. This section runs at 48 kHz.

The second section, which samples the ambient noise, requires storing one sample at the input and six samples in the subsampler, giving a seven-sample delay at 48 kHz, or about 145 µs.

The third section, which establishes the spectral characteristics, requires storing 256 samples at an 8 kHz sample rate; no additional buffers are needed. The delay between ambient noise and filter correction is therefore 256 samples at 8 kHz plus the 145 µs subsampling delay, or just over 32 ms. Note that these are the minimum buffer sizes for the algorithms we chose; if this latency is not acceptable, a different algorithm must be chosen.

It is often tempting to design threads that operate on blocks of data rather than individual samples, but this increases the overall latency, the memory requirements, and the complexity. It should only be considered if there is a clear benefit, such as increased throughput.

Timing digital audio

A big difference between digital and analog audio is that digital audio requires a clock signal to be distributed to all parts of the system, whereas analog audio does not. Although components may use different sample rates (for example, some parts of the system may use 48 kHz while others use 96 kHz, with a sample rate converter in between), all components must agree on the length of one second, and therefore on a base measurement frequency.

An interesting feature of digital audio is that threads within the system can operate independently of the true base clock frequency, assuming there is a gold-standard base frequency somewhere. It does not matter if the multiple cores in the system use different crystals, as long as they operate sample by sample. At the edges of the system, however, the true clock frequency does matter, as does the delay that sampling introduces en route.

In a multithreaded environment, one thread can be set aside to explicitly measure the true clock frequency, implement the clock recovery algorithm, compare the local clock against the global clock, and track the skew relative to the master clock.

The clock can be measured implicitly using the underlying bit rate of an interconnect such as S/PDIF or ADAT: counting the bits per second on either of these links gives a measure of the master clock. Clocks can also be measured explicitly using protocols designed for the purpose, such as PTP over Ethernet.

In the clock recovery thread, a control loop can be implemented that estimates the clock frequency and makes adjustments based on the observed error. In its simplest form, the error is used directly to adjust the frequency, but filters can be added to reduce jitter. This thread implements in software a function traditionally performed by a PLL, so it can be cheaply adapted to the environment.

Conclusion

Multithreaded development methods enable digital audio systems to be developed using a divide-and-conquer approach, where a problem is divided into a set of concurrent tasks, each executed in a separate thread on a multithreaded core.

Like many real-time systems, digital audio lends itself to a multithreaded design approach because a digital audio system naturally consists of a set of tasks that process data and that must execute concurrently.
