Figure 2: Physical realization of a three-core system with 24 concurrent threads. The top device has two cores and the bottom device has one core.
We assume that threads in the system communicate by sending data samples over channels. In this design method it does not matter whether the threads execute on a single core or on a multi-core system, because multiple cores only add scalability to the design. We also assume that the computational requirements of each thread can be established statically and do not depend on the data, which is usually the case for uncompressed audio.
We will focus on two parts of the design: the buffering between threads (and its impact on performance) and clock recovery. Once these design decisions are made, the internal implementation of each thread follows normal software engineering principles and is as hard or as easy as one would expect. Buffering and clock recovery are interesting because both have a qualitative impact on the user experience (they enable stable, low-latency audio) and both are easy to reason about in a multithreaded programming environment.
Buffering
In a digital system, data samples are not necessarily transported at the moment they are to be delivered, so digital audio has to be buffered. For example, consider a USB 2.0 speaker with a sample rate of 48 kHz. The USB layer transports a burst of six samples in every 125 µs window. There is no guarantee where within the 125 µs window the six samples will arrive, so a buffer of at least 12 samples is required to ensure that samples can be streamed to the speaker in real time.
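The 12-sample figure can be checked with a few lines of arithmetic. The sketch below is one reading of the reasoning, not the author's own code: it assumes the worst case is a burst that lands at the very start of one window followed by a burst at the very end of the next, so up to two windows' worth of samples must be held. The function name `min_buffer_samples` is hypothetical.

```python
def min_buffer_samples(sample_rate_hz: int, window_us: int) -> int:
    """Worst-case buffer depth (samples per channel) for bursty delivery.

    Assumption: one burst per window, arriving anywhere inside the window.
    """
    # Samples carried by one burst, rounded up to a whole sample.
    burst = -(-(sample_rate_hz * window_us) // 1_000_000)
    # Two consecutive bursts can be almost two windows apart, so the
    # buffer has to cover two windows' worth of playback.
    return 2 * burst


print(min_buffer_samples(48_000, 125))
```

For the 48 kHz, 125 µs case in the text this gives six samples per burst and a buffer of 12 samples.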
The design challenge is to establish the right amount of buffering. In an analog system buffering is not an issue, because the signal is delivered on time. In digital systems built on a non-real-time operating system, programmers usually resort to large buffers (250 or 1000 samples) to cope with the uncertainty in scheduling policies. However, large buffers are costly: they consume memory, they add latency, and it is hard to prove that they are large enough to guarantee click-free delivery.
A multithreaded design provides a good framework for reasoning about buffers, both informally and formally, and for avoiding unnecessarily large buffers. For example, suppose an ambient noise correction system is added to the USB speaker described above. The system will comprise the following threads:
A thread that receives USB samples over the network.
A series of 10 or more threads that filter the sample stream, each with a different set of coefficients.
A thread that uses I2S to deliver the filtered output samples to the stereo codec.
A thread that reads samples from a codec connected to a microphone that samples the ambient noise.
A thread that subsamples the ambient noise to a sampling rate of 8 kHz.
A thread that establishes the spectral characteristics of the ambient noise.
A thread that changes the filter coefficients based on the computed spectral characteristics.
All threads run at some multiple of the 48 kHz base period. For example, each filtering thread filters one sample every 48 kHz cycle, and the delivery thread delivers one sample per cycle. Each thread also has a defined window of samples on which it operates and a defined way of advancing that window. For example, if a filter thread is implemented as a biquad, it operates on a window of three samples that advances by one sample every cycle. The spectrum thread may operate on a 256-sample window (to perform an FFT, or Fast Fourier Transform) that advances by 64 samples once every 64 samples.
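The two kinds of window can be illustrated with a minimal sketch. It assumes a direct-form I biquad (the article does not say which form is used), and the names `Biquad` and `sliding_windows` as well as the coefficient values are hypothetical placeholders.

```python
from collections import deque


class Biquad:
    """Direct-form I biquad operating on a three-sample input window."""

    def __init__(self, b0, b1, b2, a1, a2):
        self.b = (b0, b1, b2)
        self.a = (a1, a2)
        # Newest sample first; maxlen makes the oldest sample fall out
        # automatically when the window advances.
        self.x = deque([0.0, 0.0, 0.0], maxlen=3)
        self.y = deque([0.0, 0.0], maxlen=2)

    def step(self, sample):
        """Advance the window by one sample and return one output sample."""
        self.x.appendleft(sample)
        out = sum(b * x for b, x in zip(self.b, self.x))
        out -= sum(a * y for a, y in zip(self.a, self.y))
        self.y.appendleft(out)
        return out


def sliding_windows(samples, size=256, hop=64):
    """Yield windows of `size` samples, advancing by `hop` samples each time."""
    for start in range(0, len(samples) - size + 1, hop):
        yield samples[start:start + size]


# Placeholder coefficients: a two-tap average, chosen only for illustration.
smoother = Biquad(0.5, 0.5, 0.0, 0.0, 0.0)
print([smoother.step(1.0) for _ in range(3)])
```

The filter consumes and produces exactly one sample per cycle, while the spectrum-style window only yields a new block after every 64 input samples, which is why the two threads end up in different synchronous parts.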
It is now possible to identify all the parts of the system that run at the same period and to connect them together into synchronous parts. No buffers are required inside a synchronous part, although threads that are to operate as a pipeline need a single buffer between stages. Buffers are required between the synchronous parts. In our example we end up with three parts:
The part that receives samples from USB, filters them, and delivers them at 48 kHz.
The part that samples the ambient noise at 48 kHz and delivers it at 8 kHz.
The part that establishes the spectral characteristics and changes the filter settings at 125 Hz.
The three parts are shown in Figure 3. The first part, which receives samples from USB, needs to buffer 12 stereo samples.
Figure 3: Threads grouped together by frequency.
The delivery stage needs to buffer one stereo sample. Running the 10 filter threads as a pipeline requires 11 buffers. The total delay from the receiver to the codec is therefore 12 + 1 + 11 = 24 sample times, i.e. 500 µs; one additional sample may be added to cope with intermediate jitter in the clock recovery algorithm. This part operates at 48 kHz.
The second part, which samples the ambient noise, needs to store one sample at the input and six samples in the subsampling stage. It therefore introduces a delay of seven samples at 48 kHz, or about 145 µs.
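To see where the six stored samples come from, consider a minimal subsampler that collects six 48 kHz samples and emits one 8 kHz sample. This is only a sketch: averaging each block is an assumption made here for simplicity (a production design would normally use a proper low-pass filter before decimating), and the class name `Subsampler` is hypothetical.

```python
class Subsampler:
    """Reduce the sample rate by an integer factor using block averaging."""

    def __init__(self, factor=6):
        self.factor = factor
        self.block = []  # holds at most `factor` samples

    def push(self, sample):
        """Accept one input sample; return an output sample or None."""
        self.block.append(sample)
        if len(self.block) < self.factor:
            return None
        out = sum(self.block) / self.factor
        self.block.clear()
        return out


sub = Subsampler()
outputs = [sub.push(float(n)) for n in range(12)]
print([o for o in outputs if o is not None])
```

One output appears for every six inputs, so the stage never holds more than six samples.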
The third part, which establishes the spectral characteristics, needs to store 256 samples at a sampling rate of 8 kHz. No additional buffers are required. The delay between ambient noise and filter correction is therefore 256 samples at 8 kHz (32 ms) plus the 145 µs of the subsampling part, or just over 32 ms in total. Note that these are the minimum buffer sizes for the algorithm we chose to use; if this delay is unacceptable, a different algorithm must be selected.
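The latency figures for the three parts follow directly from the buffer counts and sample rates given above. The short calculation below simply reproduces that arithmetic; the function and variable names are hypothetical.

```python
def latency_us(samples: int, rate_hz: int) -> float:
    """Time taken by `samples` sample periods at `rate_hz`, in microseconds."""
    return samples * 1_000_000 / rate_hz


# Part 1: 12 (USB receive) + 1 (delivery) + 11 (filter pipeline) buffers.
part1_us = latency_us(12 + 1 + 11, 48_000)
# Part 2: 1 input sample + 6 samples held while subsampling.
part2_us = latency_us(1 + 6, 48_000)
# Part 3: a 256-sample window at 8 kHz.
part3_us = latency_us(256, 8_000)

print(f"receive-to-codec delay: {part1_us:.0f} us")
print(f"subsampling delay:      {part2_us:.1f} us")
print(f"noise-to-correction:    {(part2_us + part3_us) / 1000:.2f} ms")
```

The seven-sample delay works out to roughly 145.8 µs, which the text quotes as 145 µs.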
It is often tempting to design threads to operate on blocks of data rather than on individual samples, but doing so increases the overall latency, the memory requirements, and the complexity. It should only be considered when there is a clear benefit, such as increased throughput.
Clocking Digital Audio
A big difference between digital and analog audio is that digital audio is based on an underlying sample rate, so a clock signal has to be distributed to all parts of the system. Although components may use different sample rates (for example, some parts of the system may run at 48 kHz and others at 96 kHz, with a sample rate converter in between), all components must agree on the length of one second, and hence on a common basis for measuring frequency.
An interesting property of digital audio is that the threads inside the system are agnostic to the underlying clock frequency, provided that there is a gold-standard base frequency. It does not matter whether the cores in the system use different crystals, as long as they operate on samples. At the edges of the system, however, the true clock frequency is important, as is the delay that samples incur on the way.
In a multithreaded environment, a thread is set aside to measure the true clock frequency explicitly and to implement the clock recovery algorithm: it measures the local clock against the global clock and agrees with the master clock on the clock offset.
The clock may be measured implicitly from the underlying bit rate of an interconnect such as S/PDIF or ADAT: counting the number of bits per second on either of these links gives a measure of the master clock. Alternatively, the clock can be measured explicitly using a protocol designed for that purpose, such as PTP over Ethernet.
The clock recovery thread can implement a control loop that estimates the clock frequency and adjusts the estimate according to the observed error. In its simplest form the error is used directly as the metric for adjusting the frequency, but a filter can be added to reduce jitter. This thread performs the function traditionally carried out by a PLL, but in software, so it can be adapted to its environment cheaply.
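A control loop of this kind might look like the following sketch. It is an illustration under stated assumptions rather than the algorithm used in any particular product: the master clock is measured by counting ticks over an interval timed by the local clock, the error is smoothed with a simple one-pole filter, and the gain and smoothing values are arbitrary. The class name `ClockRecovery` and its parameters are hypothetical.

```python
class ClockRecovery:
    """Software frequency tracker: estimate, compare, filter, adjust."""

    def __init__(self, nominal_hz, gain=0.1, smoothing=0.5):
        self.estimate_hz = float(nominal_hz)
        self.gain = gain            # how strongly the estimate follows the error
        self.smoothing = smoothing  # one-pole filter coefficient (0..1)
        self.filtered_error = 0.0

    def update(self, master_ticks, local_seconds):
        """Feed one measurement; return the updated frequency estimate."""
        measured_hz = master_ticks / local_seconds
        error = measured_hz - self.estimate_hz
        # Low-pass filter the error to reduce the effect of jitter.
        self.filtered_error += self.smoothing * (error - self.filtered_error)
        # Nudge the estimate in the direction of the filtered error.
        self.estimate_hz += self.gain * self.filtered_error
        return self.estimate_hz


# Example: the master runs 50 ppm fast relative to a nominal 48 kHz.
recovery = ClockRecovery(48_000)
for _ in range(200):
    recovery.update(master_ticks=480_024, local_seconds=10.0)
print(round(recovery.estimate_hz, 3))
```

Because the loop lives in software, the gain and the filter can be changed to suit the interconnect being measured, which is the flexibility the text attributes to a software PLL.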
Conclusion
The multithreaded development method allows a digital audio system to be developed with a divide-and-conquer approach: the problem is divided into a set of concurrent tasks, and each task is executed in a separate thread on a multithreaded core.
Like many real-time systems, digital audio lends itself to a multithreaded design method, because a digital audio system naturally consists of a set of data processing tasks that need to execute concurrently.