Microprocessor Report (MPR)

Syntiant NDP120 Sharpens Its Hearing

Wake-Word Detector Combines Ultra-Low-Power DLA With HiFi 3 DSP

April 12, 2021

By Mike Demler


Syntiant’s NDP120 is an ultra-low-power edge-AI processor that can deliver up to 25x greater neural-network throughput than its predecessor, the NDP10x. Although the new chip is mainly for speech recognition, its second-generation deep-learning accelerator (DLA) design can handle simple object-detection networks as well. In always-on mode, the NDP120 typically consumes less than 500 microwatts. It also adds a Cadence Tensilica HiFi 3 DSP that supports echo cancellation, near- and far-field audio processing, and other conventional audio-filtering tasks.

By introducing the NDP120, Syntiant aims to build on the popularity of its first product. In the two years since it brought the NDP10x to production, the startup has shipped nearly 20 million units, mostly in mobile phones. It withheld customer names, but it says the NDP10x has also won designs in earbuds and laptop PCs. That success attracted new investors to its $35 million Series C funding round. Microsoft led the 3Q20 round, joined by Applied Materials and several venture-capital firms, bringing Syntiant’s total investment to about $65 million. Previous rounds included strategic investments from Amazon, Bosch, and Intel. The four-year-old startup now has about 80 employees.

Syntiant is sampling the NDP120 and plans to start production in the third quarter at a list price of $6. An evaluation kit combines Infineon MEMS microphones, the new Syntiant processor, and a Raspberry Pi 3 single-board computer (SBC). The new chip targets IoT, mobile, security, smart home, smart speakers, and wearables. UMC builds it in 40nm ULP technology. The processor ships in a tiny 3.1mm x 2.5mm WLBGA or a 5mm QFN package.

A Quadrophonic Sound Processor

Like its predecessor, the NDP120 integrates a Cortex-M0 that manages device operations, as Figure 1 shows. The tiny CPU works with a 1.4MB SRAM. Although the chip employs the same 40nm technology as the NDP10x (see MPR 3/18/19, “Syntiant Knows All the Best Words”), its 100MHz peak clock frequency is five times faster. This peak speed requires a higher operating voltage (1.1V) and power, but the company withheld the power rating for this speed. For ultra-low-power always-on operation, we expect the CPU frequency will be 20MHz or slower, as in the NDP10x.

Figure 1. Syntiant NDP120. The second-generation DLA can recognize “Hey Google” while consuming just 280 microwatts. The HiFi 3 DSP performs echo cancellation on audio from four microphones.

The NDP120 can connect to a host processor through its quad-SPI target port, serving as the wake-up device, or it can run standalone as the main processor. The M0 can boot from flash memory attached to the QSPI controller port or from a host download. The new chip can process audio from four microphones connected to its pulse-density-modulation (PDM) inputs, twice as many as in the NDP10x. Alternatively, each three-pin PDM interface is reconfigurable as an I2S/TDM interface. The inputs handle audio bit streams with up to 48kHz sampling.

Most audio-processing, sensor-fusion, and speech-enhancement algorithms run on the NDP120’s DLA, which Syntiant describes as a multipurpose graph processor. The DLA can fuse data from accelerometers, infrared detectors, magnetometers, and other low-frequency sensors. For example, the I2C interface can feed the DLA with pressure-sensor data that enhances detection of breaking glass or gunshots. The sensor-fusion capabilities are useful in AR/VR headsets, medical/wellness devices, smartphones, and wearables, as well as for monitoring acoustic events in machinery and security systems.

The HiFi 3 DSP handles beamforming, near- and far-field processing, and other conventional audio-filter tasks (see MPR 2/6/12, “Tensilica’s HiFi 3 Sounds Good”). The additional microphones provide the inputs for echo and noise cancellation. Syntiant withheld specifications for this DSP, including its speed and power.
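Syntiant has not described how its HiFi 3 firmware implements beamforming. As a generic illustration of why extra microphones help, the sketch below uses the textbook delay-and-sum method: each channel is shifted by a whole number of samples so that a wavefront from the chosen direction lines up across all microphones, and the aligned channels are averaged. Sound from that direction adds coherently while off-axis sound partially cancels. The function name, integer-only delays, and zero padding are assumptions for illustration, not the NDP120's algorithm.

```python
from typing import List, Sequence


def delay_and_sum(channels: Sequence[Sequence[float]],
                  delays: Sequence[int]) -> List[float]:
    """Textbook delay-and-sum beamformer (illustrative, not Syntiant's code).

    channels: one equal-length list of samples per microphone (e.g., four).
    delays:   non-negative delay in samples applied to each channel so that a
              wavefront from the steering direction lines up across channels.
    Samples shifted in from before the start of a channel are treated as zero.
    """
    if len(channels) != len(delays):
        raise ValueError("need exactly one delay per channel")
    n = len(channels[0])
    out = [0.0] * n
    for ch, d in zip(channels, delays):
        for i in range(n):
            j = i - d                  # delayed channel: output i uses input i - d
            if 0 <= j < n:
                out[i] += ch[j]
    m = len(channels)
    return [s / m for s in out]       # average the aligned channels
```

For example, if a click reaches one microphone at sample 3 and a closer microphone at sample 1, delaying the second channel by two samples aligns the two copies, and the averaged output keeps the click at full amplitude instead of smearing it across two half-height peaks.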

Variable Precision Trims Weights

The NDP120’s DLA is a second-generation design that can run convolutional neural networks (CNNs), gated recurrent units (GRUs), and long short-term memory (LSTM) models comprising up to 256 layers of 4,096 nodes each. Syntiant was one of the first to develop an inference engine using lower-than-INT8 precision. It says many customers have found INT4 to be sufficient for low-power inference. The new processor enables them to build mixed-precision models employing INT8, INT4, INT2, and binary parameters. Programmers can also choose INT16 activations for higher-precision tasks.
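Syntiant has not disclosed its quantization scheme. To show what trading weight width for memory means in practice, the minimal sketch below assumes simple symmetric, per-tensor uniform quantization: each layer's weights are mapped to signed integer codes plus one floating-point scale. With this scheme, INT2 collapses to the ternary set {-1, 0, +1}, and binary mode keeps only each weight's sign. The function names and the per-layer width table are hypothetical.

```python
from typing import Dict, List, Tuple

SUPPORTED_BITS = (1, 2, 4, 8)  # binary, INT2, INT4, INT8 widths named in the text


def quantize(weights: List[float], bits: int) -> Tuple[List[int], float]:
    """Symmetric per-tensor uniform quantization (an assumed, illustrative scheme).

    Returns integer codes and a scale such that weight ~= code * scale.
    bits=1 keeps only the sign (codes -1/+1) and scales by the mean magnitude.
    Other widths use codes in [-qmax, qmax], where qmax = 2**(bits-1) - 1.
    """
    if bits not in SUPPORTED_BITS:
        raise ValueError(f"unsupported width: {bits}")
    if bits == 1:
        scale = sum(abs(w) for w in weights) / len(weights) or 1.0
        return [1 if w >= 0 else -1 for w in weights], scale
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax or 1.0  # avoid zero scale
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Recover approximate floating-point weights from codes."""
    return [c * scale for c in codes]


# Hypothetical mixed-precision plan: each layer gets its own weight width.
layer_bits: Dict[str, int] = {"conv1": 8, "gru": 4, "dense_out": 2}
```

Keeping one scale per tensor is the simplest option; per-channel scales usually recover some accuracy at low bit widths but cost extra storage, which matters when the whole model must fit on chip.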

The DLA supports the depthwise-separable convolution and fully connected layers of MobileNet v2 object-detection networks, as well as common ReLU activations, average-pooling layers, and max-pooling layers. Because the chip lacks a camera interface, however, images for object detection must be transferred from the host processor through the QSPI port.
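Depthwise-separable convolution is what makes MobileNet-class networks small enough to consider for a chip like this. A standard k x k convolution learns a full filter for every input/output channel pair, whereas the separable form splits the work into one k x k filter per input channel (depthwise) followed by a 1x1 convolution that mixes channels (pointwise). The arithmetic below compares weight counts for one layer; the 144-channel size is an arbitrary illustrative choice, and biases are ignored.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a k x k standard convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a k x k depthwise conv plus a 1x1 pointwise conv (bias omitted)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1x1 convolution mixes the channels
    return depthwise + pointwise


# Illustrative layer: 3x3 kernel, 144 input and 144 output channels.
std = standard_conv_params(3, 144, 144)
sep = depthwise_separable_params(3, 144, 144)
print(std, sep, round(std / sep, 1))  # 186624 22032 8.5
```

For a 3x3 kernel the savings approach 9x as the channel count grows, which is why the separable layers, rather than the fully connected classifier, dominate the design of compact vision models.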

The NDP120 can store up to 896KB of neural-network parameters in the DLA’s SRAM. That memory is too small for the 3MB of INT8 parameters in a standard MobileNet v2 model, but the DLA can run this model in INT4 mode while using sparsity to reduce the number of weights. In binary mode, the SRAM can hold more than seven million parameters. INT4 and sparsity reduce inference accuracy, but developers can also shrink the model by reducing the alpha (width) parameter or using a smaller input-image size. Acoustic and keyword-recognition models are much smaller, though, and Syntiant says most easily fit on chip.
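The capacity figures above follow from simple arithmetic. The sketch below assumes 896KB means 896 x 1,024 bytes and ignores any storage overhead for sparsity indexes or per-layer scales, so real capacity would be somewhat lower. The MobileNet v2 weight count (about 3.4 million at width 1.0) is the commonly cited figure for that model, not a Syntiant number.

```python
SRAM_BYTES = 896 * 1024  # DLA parameter SRAM; binary kilobytes assumed


def max_params(bits_per_weight: int, sram_bytes: int = SRAM_BYTES) -> int:
    """Weights of a given width that fit in the SRAM, with no storage overhead."""
    return sram_bytes * 8 // bits_per_weight


for bits in (8, 4, 2, 1):
    print(f"{bits}-bit: {max_params(bits):,} weights")

# MobileNet v2 (width 1.0) has roughly 3.4 million weights (about 3MB at INT8).
MOBILENET_V2_WEIGHTS = 3_400_000
keep_fraction = max_params(4) / MOBILENET_V2_WEIGHTS
print(f"INT4 fit requires keeping at most {keep_fraction:.0%} of the weights")
```

In binary mode the SRAM holds 7,340,032 one-bit weights, matching the "more than seven million" figure. At INT4 it holds about 1.8 million weights, so a full-width MobileNet v2 fits only if sparsity removes roughly half of its weights, or if a smaller alpha shrinks the model first.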

A separate SRAM buffer stores up to 10 seconds of audio. If the always-on algorithm detects a possible wake event, the chip can enter a higher-power mode, enabling more sophisticated AI and DSP algorithms to process and analyze the stored audio. Only if this analysis validates the wake event does the chip signal the host processor.
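The two-stage flow described above can be sketched as a rolling audio buffer plus a cascade of detectors. In this sketch, the buffer always holds the most recent audio; a cheap always-on scorer examines each incoming frame, and only a high score triggers the costlier verification pass over the buffered audio. The class and function names, the 0.5 threshold, and the detector callables are hypothetical stand-ins; the NDP120's actual firmware interface is not public. At an assumed 16kHz, 16-bit mono format, a 10-second buffer would need 160,000 samples, or 320,000 bytes.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence


class AudioRingBuffer:
    """Keeps only the most recent `seconds` of audio; the oldest samples drop off."""

    def __init__(self, seconds: float, sample_rate: int) -> None:
        self.capacity = int(seconds * sample_rate)
        self._buf: Deque[int] = deque(maxlen=self.capacity)

    def push(self, samples: Sequence[int]) -> None:
        self._buf.extend(samples)

    def snapshot(self) -> List[int]:
        return list(self._buf)


def process_frame(frame: Sequence[int],
                  ring: AudioRingBuffer,
                  always_on_score: Callable[[Sequence[int]], float],
                  verify: Callable[[List[int]], bool],
                  threshold: float = 0.5) -> bool:
    """Hypothetical two-stage wake detection; returns True only to wake the host."""
    ring.push(frame)                       # always record, even when idle
    if always_on_score(frame) < threshold:
        return False                       # stay in the low-power mode
    # Possible wake event: a higher-power stage re-examines the stored audio.
    return verify(ring.snapshot())
```

Because the buffer is always recording, the verification stage sees the audio that preceded the trigger, including the start of the wake word, so the first stage can afford false alarms as long as the second stage rejects them before the host is disturbed.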

Amazon certified the NDP10x for devices integrating its Alexa voice assistant, which the NDP120 supports as well. By combining Android’s audio hardware abstraction layer (HAL) with its NDP software-development kit (SDK), Syntiant enables device manufacturers to easily integrate the Google Assistant. It’s also a member of Qualcomm’s extension program, which helps OEMs integrate NDP processors with Qualcomm Bluetooth audio platforms.

Two-thirds of Syntiant employees develop machine-learning algorithms, enabling it to create custom models based on customer keyword-recognition requirements. The SDK has a TensorFlow interface that directly compiles pretrained networks for use on the NDPxxx chips. Other components include an Alexa wake-word neural network, Linux-driver source code, Python-based interface, and precompiled libraries for Raspberry Pi 3 Model B+. The company also offers a training development kit (TDK) that lets customers target the NDP chips’ hardware for their training process.

Taking on Analog Newcomers

Syntiant rates the NDP120 at 6.4 billion INT8 operations per second (GOPS) at its 100MHz peak speed; the company withheld the power rating at this performance. In the sub-500-microwatt always-on mode, its peak throughput at 30MHz is 1.9 GOPS, a substantial 8x increase from the NDP10x at just 2.5x the power consumption. In some applications, power will be less. When running the “Hey Google” wake-word application, for example, the NDP120 chip requires just 280 microwatts.
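The two throughput figures are consistent with a DLA that completes a fixed number of operations per clock cycle. The check below derives that rate from the 100MHz rating and assumes throughput scales linearly with clock frequency; Syntiant has not stated the per-cycle figure directly.

```python
PEAK_GOPS = 6.4    # Syntiant's INT8 rating at the peak clock
PEAK_MHZ = 100.0   # peak clock frequency

# Operations completed per clock cycle implied by the peak rating (64).
OPS_PER_CYCLE = PEAK_GOPS * 1e9 / (PEAK_MHZ * 1e6)


def gops_at(clock_mhz: float) -> float:
    """Peak throughput at a lower clock, assuming the same work per cycle."""
    return OPS_PER_CYCLE * clock_mhz * 1e6 / 1e9


print(OPS_PER_CYCLE, round(gops_at(30.0), 2))  # 64.0 1.92
```

Running at 30MHz yields 1.92 GOPS, matching the 1.9 GOPS always-on rating, so the lower power mode appears to come from clock (and likely voltage) scaling rather than from a different datapath.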

As Table 1 shows, it matches Ambient’s GPX-10 for the power-efficiency lead (see MPR 11/2/20, “Ambient’s Analog Cuts AI Power”). The GPX-10’s analog DLA can deliver more than 80x the peak throughput of the NDP120, but at the expense of much greater power. More relevant is always-on mode, where the GPX-10 can run at just 80 microwatts. At that power, the analog chip delivers a mere 0.3 GOPS, which is sufficient to detect a sound but insufficient to match the NDP120’s keyword-recognition capabilities.
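A rough check using the always-on figures quoted in the text shows why the two chips land so close together. Efficiency in TOPS per watt is throughput divided by power; because the NDP120 figure uses its 500-microwatt ceiling, its actual always-on efficiency is at least this high. The vendors may compute their own efficiency ratings differently.

```python
def tops_per_watt(gops: float, microwatts: float) -> float:
    """Efficiency in TOPS/W from throughput in GOPS and power in microwatts."""
    return (gops / 1e3) / (microwatts / 1e6)


# Always-on figures from the text; the NDP120 power is an upper bound.
ndp120 = tops_per_watt(1.9, 500)  # about 3.8 TOPS/W
gpx10 = tops_per_watt(0.3, 80)    # about 3.75 TOPS/W
print(f"NDP120 >= {ndp120:.2f} TOPS/W, GPX-10 = {gpx10:.2f} TOPS/W")
```

Both work out to roughly 3.8 TOPS/W, but the NDP120 spends its budget on about six times as many operations per second, which is what enables keyword recognition rather than simple sound detection.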

Table 1. Comparison of ultra-low-power DLAs. The GPX-10 delivers the greater power efficiency of the two, but it lacks the NDP120’s audio-processing capabilities. (Source: vendors, except *The Linley Group estimate.)

The GPX-10 omits microphone inputs. Although its Cortex-M4 implements some DSP instructions, that CPU is unable to execute echo/noise suppression or other audio processing as efficiently as the NDP120’s HiFi 3 core. It scales to much greater AI performance, however. With a more powerful CPU and integrated flash memory, the Ambient chip is better suited to serve as the main processor. The company has yet to demonstrate the accuracy of its analog DLA, but we expect it to be similar to that of Syntiant’s INT4 mode.

A Promising Second Act

In a normal always-on wake-word application, the NDP120 consumes about the same power as its predecessor, but the greater peak DLA performance enables additional audio/sensor processing that makes the new chip much more versatile. This DLA performance brings sensor fusion and speech enhancement to the mix, along with the ability to tackle small object-detection tasks, such as recognizing the presence of a person in images loaded from a host processor, all while keeping always-on power under 500 microwatts. Whereas the NDP10x handles only two microphones, the NDP120 handles four, and the new HiFi 3 DSP enables echo/noise cancellation and far-field processing.

The NDP120 is well suited to battery-powered voice-activated devices. The DLA+DSP combination makes it a better choice than the NDP10x for health and fitness wearables and for noise cancellation in headphones. Syntiant supports Amazon’s and Google’s popular voice assistants, as well as Qualcomm’s Bluetooth chipsets, reducing customer time to market. The NDP120 is an enhanced second-generation design that we expect will greatly accelerate the startup’s rapid growth.

Price and Availability

Syntiant is now sampling the NDP120, and it offers a Raspberry Pi development kit with built-in microphones. Volume production is scheduled to start in 3Q21. The company plans to sell the device for $6 in 10,000-unit quantities. More information is at www.syntiant.com/ndp120.
