Microprocessor Report (MPR)

Syntiant Knows All the Best Words

NDP10x Speech-Recognition Processors Consume Just 200µW

March 18, 2019

By Mike Demler


Syntiant, a neural-network-processor startup, publicly touts analog in-memory computing as its raison d’être, but its first products implement an entirely digital architecture. Potential customers expressed interest in the company’s low-power digital prototype, so it decided to build a production version. The NDP100 and NDP101 use digital multiply-accumulate (MAC) units rather than the flash-memory-based multipliers that Syntiant plans to use in its analog design. The NDP10x targets always-on near-field keyword and speaker recognition in battery-powered devices, including earbuds, headsets, remote controls, and other voice-activated products. It can also recognize other acoustic events, such as broken glass and gunshots. The company manufactures the chips in a 40nm ULP process. It delivered first samples in July 2018 and expects to ship production volumes in 2Q19.

Syntiant has 35 employees. Its CEO is Kurt Busch, who previously held the same position at Lantronix following a senior-VP stint at Mindspeed. CTO Jeremy Holleman is also a professor at the University of North Carolina, where he specializes in low-power analog neuromorphic ICs. In 2017, the founding team started the company to commercialize Holleman’s work, and in 2Q18, it closed its first $5 million funding round, led by Intel Capital. In 4Q18, Microsoft led the $25 million Series B round. Other strategic investors include Amazon, Applied Materials, Bosch, and Motorola, joining venture-capital investors Embark Ventures, Seraph Group, and Sunstone Management.

Syntiant’s Python-based training-development kit (TDK) allows customers to train the NDP10x to recognize up to 64 words. Because the company developed a TensorFlow plug-in that quantizes weights during training, customers can run models on the neural-network processor without further conversion or recompilation. This plug-in is similar to the quantization feature in Google’s recent TensorFlow Lite 1.0 release. Syntiant also offers a service for English-language keyword training.
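Syntiant hasn’t published how its TensorFlow plug-in quantizes weights. As a minimal sketch, assuming a per-tensor symmetric scheme with signed 4-bit levels (−8 to 7) and round-to-nearest, the core of such “fake quantization” looks like the following. The names `quantize_int4` and `fake_quantize` are hypothetical and are not part of the TDK.

```python
# Sketch of 4-bit symmetric fake quantization (hypothetical, not the TDK API).
# Assumptions: one scale per tensor, signed int4 range -8..7, round-to-nearest.

def quantize_int4(weights):
    """Map float weights to signed 4-bit integer codes plus one float scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0  # 7 = largest positive int4 code
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale


def fake_quantize(weights):
    """Quantize, then dequantize, as a training graph would in its forward pass."""
    codes, scale = quantize_int4(weights)
    return [c * scale for c in codes]


codes, scale = quantize_int4([0.70, -0.35, 0.12, -0.71, 0.0])
# codes == [7, -3, 1, -7, 0]; each dequantized weight is within scale/2 of the original
```

In quantization-aware training, the forward pass typically uses the fake-quantized weights while gradients bypass the rounding step (a straight-through estimator), so the trained model already fits the 4-bit grid. Whether Syntiant’s plug-in works exactly this way is an assumption.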

During active operation, the NDP10x consumes less than 200 microwatts, but Syntiant expects that replacing the digital MACs with its analog technology will deliver additional power savings. The inference engine handles 4-bit weights, which require half the power of the more common INT8 data type to compute, move, and store. The NDP10x delivers 2.0 trillion operations per second per watt (TOPS/W). It divides speech into a maximum of 200 frames per second and performs 560,000 MAC operations per frame, for a total of 240 million operations per second (MOPS). Figure 1 shows the other digital blocks, including a Cortex-M0 CPU and various digital input/output functions. These blocks will remain the same in the flash-based design.
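The throughput figures can be checked with a few lines of arithmetic. This sketch assumes each MAC counts as two operations (a multiply and an add), the usual convention for TOPS ratings. Under that convention, 200 frames per second at 560,000 MACs per frame gives 224 MOPS, slightly below the quoted 240 MOPS, so Syntiant may count operations differently; the article doesn’t say. At 2.0 TOPS/W, 240 MOPS corresponds to about 120 microwatts, within the sub-200µW active-power figure.

```python
# Back-of-the-envelope check of the NDP10x throughput and power figures.
# Assumption: one MAC = two operations (multiply + add), the usual TOPS convention.

FRAMES_PER_SECOND = 200    # maximum frame rate quoted in the article
MACS_PER_FRAME = 560_000   # MAC operations per frame quoted in the article
OPS_PER_MAC = 2            # assumption, see above

macs_per_second = FRAMES_PER_SECOND * MACS_PER_FRAME  # 112 million MAC/s
ops_per_second = macs_per_second * OPS_PER_MAC         # 224 million op/s


def power_watts(ops_per_s: float, tops_per_watt: float) -> float:
    """Power needed to sustain ops_per_s at a given efficiency in TOPS/W."""
    return ops_per_s / (tops_per_watt * 1e12)


print(f"{ops_per_second / 1e6:.0f} MOPS")  # prints "224 MOPS"
print(f"{power_watts(240e6, 2.0) * 1e6:.0f} uW at 240 MOPS and 2.0 TOPS/W")  # 120 uW
```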

 

Figure 1. NDP101 speech-recognition processor. The chip integrates a neural network comprising fully connected layers that can classify 64 distinct sounds or words. The digital MAC array uses 4-bit weights. It replaces the flash-memory-based analog multipliers Syntiant has proposed for low-power applications, but the company plans to continue developing that technology.

A Sound Architecture

The NDP10x can process audio input from two pulse-density-modulation (PDM) microphone connections or from pulse-code-modulation (PCM) audio streamed through either the SPI port or the I2S interface. Two microphones are adequate for voice-activated headsets and other near-field applications, but the NDP10x lacks the resources to handle the multi-microphone arrays that smart speakers require for beamforming, echo/noise cancellation, and speaker detection.

The audio front end takes in 16-bit data, streaming the digital sound samples through a holding-tank buffer that can store three seconds of speech. When the inference engine recognizes a keyword, the buffer allows forward or backward scrolling through the audio stream to identify multiword phrases. This technique differs from that of recurrent neural networks (RNNs), such as long short-term memory (LSTM) networks, which directly identify time-dependent audio sequences. In the NDP10x, the feature extractor preprocesses the audio data before the neural network runs the word-classification algorithm. It’s analogous to the hard-wired edge detectors in computer-vision processors, which identify pixels that constitute regions of interest before performing object classification.
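The article doesn’t give the sample rate. As a sketch assuming 16kHz audio (common for keyword spotting, but an assumption here), three seconds of 16-bit samples amount to 48,000 samples, or 96,000 bytes. A ring buffer like the hypothetical `HoldingTank` below keeps only the newest samples, so when a keyword fires, the host can look back at the audio that preceded it; scrolling forward simply means continuing to read samples as they arrive.

```python
from collections import deque

SAMPLE_RATE_HZ = 16_000   # assumption: the article does not state the rate
SECONDS = 3               # holding-tank depth quoted in the article
BYTES_PER_SAMPLE = 2      # 16-bit samples
TANK_BYTES = SECONDS * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE  # 96,000 at 16kHz


class HoldingTank:
    """Hypothetical ring buffer holding the most recent few seconds of audio."""

    def __init__(self, seconds=SECONDS, rate=SAMPLE_RATE_HZ):
        self.rate = rate
        # A bounded deque silently discards the oldest samples when full.
        self.buf = deque(maxlen=int(seconds * rate))

    def push(self, samples):
        """Append newly captured samples (e.g., one audio frame)."""
        self.buf.extend(samples)

    def last(self, seconds):
        """Return the most recent `seconds` of audio, oldest sample first."""
        n = min(len(self.buf), int(seconds * self.rate))
        return list(self.buf)[len(self.buf) - n:]
```

For example, after a keyword detection, `tank.last(1.5)` would return the 1.5 seconds of speech leading up to it, which a host could pass along for multiword-phrase analysis.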

The NDP10x feature extractor is a log-mel filter bank (LMFB), a common technique in voice-recognition systems. Researchers developed the mel (short for melody) scale by studying human response to pitch changes. Below approximately 500Hz, the response is typically linear, but above 500Hz, it’s logarithmic. LMFBs model this response with a set of triangular digital filters that provide finer resolution at lower frequencies, as Figure 2 shows.
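A log-mel front end can be sketched with the Python standard library. This example assumes the widely used mel formula m = 2595·log10(1 + f/700), which maps 1kHz to roughly 1,000 mels; published mel definitions vary, and Syntiant hasn’t disclosed the filter count, FFT size, or formula the NDP10x uses, so all parameters below are illustrative. Filter edges are spaced evenly on the mel scale, which makes the triangles narrow at low frequencies and wide at high ones.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filter_bank(n_filters, n_fft, sample_rate, f_min=0.0, f_max=None):
    """Return n_filters triangular filters, each with n_fft // 2 + 1 bin weights."""
    f_max = sample_rate / 2 if f_max is None else f_max
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    # n_filters + 2 edge points spaced evenly in mels, then mapped to FFT bins.
    mels = [lo + (hi - lo) * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):     # rising edge of the triangle
            row[k] = (k - left) / (center - left)
        for k in range(center, right):    # peak (1.0) and falling edge
            row[k] = (right - k) / (right - center)
        bank.append(row)
    return bank


def log_mel_energies(power_spectrum, bank, eps=1e-10):
    """Apply each filter to one frame's power spectrum and take the log."""
    return [math.log(sum(w * p for w, p in zip(row, power_spectrum)) + eps)
            for row in bank]
```

With 10 filters, a 512-point FFT, and 16kHz audio, for instance, the first two filter peaks sit 8 bins apart while the last two sit more than 40 bins apart, reproducing the finer low-frequency resolution described above.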

 

Figure 2. Log-mel filters. The NDP10x speech-recognition processors employ a log-mel filter bank (LMFB) to extract identifiable features from spoken words. Each word generates a unique spectral pattern, which the processor uses to find the closest match among 64 classes. (Source: HaythamFayek.com)

When displayed on a spectrum analyzer, spoken words produce a series of tones at different frequencies. The LMFB outputs comprise identifiable frequency signatures, which serve as the training data for a neural-network engine such as the one in the NDP10x.

The company’s TDK calculates the synaptic weights that enable classification of each frequency signature associated with the 64-word vocabulary. To achieve its low-power target, the NDP10x employs just 4-bit weights and 8-bit activations. It handles a total of 560,000 weights, storing them in on-chip memories close to each neuron’s multiplier. Each network layer is fully connected and comprises a set of 4-bit × 8-bit MACs, but the company withheld further architectural details. The inference engine can classify 100 words per second.
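Syntiant withheld the layer structure, but the arithmetic of one fully connected layer with 4-bit weights and 8-bit activations is simple to sketch. The functions below are hypothetical: they accumulate int8 × int4 products (Python integers stand in for wide hardware accumulators, whose actual width is an assumption), apply an illustrative shift-and-ReLU requantization back to 8 bits, and pick the highest-scoring class. Because every weight in a fully connected layer is used exactly once per input frame, the weight count equals the MAC count per frame, which is why 560,000 weights translate into 560,000 MACs per frame.

```python
def fc_layer_int(x_int8, w_int4, bias=None):
    """One fully connected layer: int8 activations times int4 weights.

    w_int4 holds one row of weights per output neuron; returns raw accumulators.
    """
    if any(not -128 <= x <= 127 for x in x_int8):
        raise ValueError("activations must fit in signed 8 bits")
    outputs = []
    for j, row in enumerate(w_int4):
        if any(not -8 <= w <= 7 for w in row):
            raise ValueError("weights must fit in signed 4 bits")
        acc = bias[j] if bias is not None else 0
        for x, w in zip(x_int8, row):
            acc += x * w  # one 4-bit x 8-bit multiply-accumulate
        outputs.append(acc)
    return outputs


def requantize_relu(acc, shift):
    """Illustrative requantization: arithmetic shift, ReLU, clamp to int8."""
    return [min(127, max(0, a >> shift)) for a in acc]


def classify(scores):
    """Index of the highest score (e.g., one of 64 keyword classes)."""
    return max(range(len(scores)), key=scores.__getitem__)
```

In a multilayer network, `requantize_relu` output from one layer would feed the next `fc_layer_int` call, and `classify` would run on the final layer’s scores.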

The embedded Cortex-M0 CPU runs the Syntiant firmware as well as user programs residing in the 112KB SRAM. The design supports clock frequencies up to 20MHz, but for such a low-speed, low-power application, we expect the NDP10x to run at less than 10MHz. At that frequency, the CPU alone would consume roughly 40 microwatts.

Both models include a SPI slave input for connecting to a host processor, but the NDP101 adds GPIOs and a SPI-master interface for sensors. The additional inputs allow users to load data directly into the neural-network engine, replacing or combining it with data from the speech-feature detector. The NDP100 ships in a 12-ball WLBGA package, and the NDP101 ships in a 32-pin QFN package.

Analog Is More Than a Bit Harder Than Digital

In digital design, automated synthesis tools account for process, voltage, and temperature (PVT) variability while optimizing power and timing. Analog circuits, however, still require time-consuming manual design, often necessitating multiple respins that delay product delivery. Syntiant isn’t the only company attempting to use nonvolatile-memory cells as analog multipliers, but none of its competitors has reached production. Although analog circuits are inherently more susceptible to noise and variability than digital circuits, performing multiplication directly in the weight memory offers potentially huge power savings. In-memory computing eliminates the power spent accessing weights in external DRAM, but realizing that potential remains elusive.

In its patent applications and presentations, the company described a flash-based technique that’s conceptually similar to the Mythic intelligent processing unit (see MPR 8/27/18, “Mythic Multiplies in a Flash”). Whereas Mythic uses the bit cell’s voltage-variable conductance to represent weights with 8-bit resolution, the NDP10x uses 4-bit quantization. Lower precision makes the inference engine less susceptible to PVT effects, but it also limits applications to small training sets, such as the 64 classes in the NDP10x. Although the Mythic IPU handles 10x more weights with 16x greater precision, it consumes 1–5W. Nevertheless, that power is still low for the smart cameras and drones it targets.

The NDP10x targets low-power voice-activated devices such as earbuds, which have tiny batteries and run at much lower clock frequencies than computer-vision systems. Syntiant can therefore employ simple low-speed circuits to generate and measure the analog signals into and out of the flash-memory array, whereas Mythic uses 8-bit DACs and ADCs. According to a patent application, Syntiant plans to use pulse-width modulators (PWMs), which are much less complex than DACs, to generate neural-network inputs, as well as analog integrators with comparators to accumulate the bit-line currents and convert them to digital pulses.
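The patent-application scheme can be illustrated with a behavioral model of a single bit line. In this sketch (the function name and all parameter values are hypothetical), each input code sets a PWM pulse width, each flash cell conducts a current proportional to its stored conductance while its input pulse is high, and an integrator accumulates the resulting charge. A comparator emits a digital pulse and removes a fixed quantum of charge each time the integrator crosses its threshold, so the pulse count approximates the dot product of inputs and conductances. Real circuits must also handle signed weights (commonly with differential cell pairs), noise, and PVT drift, none of which this model includes.

```python
def pwm_bitline_mac(inputs, conductances, ticks_per_code=4,
                    v_read=0.1, dt=1e-8, q_threshold=1e-15):
    """Behavioral model of one analog bit line (hypothetical parameters).

    inputs: nonnegative integer input codes (converted to PWM pulse widths)
    conductances: cell conductances in siemens (nonnegative weights)
    Returns the comparator pulse count, roughly
    sum(g * x) * v_read * ticks_per_code * dt / q_threshold.
    """
    if q_threshold <= 0:
        raise ValueError("q_threshold must be positive")
    # PWM: input code x holds its word line high for x * ticks_per_code ticks.
    widths = [x * ticks_per_code for x in inputs]
    charge = 0.0  # integrator state, in coulombs
    pulses = 0    # digital output: number of comparator firings
    for tick in range(max(widths, default=0)):
        # Bit-line current: only cells whose input pulse is still high conduct.
        current = sum(g * v_read for g, w in zip(conductances, widths) if tick < w)
        charge += current * dt
        # Comparator: fire once and subtract one charge quantum per crossing.
        while charge >= q_threshold:
            charge -= q_threshold
            pulses += 1
    return pulses
```

For example, inputs `[3, 1, 4]` with conductances of 1µS, 2µS, and 0.5µS integrate 28 charge quanta, so the model returns about 28 pulses; doubling every conductance roughly doubles the count, as a multiply-accumulate should.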

A New Low-Power Champion

Numerous digital speech-recognition processors are on the market, but Syntiant has set a new record for low power consumption. Kopin’s Whisper chip integrates a Cortex-M0 CPU with a proprietary DSP to extract speech in noisy environments (see MPR 3/6/17, “Kopin Audio Chip Hears Whispers”). Like the NDP10x, it’s best suited to hands-free headsets and other near-field applications. But although its DSP can process audio streams from four microphones using beamforming, echo cancellation, FFTs, and adaptive filtering, the 10mW Whisper lacks an inference engine. It must pass the spoken words to a host processor or the cloud to determine their meaning.

Eta Compute previously held the record for low-power speech processing with its Tensai chip, which combines a Cortex-M3 CPU with an NXP CoolFlux DSP running small neural networks (see MPR 10/28/18, “Eta Compute MCU Puts AI in IoT”). Like the NDP10x, Tensai targets keyword detection and sensors in battery-powered IoT devices, and it has a 512KB embedded flash memory to store approximately 450,000 neural-network weights on chip. But at its lowest operating speed, it still consumes 2mW.

Although many researchers believe in-memory computing will boost neural-network power efficiency by orders of magnitude, bringing such research to production has proven difficult. When it comes to analog, silicon has a way of proving simulations to be overly optimistic.

Syntiant is wise to offer a digital product first so it can test and refine its architecture in real-world systems. The NDP10x solves only 64-word problems, but it does so while consuming 90% less power than its nearest competitor. The architecture isn’t particularly advanced compared with other voice-recognition engines, but it’s well suited to its target market: small battery-operated devices. Because customers are already interested in its digital designs, the company may find that additional power savings from analog techniques aren’t worth the effort.

Price and Availability

Syntiant withheld pricing for the NDP100 and NDP101, but we expect them to sell for less than $10 in production quantities. The company began sampling the digital prototype in July 2018, and it plans to deliver production parts in 2Q19. Additional information on the NDP100 and NDP101 is at www.syntiant.com/ndp100 and www.syntiant.com/ndp101, respectively. 
