Microprocessor Report (MPR)

Ceva SensPro2 Doubles AI Throughput

Second-Generation Sensor Hubs Target Automotive, IoT, and Wearables

February 8, 2021

By Mike Demler


SensPro2 adds more base configurations and new design options to Ceva’s sensor-fusion-DSP lineup. Like the first-generation SensPro from 2020, the licensable intellectual property (IP) implements a configurable sensor hub, fusing elements from the company’s BX2 DSPs, XM6 computer-vision engine, and NeuPro deep-learning accelerators (DLAs). Compared with first-generation designs manufactured in the same process technology, SensPro2 delivers twice the AI inference throughput and twice the memory bandwidth when running fully connected neural-network layers. It also cuts power by 20% at the same performance and delivers up to 6x the performance on DSP algorithms. Production RTL for all models is now available for general licensing.

The new lineup includes seven base models, as Table 1 shows, but customers can configure each one by adding application-specific ISA extensions as well as custom instructions. The first-generation SensPro included just the three largest models: the SP250, SP500, and SP1000. The SP50 and SP100 are new, integrating 64 and 128 INT8 multiply-accumulate (MAC) units, respectively. Ceva equipped them with smaller MAC arrays to reduce die area and power consumption in audio-AI applications such as acoustic sensors, natural-language processing (NLP), and smart speakers.

Table 1. Ceva SensPro2 models. The second-generation lineup adds the smaller SP50 and SP100 models, as well as the SPF2 and SPF4 to target high-precision signal processing of floating-point data. (Source: Ceva)

Each SensPro2 MAC array executes one-fourth as many INT16 MACs per cycle as INT8 MACs, or one-sixteenth as many INT32 MACs. The SP500 and SP1000 also support binary-neural-network (BNN) layers.
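The precision scaling above reduces to simple division. The Python sketch below assumes only the ratios stated here (INT16 at one-fourth and INT32 at one-sixteenth of the INT8 rate); the helper name is hypothetical and does not correspond to any Ceva tool or API.

```python
# Hypothetical helper: per-cycle MAC throughput of a SensPro2 integer MAC
# array at each precision, using the ratios stated in the article.
PRECISION_DIVISOR = {"int8": 1, "int16": 4, "int32": 16}

def macs_per_cycle(int8_macs: int, precision: str) -> int:
    """Return MACs per cycle at `precision`, given the array's INT8 MAC count."""
    divisor = PRECISION_DIVISOR[precision]
    if int8_macs % divisor:
        raise ValueError("INT8 MAC count must be divisible by the precision divisor")
    return int8_macs // divisor

if __name__ == "__main__":
    # SP50 (64 INT8 MACs) and SP100 (128 INT8 MACs), as described above.
    for name, macs in (("SP50", 64), ("SP100", 128)):
        print(name, {p: macs_per_cycle(macs, p) for p in PRECISION_DIVISOR})
```

For a 64-MAC array this yields 16 INT16 and 4 INT32 MACs per cycle, and for a 512-MAC array 128 and 32, matching the per-VCU ranges Ceva quotes for the integer configurations.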

The SPF2 and SPF4 are new floating-point DSPs that omit the integer units, integrating 32 and 64 single-precision MAC units, respectively. These MACs can instead execute 64 and 128 half-precision operations per cycle, respectively. The company designed these models for the high-precision calculations in automotive-powertrain control. The second-generation architecture supports ASIL B fault diagnostics and ASIL D functional safety.

Previously, only the SP500 shipped with floating-point units (FPUs), but SensPro2 offers them as an optional add-on for all of the integer-MAC models. Customers can include 16–64 single-precision (FP32) floating-point MACs, which can instead execute twice as many half-precision (FP16) MACs per cycle.

A Few Tweaks and Lots of Options

At a high level, the SensPro2 architecture looks almost the same as the first generation (see MPR 4/20/20, “Ceva SensPro Fuses AI and Vector DSP”), but Ceva says design optimizations in some function blocks have increased performance as well as power efficiency. The BX-based scalar DSP core is unchanged (see MPR 2/4/19, “Ceva’s BX Hybrid Boosts DSP Engine”): it handles control tasks, delivering 4.3 CoreMarks per megahertz. Customers can add their own scalar instructions using Ceva-Xtend. As before, the company expects 7nm SensPro2 designs to run at up to 1.6GHz, enabling the largest SP1000 to deliver 3.3 trillion INT8 operations per second (TOPS).
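Peak TOPS ratings conventionally count each MAC as two operations (a multiply and an add). Under that assumption, the 3.3 TOPS figure at 1.6GHz implies roughly 1,024 INT8 MACs in the SP1000, as the sketch below checks. The 1,024-MAC count is inferred from this arithmetic rather than quoted from Ceva, and the function names are illustrative.

```python
OPS_PER_MAC = 2  # one multiply plus one accumulate

def peak_tops(int8_macs: int, clock_ghz: float) -> float:
    """Peak INT8 throughput in trillions of operations per second."""
    return int8_macs * OPS_PER_MAC * clock_ghz * 1e9 / 1e12

def macs_for_tops(tops: float, clock_ghz: float) -> float:
    """Invert peak_tops(): INT8 MACs needed to reach `tops` at `clock_ghz`."""
    return tops * 1e12 / (OPS_PER_MAC * clock_ghz * 1e9)

if __name__ == "__main__":
    # 1,024 MACs * 2 ops * 1.6e9 Hz = 3.2768e12 ops/s
    print(f"{peak_tops(1024, 1.6):.2f} TOPS")
    print(f"{macs_for_tops(3.3, 1.6):.0f} MACs needed for 3.3 TOPS")
```

The small gap between 3.28 and 3.3 TOPS reflects rounding in the quoted figure.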

Beyond the seven base configurations, designers can choose among several options for the vector control units (VCUs), as Figure 1 shows. In the base integer configurations, each VCU can integrate 64–512 INT8 MAC units, which can alternatively compute 16–128 INT16 MACs or 4–32 INT32 MACs per cycle. Designers can optionally add 16 or 32 FP32 MAC units to each VCU; these units can instead compute twice as many FP16 MACs. An optional nonlinear ALU can compute Newton-Raphson and Taylor-series approximations for EV-powertrain control, radar, and other types of analog-waveform analysis.
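The article doesn’t describe how the nonlinear ALU implements these approximations. As a generic illustration only, not Ceva’s design, the sketch below refines a reciprocal square root with Newton-Raphson iterations and evaluates the exponential with a truncated Taylor series, two functions of the kind such units approximate. The function names, seed method, and iteration and term counts are assumptions.

```python
import math

def rsqrt_newton(x: float, iterations: int = 6) -> float:
    """Approximate 1/sqrt(x) with Newton-Raphson refinement.

    Each step applies y <- y * (1.5 - 0.5 * x * y * y), which roughly
    squares the relative error (quadratic convergence).
    """
    if x <= 0.0:
        raise ValueError("x must be positive")
    # Seed from the binary exponent: x = m * 2**e with 0.5 <= m < 1,
    # so 2**(-(e // 2)) lies within a factor of sqrt(2) of 1/sqrt(x).
    _, e = math.frexp(x)
    y = math.ldexp(1.0, -(e // 2))
    for _ in range(iterations):
        y = y * (1.5 - 0.5 * x * y * y)
    return y

def exp_taylor(x: float, terms: int = 13) -> float:
    """Approximate exp(x) with a truncated Taylor series about 0 (best for small |x|)."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term          # add x**k / k!
        term *= x / (k + 1)    # next term: x**(k+1) / (k+1)!
    return total
```

A fixed iteration or term count gives constant latency, which suits a hardware pipeline; the specific counts here are illustrative choices tuned for double precision.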

Figure 1. SensPro2 block diagram. The new architecture retains most features from the original, but it allows designers to include application-specific ISA extensions.

Rather than package all of SensPro2’s capabilities in a bundle, the company allows designers to select the ISA features they require. The new audio ISA extension supports the recurrent neural networks (RNNs) that commonly serve in speech recognition. With that extension, the smallest SP50 and SP100 models run the DeepSpeech2 neural network up to 10x faster than the BX2 DSP on its own. SensPro2 is well suited to such speech recognition because the DSP can produce a spectrogram of the audio signals, which the DLA then analyzes to calculate the probabilities of various characters and words.
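To illustrate the DSP half of that pipeline, the sketch below computes a magnitude spectrogram by splitting the signal into overlapping frames, applying a Hann window, and taking a DFT of each frame. It is plain Python for self-containment; the frame length, hop size, and naive O(N²) DFT are assumptions chosen for clarity, and an optimized DSP kernel would normally use an FFT.

```python
import cmath
import math

def hann(n: int) -> list[float]:
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def spectrogram(signal: list[float], frame_len: int = 256, hop: int = 128) -> list[list[float]]:
    """Magnitude spectrogram: one row per frame, frame_len // 2 + 1 frequency bins per row."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        # Naive DFT of the windowed frame; keep only non-negative frequencies.
        row = []
        for k in range(frame_len // 2 + 1):
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, x in enumerate(chunk))
            row.append(abs(acc))
        frames.append(row)
    return frames

if __name__ == "__main__":
    rate = 8000
    tone = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(rate // 10)]
    spec = spectrogram(tone)
    peak_bin = max(range(len(spec[0])), key=spec[0].__getitem__)
    print(len(spec), "frames; peak at", peak_bin * rate / 256, "Hz")
```

Each row is one time slice of frequency magnitudes; in a speech model such as DeepSpeech2, rows like these form the input features the neural network consumes.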

The IP also supports computer-vision (CV), radar, and simultaneous-localization-and-mapping (SLAM) ISA extensions, along with custom vector instructions through Vector-Xtend. The optional “light” VCU handles only arithmetic operations that don’t require MACs. The optional Ceva-Connect interface allows customers to integrate their own hardware accelerators.

Ceva attributes SensPro2’s improved neural-network performance to design optimizations that increase MAC-unit utilization, as well as a new “wide load” feature that doubles memory bandwidth, which is critical for fully connected layers. To evaluate the DLA improvements, the company compared SensPro2 with the first-generation SensPro on a variety of common network models, as Figure 2 shows. On several versions of Google’s MobileNet, the new IP delivers up to twice the performance of its predecessor. It also leads in EfficientNet, DeepLab, and ResNet-50 throughput. EfficientNet is a successor to MobileNet that the search giant developed in 2019; it scales more efficiently and offers greater accuracy than the earlier model. DeepLab is an image-segmentation model, and ResNet-50 is a popular image-classification network.
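A simple roofline-style estimate shows why bandwidth dominates fully connected layers. At batch size 1, each INT8 weight feeds exactly one MAC, so the layer must load one byte per MAC regardless of how many MACs are available. The sketch below uses hypothetical figures (1,024 MACs per cycle and 64 versus 128 bytes of weights per cycle) that are not Ceva specifications.

```python
import math

def fc_layer_cycles(n_in: int, n_out: int, macs_per_cycle: int,
                    weight_bytes_per_cycle: int) -> int:
    """Lower-bound cycle count for a batch-1 INT8 fully connected layer.

    Each of the n_in * n_out one-byte weights is used by exactly one MAC,
    so the layer can finish no faster than the larger of the compute
    bound and the weight-load bound.
    """
    weights = n_in * n_out
    compute_cycles = math.ceil(weights / macs_per_cycle)
    load_cycles = math.ceil(weights / weight_bytes_per_cycle)
    return max(compute_cycles, load_cycles)

if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    for bw in (64, 128):
        cycles = fc_layer_cycles(1024, 1000, macs_per_cycle=1024,
                                 weight_bytes_per_cycle=bw)
        print(f"{bw} B/cycle -> {cycles} cycles")
```

Because the load bound exceeds the compute bound in this model, doubling the bytes loaded per cycle halves the layer’s run time, whereas adding MACs would not help.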

Figure 2. SensPro2 performance improvements. The second-generation design increases MAC utilization as well as memory bandwidth when running fully connected layers, yielding considerably greater DLA throughput on a variety of neural-network models. (Data source: Ceva).

Fusion Boosts Low-Power AI

Although SensPro2 lacks any major architecture changes, it improves both performance and power efficiency compared with its predecessor. Because most edge-AI devices must handle sensor signals, multiple IP vendors have recently introduced licensable cores that combine a DLA with a SIMD/vector CPU or a DSP (see MPR 1/4/21, “IP Adapts to Automotive Roadmap”). But none offers the configurability, features, and performance of Ceva’s product.

For example, designers can employ an Arm Cortex-M55 and Ethos-U55 or Ethos-U65 (see MPR 3/9/20, “Cortex-M55 Supports Tiny-AI Ethos”). The U65 DLA integrates 512 INT8 MAC units, but it lacks INT16 and floating-point capabilities. SensPro2’s BX-based scalar cores deliver 4.3 CoreMarks per megahertz, similar to the M55. The slimmed-down SP50 and SP100 with audio-ISA extensions offer features like those of Cadence’s Tensilica HiFi 5 (see MPR 11/5/18, “Cadence HiFi 5 Is a Smart Listener”), but SensPro2 has a wider SIMD/VLIW datapath with 16 times the memory bandwidth, as well as larger MAC arrays.

Ceva limited the first SensPro to just three configurations, with the promise of greater customization options in a future release. SensPro2 fulfills that promise, allowing customers to employ smaller models that are more suitable for battery-powered devices. They can fine-tune their designs by selecting just the floating-point or integer units they need, and they can even configure the ISA with application-specific extensions. Automotive-processor developers will appreciate the new ASIL-compliant diagnostic and functional-safety features.

The company supports the SensPro2 IP with a comprehensive software catalog that comprises audio, CV, deep-neural-network, radar, sensor-fusion, SLAM, and speech libraries. The SDK includes an LLVM compiler, and it works with the popular TensorFlow Lite Micro framework. The second-generation design is an evolutionary upgrade, but it’s the most complete package for numerous automotive, consumer, and industrial applications.

Price and Availability

Production SensPro2 RTL is available for licensing. Ceva doesn’t disclose pricing. For more information, access www.ceva-dsp.com/product/ceva-senspro.
