Mobile Chip Report (MCR)

Ceva X2 Controls Multiple PHYs

DSP Core Handles LTE-Advanced and 802.15.4 Protocols

August 1, 2016

By Mike Demler

Ceva’s new X2 DSP core performs physical-layer (PHY) control tasks in complex wireless modems. Introduced at the recent Linley Mobile & Wearables Conference, the design targets LTE-Advanced carrier aggregation, which requires digital basebands to simultaneously control PHYs for multiple communications channels and radio access technologies (RATs), such as a combination of small-cell and macrocell connections. The 3GPP’s LTE Release 13 and later require modem designers to support the new protocols, but they must also retain 3G and even 2G functions for fallback in legacy networks or for voice when voice-over-LTE (VoLTE) is unavailable.

The emergence of 5G systems will increase PHY-controller complexity by combining licensed and unlicensed spectrum in new higher-frequency bands. The X2’s software configurability enables it to integrate multi-RAT support in a single modem, saving area and power compared with using separate digital baseband circuits for each protocol. The scalar DSP controller can also handle PHY and media-access-controller (MAC) operations for 802.15.4 Thread and ZigBee, as well as 802.15.4g smart-utility networks and power-line communications (PLC).

The X2 is the second member of the Ceva-X family, following the introduction of the X4 earlier this year (see MPR 3/7/16, “Ceva’s New Gen-X DSPs Target 5G”). Whereas the X4 performs both PHY control and data-path processing, the X2 focuses on the former and eliminates the hardware its predecessor uses for the latter—for example, per-channel measurement and decoding. Designers can instead allocate those data-path functions to one of the company’s XC-series communications cores, such as the XC4500 and XC5 (see MPR 11/4/13, “Ceva Targets Wireless Infrastructure,” and MPR 12/14/15, “Ceva Optimizes DSPs for IoT”). Many of Ceva’s LTE customers use those vector DSPs to handle the more demanding signal-processing tasks in the receive-side subsystem, as Figure 1 shows.

Figure 1. Simplified block diagram of Ceva X2-based modem. This multimode LTE Category 12 reference architecture uses the X2 DSP to control up to four PHYs. The DigRF interface connects to the RF subsystem.

Ceva estimates the X2 achieves up to a 1.5GHz clock frequency when manufactured in a 16nm FinFET process, delivering 4.3 EEMBC CoreMarks per megahertz using its new compiler release. The company supports the core with an RTOS, and it offers communications libraries for LTE, 3G (WCDMA and TD-SCDMA), and 2G (GSM/EDGE) cellular protocols. The X2 is available for licensing now.
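To put the per-megahertz rating in absolute terms, the short calculation below scales it by the quoted maximum clock. It is only back-of-the-envelope arithmetic on Ceva's two estimates; the function and constant names are illustrative, and the result assumes the CoreMark/MHz rating holds at the top frequency.

```python
# Figures quoted in the article (Ceva estimates for a 16nm FinFET process).
COREMARK_PER_MHZ = 4.3
MAX_CLOCK_MHZ = 1500  # 1.5GHz expressed in megahertz


def coremark_score(per_mhz: float, clock_mhz: float) -> float:
    """Scale a per-megahertz CoreMark rating to an absolute score."""
    return per_mhz * clock_mhz


# Assumption: the per-MHz rating is unchanged at the maximum clock.
peak_score = coremark_score(COREMARK_PER_MHZ, MAX_CLOCK_MHZ)
print(f"Estimated peak CoreMark: {peak_score:.0f}")
```

Under those assumptions, the core would score roughly 6,450 CoreMarks at 1.5GHz.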

Trimming the Controls

According to Ceva’s estimates, trimming the X2’s DSP functions to focus solely on PHY control enables designers to reduce power consumption by 10–25% relative to performing those functions with the X4, and it shrinks the die area by 30–65%. One way the X2 makes this improvement is by cutting the maximum number of scalar processor units (SPUs) from four in the X4 to just two, as Figure 2 shows. The SPU design is the same as in the X4. Each unit can execute two 16x16-bit multiply-accumulate operations per cycle or two other 16-bit operations, such as add/subtract and multiply. Alternatively, each SPU can perform a single 32x32-bit multiply-accumulate operation.

Figure 2. Block diagram of Ceva X2 PHY-controller DSP. The 10-stage pipeline runs at up to 1.5GHz in a 16nm process, and each scalar unit executes two 16x16-bit multiply-accumulate operations per cycle. Blocks with dashed outlines are optional.
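The per-SPU rates above translate into peak multiply-accumulate throughput as sketched below. The SPU counts, per-cycle rates, and 1.5GHz clock come from the article; the function names are illustrative, and the numbers are theoretical peaks that assume every SPU issues MACs on every cycle.

```python
# Per-SPU rates from the article: two 16x16-bit MACs or one 32x32-bit MAC per cycle.
MACS_PER_SPU = {"16x16": 2, "32x32": 1}


def peak_macs_per_cycle(spus: int, width: str = "16x16") -> int:
    """Peak MACs per cycle when every SPU issues MACs of the given width."""
    return spus * MACS_PER_SPU[width]


def peak_gmacs(spus: int, clock_ghz: float, width: str = "16x16") -> float:
    """Peak throughput in billions of MACs per second."""
    return peak_macs_per_cycle(spus, width) * clock_ghz


x2_16 = peak_macs_per_cycle(2)            # X2: two SPUs
x4_16 = peak_macs_per_cycle(4)            # X4: four SPUs
x2_32 = peak_macs_per_cycle(2, "32x32")
x2_rate = peak_gmacs(2, 1.5)              # X2 at its quoted 1.5GHz maximum

print(f"X2: {x2_16} 16x16 MACs/cycle, {x2_32} 32x32 MACs/cycle")
print(f"X4: {x4_16} 16x16 MACs/cycle")
print(f"X2 peak at 1.5GHz: {x2_rate:.1f} GMAC/s")
```

With two SPUs, the X2 peaks at four 16x16-bit MACs per cycle, or about 6 GMAC/s at 1.5GHz; the four-SPU X4 peaks at eight per cycle.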

The X2 instruction pipeline uses the same 10-stage design as in the X4, but it narrows the SIMD width from 128 to 64 bits. Each SPU has its own 32-bit SIMD unit controlled by a VLIW instruction way. The 128-bit VLIW packet can contain five instructions (down from seven in the X4): one to each load/store unit, one to each SPU, and one to the program-control unit (PCU). The ISA includes specialized modem-DSP instructions, such as those for FFTs. Supported integer data types are 8, 16, 32, and 64 bits. A single-precision IEEE 754–compliant scalar floating-point unit is optional, and designers can choose to include or exclude it in each SPU independently.
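The way count and SIMD width follow directly from the number of SPUs, as the sketch below illustrates. It assumes two load/store units and one PCU in both cores, which is inferred from the five- and seven-way totals rather than stated outright; the function and key names are hypothetical.

```python
SIMD_BITS_PER_SPU = 32  # each SPU has its own 32-bit SIMD unit (from the article)


def vliw_ways(spus: int, load_store_units: int = 2, pcus: int = 1) -> dict:
    """Instruction ways per VLIW packet: one per load/store unit, SPU, and PCU.

    Assumption: both cores have two load/store units and one PCU.
    """
    return {"load_store": load_store_units, "spu": spus, "pcu": pcus}


def total_ways(spus: int) -> int:
    return sum(vliw_ways(spus).values())


def simd_width_bits(spus: int) -> int:
    return spus * SIMD_BITS_PER_SPU


for name, spus in (("X2", 2), ("X4", 4)):
    print(f"{name}: {total_ways(spus)} ways, {simd_width_bits(spus)}-bit SIMD")
```

Under that assumption, two SPUs yield five ways and a 64-bit aggregate SIMD width, and four SPUs yield seven ways and 128 bits, matching the figures in the text.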

The company offers a number of other options for the X-series DSPs. The branch target buffer (BTB) supports dynamic branch prediction and is configurable as a two- or four-way table with 64, 128, or 256 entries. Both X-series cores allow designers to integrate dedicated hardware-accelerator ports, which require direct internal-memory access to minimize latency. The optional Ceva-Connect module takes over that function, offloading the DSP. It integrates queue and buffer managers that control data flow to and from the accelerators as well as external DRAM. The module handles each queue independently to ensure quality of service (QoS). By integrating these control functions into the DSP, designers can eliminate a separate real-time controller core, such as an ARM Cortex-R.
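The BTB options multiply out to six configurations, enumerated in the sketch below. It assumes the entry counts are totals for the whole table (so sets = entries / ways), which the article does not spell out; the names are illustrative and do not reflect Ceva's configuration tools.

```python
from itertools import product

# Options quoted in the article.
BTB_WAYS = (2, 4)
BTB_ENTRIES = (64, 128, 256)


def btb_configs() -> list:
    """List every ways/entries combination.

    Assumption: 'entries' is the total table size, so sets = entries // ways.
    """
    return [
        {"ways": ways, "entries": entries, "sets": entries // ways}
        for ways, entries in product(BTB_WAYS, BTB_ENTRIES)
    ]


for cfg in btb_configs():
    print(f"{cfg['ways']}-way, {cfg['entries']} entries -> {cfg['sets']} sets")
```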

The X2 provides separate 128-bit memory interfaces for the load and store units. Like the X4, it optionally includes L1 instruction and data caches, and an optional Amba Ace port enables the caches to be coherent with other Ceva DSPs or with the L2 in an ARM CPU. The cache sizes remain the same. The data cache can store up to 64KB, and it’s configurable for two- or four-way set associativity. The cache supports writeback and write-through policies; it’s also nonblocking for reads and writes on a cache miss. The instruction cache can store up to 128KB and is also configurable as two- or four-way set associative.
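To show how the associativity choice changes cache geometry, the sketch below computes the number of sets at the maximum cache sizes. The article does not give the line size, so the 64-byte value is purely an assumption for illustration; only the capacities and way counts come from the text.

```python
KB = 1024
ASSUMED_LINE_BYTES = 64  # assumption: the article does not state the line size


def cache_sets(size_bytes: int, ways: int, line_bytes: int = ASSUMED_LINE_BYTES) -> int:
    """Number of sets in a set-associative cache: size / (line size * ways)."""
    lines, remainder = divmod(size_bytes, line_bytes)
    if remainder or lines % ways:
        raise ValueError("cache size must be a whole multiple of line size * ways")
    return lines // ways


# Maximum sizes quoted in the article.
for name, size in (("data", 64 * KB), ("instruction", 128 * KB)):
    for ways in (2, 4):
        print(f"{name} cache, {size // KB}KB, {ways}-way: {cache_sets(size, ways)} sets")
```

With a different line size, the set counts scale inversely, but the two-to-one ratio between the two-way and four-way options stays the same.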

Designers can also include tightly coupled data and instruction memories (D-TCMs and I-TCMs). The instruction TCM can be up to 256KB, the same as in the X4, but the data TCM has half the capacity at 512KB. A tiny L0 cache stores as many as four instruction packets. The X2 also halves the number of registers compared with the X4, providing up to thirty-two 32-bit storage locations.

Less Is More

As Table 1 shows, the Ceva-X2 and Ceva-X4 DSPs have many features in common. As the name implies, much of the X2’s area and power savings comes from using half-portions of the X4’s main architectural blocks, such as a narrower SIMD with half the SPUs. The result is half the MACs per cycle for the X2 relative to the X4. Those changes allowed Ceva to cut the data memories and registers in half and equip the core with half the AXI ports of its larger sibling.

Table 1. Ceva DSP comparison. The new X2 employs half the SPUs compared to the X4, and further reduces area and power by eliminating hardware used for data-path processing. *Area calculation assumes a 16nm process and includes AXI as well as the I/D-cache controllers and BTB, but no cache RAM, TCM, or other optional features. (Source: Ceva)

By keeping the X2’s core architecture unchanged, however, Ceva gives its customers greater design flexibility and scalability. Designers can use the company’s cycle-accurate-simulation models and configuration tools to test their modem algorithms before committing to a chip design. By comparing different DSP cores and configurations, they can make tradeoffs among area, performance, and power for each set of requirements.

The challenge, however, is to determine how many degrees of freedom are too many. Most of the X-series features are optional, including the BTBs, caches, FPU, TCMs, and Ceva-Connect module. Designers can perform functions in software or include one of the many hardware accelerators that Ceva has developed. They can also mix and match cores, combining an X2 or X4 (or multiple instances of those cores) with a more powerful SIMD/VLIW engine, such as the XC4500, or with the XC5/XC8 for narrowband LTE in an IoT device. The company will continue adding to its modem reference-design catalog to keep up with the growing number of LTE protocols.

Boasting a list of publicly announced LTE customers that includes Intel, Leadcore, Samsung, Spreadtrum, and ZTE, Ceva has become the dominant licensable DSP-IP provider for high-performance cellular modems. In addition, several licensees employ its IP in LTE base stations. DSP-IP vendor Cadence has shifted its focus to lower-data-rate IoT devices, leaving only in-house designs (such as Qualcomm’s) as competition. Ceva offers the industry’s most comprehensive catalog of DSP cores for nearly every wireless standard, and the new X-series products position it to deliver on the next generation of 5G modems.

Price and Availability

Ceva does not disclose pricing for its licensable DSP cores. The X2 is currently available for licensing. For more information, access

