A Guide to Multicore Processors
Fourth Edition
Published August 2017
Authors: Jag Bolaria and Tom R. Halfhill
Corporate License: $5,995
Get the Facts Quickly
"A Guide to Multicore Processors" (4th Edition) provides an in-depth look at 32- and 64-bit high-speed embedded processors with four or more CPU cores. This completely revised report from The Linley Group contains 190+ pages of information on high-end processors from AMD, Baikal, Broadcom, Cavium, Intel, Kalray, Macom (AppliedMicro), Mellanox (Tilera/EZchip), and NXP.
The report focuses on general-purpose RISC and x86 processors that have four or more CPU cores running at 1.0GHz or more, excluding specialized architectures (e.g. DSPs, NPUs). This report covers processors for embedded applications, focusing on networking, communications, storage, and security; it excludes multicore products designed for servers or for mobile devices. (We cover these processors, as well as embedded processors with four or fewer CPU cores, in other reports.)
"A Guide to Multicore Processors" has detailed coverage of AMD's Opteron family; Broadcom's XLP II and Stingray families; Cavium's Octeon TX and Octeon III families; Intel's embedded Xeon and Xeon-D lines; Kalray’s Bostan and Coolidge processors; Macom's Helix family; Mellanox's BlueField family; and NXP's QorIQ LS1 series, LS2 series, and T4 series.
This handy guide, packed with valuable information, brings you up-to-date on the newest developments in this important market and gives you the analysis you need to help choose a supplier or partner. The report also provides market-share and market-size data for the embedded and multicore markets.
"A Guide to Multicore Processors" begins with tutorials on the key technologies implemented by these products, background on the embedded market, and a discussion of the newest technology and market trends. Following these introductory chapters, the report delivers thorough coverage of all announced products in this area. For each major vendor, the report examines the performance, features, and architecture of each product, highlighting strengths and weaknesses in a consistent, easy-to-compare fashion. The report concludes with our own comparisons of these products and conclusions about which will fare best.
What's New in This Edition
Since publishing the previous edition of this report in 2016, we have updated the coverage with many new announcements, including:
- Broadcom’s Stingray products
- Cavium's ARMv8-compatible Octeon TX processors
- NXP's newest ARM-based LS1- and LS2-series processors
- Intel's new Xeon (Skylake) processors and Xeon D processors
- Mellanox's new ARM-based BlueField processors
- Final 2016 market size and vendor share
- Embedded-processor forecasts to 2021
Multicore processors offer the best performance and flexibility for applications that are divisible into many small tasks, called threads. In embedded systems, the most common application for these products is networking, because each data packet can usually have its own thread. Packet processing is common in a wide range of networking and communications equipment, including routers, security appliances, storage subsystems, broadband infrastructure, and cellular base stations.
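As a rough illustration of the packet-per-thread idea above, the following minimal sketch hands each packet to a worker as an independent task. It is an assumption-laden toy, not code from the report: `process_packet`, `process_all`, and the additive checksum are hypothetical stand-ins for real packet work, and Python threads show only the structure of the approach, not the speedup a multicore embedded processor would deliver.

```python
from concurrent.futures import ThreadPoolExecutor


def process_packet(packet: bytes) -> int:
    """Hypothetical per-packet work: a simple 16-bit additive checksum."""
    return sum(packet) & 0xFFFF


def process_all(packets, workers=4):
    """Process every packet as an independent task on a pool of workers.

    No packet depends on another, so the work divides cleanly across
    workers. Executor.map() returns results in input order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_packet, packets))


# Eight made-up 64-byte packets, each filled with a single byte value.
packets = [bytes([i] * 64) for i in range(8)]
checksums = process_all(packets)
```

Because the packets share no state, the worker count can change without touching the per-packet logic, which is the property that lets this kind of workload scale with the number of CPU cores.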
To ease programming, these multicore processors employ general-purpose instruction sets, such as x86, the Power Architecture (PowerPC), MIPS, and ARM. This characteristic distinguishes them from dedicated network processors (NPUs), which use custom instruction sets that are more difficult to program — and from packet-processing ASICs, which aren’t programmable at all. Most multicore embedded processors also include specialized hardware that accelerates packet-processing tasks. Thus, they’re widely favored for complex networking applications that require programmability, customization, and high performance. In addition, these devices are useful for a broad range of embedded systems that require general-purpose programmability.
We estimate the total revenue from general-purpose embedded processors grew 3.3% in 2016, reaching a new high of $4.3 billion. But some market segments, such as wireless communications, declined in 2016. This shrinkage was largely due to China’s slowdown in wireless-base-station deployments and a trend toward using more custom ASICs instead of merchant silicon. Other segments grew: security, Internet gateways, automotive, industrial, and storage.
Intel still leads the embedded-processor market by revenue. Despite their relatively high power consumption and relatively poor feature integration, Intel’s products offer the industry’s best single-thread performance — a big advantage in control-plane processing. The acquisition of Altera, the second-largest FPGA vendor, creates opportunities for future products that integrate embedded processors with programmable logic. In 2015 and 2016, Intel also became the leading supplier of multicore processors for communications — a position held for years by Freescale (now NXP), which suffered from the wireless slowdown.
The wave of industry consolidation that began in 2015 continued in 2016 and 2017. Avago acquired Broadcom and now operates as Broadcom Ltd. Mellanox purchased EZchip, which had previously purchased Tilera. Macom acquired AppliedMicro and immediately began seeking a buyer for the processor part of the business. The biggest merger of 2017, however, is still pending: Qualcomm’s bid for NXP, which only recently absorbed Freescale. If that deal succeeds as expected by the end of the year, only Intel will be a larger semiconductor company. Qualcomm is entering the server market with its new ARMv8-compatible Centriq processor, which it could adapt for high-end embedded applications in 2018 or 2019. The Qualcomm-NXP deal has the most potential to rearrange the embedded-processor market.
Broadcom held its position as the third-largest embedded-processor supplier in 2016. It gained share during the year, largely on the success of its ARM-based StrataGX family. The MIPS-compatible XLP family is fading away, however, and the company sold its next-generation ARM-compatible Vulcan processor to Cavium. Instead, Broadcom is introducing the new BCM58800 family, which has up to eight powerful ARM CPUs. Although it’s still less powerful than the high-end XLP chips, it supports 100Gbps networking and other up-to-date features.
Cavium, the fourth-largest embedded-processor supplier, enjoyed another year of healthy growth in 2016. The MIPS-compatible Octeon chips remain the cash cow, but the company is moving quickly to the ARM architecture. The ARMv8-compatible Octeon TX family addresses the midrange high-performance market, and we expect the Vulcan processor acquired from Broadcom to bolster Cavium’s product line in 2018. Although Cavium’s relatively simple CPUs lag in single-thread performance, their small size enables large multicore designs. Consequently, the company focuses on the data plane, where its many small CPUs and wealth of hardware accelerators are ideal.
AMD entered the ARM-based embedded-processor market in 2015 with its Opteron A1100 family, but it lacks a roadmap for future ARM products. Instead, it’s refocusing on the x86 market. In 2018, we expect to see the first high-performance embedded processors based on the new Zen CPU core, which has rejuvenated the company’s server- and PC-processor business. Zen-based chips should compete strongly with Intel’s midrange embedded Xeon processors.
After absorbing EZchip and Tilera, Mellanox plans to introduce its new ARM-based BlueField processor in 2018. The flagship 16-core chip will compete strongly for smart-NIC designs and storage arrays. It integrates a subsystem that’s virtually a Mellanox ConnectX-5 Ethernet adapter on a chip, and its dual 100 Gigabit Ethernet (100GbE) ports target 200Gbps networking. It also implements new standards such as Non-Volatile Memory Express Over Fabrics (NVMe-oF) for networked SSD storage arrays. BlueField replaces the 100-core ARM-based chip that Tilera was designing before the acquisitions.
Another competitor is French startup Kalray. Using a proprietary architecture that’s programmable with industry-standard tools, Kalray’s 256-core chips target massively parallel processing and real-time applications. A newer, smaller design should be easier to program and adds acceleration for security and machine learning.
Overall, the embedded-processor industry remains vibrant, and the transition to ARM is gaining momentum. Nevertheless, we expect Intel and the x86 architecture to rule the market for years to come.
List of Figures
List of Tables
About the Authors
About the Publisher
Preface
Executive Summary
1 Processor Technology
Processor Basics
Central Processing Unit (CPU)
Caches
MMUs and TLBs
Bus Bandwidth
CPU Microarchitecture
RISC vs. CISC
Endianness
Scalar and Superscalar
Instruction Reordering
Pipelining and Penalties
Branch Prediction
Multicore Processors
Multithreading
Main Memory
DRAM Basics
DDR Versions
Memory Subsystems
I/O and Network Interfaces
Ethernet Interfaces
PCI and PCI Express
RapidIO
USB
SAS and SATA
2 Multicore Applications
Networking and Communications Equipment
Control Plane vs. Data Plane
Control-Plane Processing
Data-Plane Applications
Services Cards
Networked Storage and RAID Controllers
Security
Broadband Infrastructure
Cellular Base Stations
Common Form Factors
3 Standard Instruction Sets
Architecture Comparison
Technology
Market Positions
x86 Instruction Set
Background
Initial Instruction Set
Modern Extensions
ARM Instruction Set
Background
Initial Instruction Set
Later Extensions
ARMv8 Architecture
ARMv8-M
Scalable Vector Extensions
ARM Cortex-A57
ARM Cortex-A53
ARM Cortex-A72
MIPS Instruction Set
Background
Initial Instruction Set
Later Extensions
PowerPC Instruction Set
Background
Instruction Set
4 Multicore Processors
What Is an Embedded Multicore Processor?
What Is Not an Embedded Multicore Processor
Common Characteristics
Standalone vs. Integrated Processors
Multicore Processors
Encryption Engines
RAID and Other Storage Engines
Packet-Processing Accelerators
Acceleration Software
Data-Plane Development Kit (DPDK)
OpenDataPlane (ODP)
Benchmark Software
CPU Benchmarks
Security Performance
5 Technology and Market Trends
Technology Trends
Architecture
Integration Trends
Software-Defined Functions
CPU Complexity Tradeoffs
Memory Access
Managing Power
Completeness
Market Overview
Market Size by Vendor
Market Share by Application
Revenue Market Share by Instruction-Set Architecture
Market Forecast
6 Broadcom
Company Background
Key Features and Performance
Internal Architecture
System Design
Development Tools
Product Roadmap
Conclusions
7 Cavium
Company Background
Key Features and Performance
Octeon III CN78xx- and CN77xx-Series
Octeon III CN73xx- and 72xx-Series
Octeon TX Processors
Internal Architecture
Octeon III CPU
Custom MIPS64 Extensions
Octeon III Caches
Octeon III Accelerators
Octeon TX Architecture
System Design
Development Tools
Product Roadmap
Conclusions
8 Intel
Company Background
Product Overview
Key Features and Performance
Xeon Scalable Processors
Xeon Platinum and Gold Processors
Xeon Silver Processors
Skylake-SP vs. Broadwell-EP
Xeon D Processors
Internal Architecture
Broadwell Microarchitecture
Skylake Microarchitecture
Skylake-SP Microarchitecture
System Design
Xeon Scalable Processors
Xeon E5v4 Processors
Xeon D Processors
Development Tools
Product Roadmap
Conclusions
9 Kalray
Company Background
Key Features and Performance
Bostan-1 and Bostan-2 (MPPA2 and MPPA2.2)
Coolidge (MPPA3)
Internal Architecture
Andey and Bostan (MPPA-256 v1, v2, and v2.2)
Coolidge (MPPA v3)
Development Tools
Product Roadmap
Conclusions
10 Macom (AppliedMicro)
Company Background
Key Features and Performance
Internal Architecture
System Design
Product Roadmap
Conclusions
11 Mellanox
Company Background
Key Features and Performance
Internal Architecture
System Design
Development Tools
Product Roadmap
Conclusions
12 NXP
Company Background
Key Features and Performance
QorIQ LS1-Series Processors
QorIQ LS2-Series Processors
QorIQ T4-Series Processors
Internal Architecture
ARM CPUs
Power e6500 CPU
Acceleration Engines
Quicc Engine
QorIQ Packet-Processing Acceleration (DPAA)
DPAA2 Packet Acceleration
QorIQ Layerscape Secure Platform
System Design
System Interfaces
Application Examples
Development Tools
Product Roadmap
Conclusions
13 Other Vendors
AMD
Company Background
Key Features and Performance
Product Roadmap
Conclusions
14 Comparisons
Sub-30W Processors
30-50W Processors
50-100W Processors
Processors Consuming More Than 100W
15 Conclusions
Market and Technology Trends
Vendor Outlook
Intel
NXP
Broadcom
Cavium
Other Multicore-Processor Vendors
Appendix: Further Reading
Index
Figure 1‑1. Basic processor design.
Figure 1‑2. Simple superscalar processor design.
Figure 1‑3. CPU pipelining examples.
Figure 1‑4. Generic multicore processor.
Figure 1‑5. Interleaved tasks on a multithreaded CPU.
Figure 1‑6. DRAM evolution.
Figure 2‑1. The control plane and the data plane.
Figure 4‑1. Standalone and integrated general-purpose processors.
Figure 4‑2. Networking-software interfaces.
Figure 4‑3. Typical curve of IPSec performance versus packet size.
Figure 5‑1. Worldwide revenue market share of embedded microprocessors, 2015-2016.
Figure 5‑2. Worldwide revenue market share of the top vendors of embedded processors for communications.
Figure 5‑3. Worldwide revenue market share of the top vendors of embedded processors for storage.
Figure 5‑4. Worldwide revenue market share of the top vendors of embedded processors for other applications.
Figure 5‑5. Worldwide revenue market share of embedded processors by CPU architecture.
Figure 5‑6. Worldwide revenue of embedded processors by application, 2016–2021.
Figure 5‑7. Worldwide revenue of embedded processors by communications segment, 2016–2021.
Figure 6‑1. Block diagram of Broadcom BCM58808H.
Figure 6‑2. Block diagram of the BCM58808H in a storage appliance.
Figure 7‑1. Cavium Octeon III and Octeon TX families.
Figure 7‑2. Block diagram of Cavium Octeon III CN7890.
Figure 7‑3. Block diagram of Cavium Octeon TX CN8370.
Figure 7‑4. Block diagram of ParPro card using the Octeon III CN7890.
Figure 7‑5. Octeon TX CN8370 in a storage array.
Figure 8‑1. Intel’s nomenclature for Xeon Scalable processors.
Figure 8‑2. Block diagram of Intel Broadwell embedded Xeon E5-2650v4.
Figure 8‑3. Block diagram of Skylake microarchitecture.
Figure 8‑4. Comparison of Intel cache hierarchies.
Figure 8‑5. Block diagram of Intel’s Purley platform.
Figure 8‑6. Dual-socket system design based on Intel Xeon E5v4.
Figure 8‑7. Block diagram of Intel Xeon D.
Figure 9‑1. Block diagram of the MPPA-256 VLIW CPU.
Figure 10‑1. Block diagram of Macom Potenza CPU.
Figure 10‑2. Block diagram of Macom Helix 1 processor.
Figure 10‑3. Block diagram of a gateway based on Macom Helix 1.
Figure 11‑1. Block diagram of Mellanox BlueField processor.
Figure 11‑2. Block diagram of Mellanox ConnectX-5 subsystem.
Figure 11‑3. Block diagram of BlueField flash-array controller.
Figure 12‑1. NXP QorIQ T- and LS-series processors.
Figure 12‑2. Block diagram of NXP QorIQ LS1088A.
Figure 12‑3. Second-generation Data Path Acceleration Architecture.
Figure 12‑4. QorIQ Layerscape Secure Platform.
Figure 12‑5. NXP VortiQa Network Security Suite.
Table 2‑1. Some common single-board-computer standards.
Table 5‑1. Worldwide revenue of the top vendors of embedded processors.
Table 5‑2. Worldwide revenue of the top vendors of embedded processors for communications.
Table 5‑3. Worldwide revenue of the top vendors of embedded processors for storage.
Table 5‑4. Worldwide revenue of the top vendors of embedded processors for other applications.
Table 5‑5. Worldwide revenue of embedded processors by application, 2016–2021.
Table 5‑6. Worldwide revenue of embedded processors by communications segment, 2016–2021.
Table 6‑1. Key parameters for Broadcom BCM588xx processors.
Table 7‑1. Key parameters for Cavium Octeon III CN78xx processors.
Table 7‑2. Key parameters for Cavium Octeon III CN77xx processors.
Table 7‑3. Key parameters for Cavium Octeon III CN73xx and CN72xx.
Table 7‑4. Selected Cavium Octeon TX embedded processors.
Table 8‑1. Intel code-names and product numbers.
Table 8‑2. Intel Xeon embedded multicore processors.
Table 8‑3. Key parameters for Intel Xeon Platinum and Gold processors.
Table 8‑4. Key parameters for Intel Xeon Silver embedded processors.
Table 8‑5. Xeon Scalable (Skylake-SP) versus Xeon E5v4 (Broadwell-EP).
Table 8‑6. Intel Xeon D embedded processors.
Table 8‑7. Key parameters for selected Intel Xeon D embedded processors.
Table 8‑8. Key parameters for Intel C62x (Lewisburg) south-bridge chips.
Table 8‑9. Key parameters for Intel DH89xx Coleto Creek chips.
Table 9‑1. Key parameters for Kalray MPPA processors.
Table 10‑1. Key parameters for Macom Helix 1 processors.
Table 11‑1. Key parameters for Mellanox BlueField processors.
Table 12‑1. Key parameters for NXP QorIQ LS1 quad- and octa-core chips.
Table 12‑2. Key parameters for NXP QorIQ LS2 processors with Cortex-A57.
Table 12‑3. Key parameters for NXP QorIQ LS2 processors with Cortex-A72.
Table 12‑4. Key parameters for NXP QorIQ T4 processors.
Table 12‑5. Performance of SEC 5.3 security engine.
Table 13‑1. Key parameters for AMD Opteron A1100 SoCs.
Table 14‑1. Comparison of sub-30W multicore processors.
Table 14‑2. Comparison of 30–50W multicore processors.
Table 14‑3. Comparison of 50–100W multicore processors.
Table 14‑4. Comparison of multicore processors using more than 100W.