A Guide to Processors for Deep Learning

First Edition

Published September 2017

Authors: Linley Gwennap, Mike Demler, and Loyd Case

Single License: $4,495 (single copy, one user)
Corporate License: $5,995

Pages: 171

Take a Deep Dive into Deep Learning

Deep learning, a fast-advancing form of artificial intelligence (AI), has seen rapid improvement over the past few years and is now being applied to a wide variety of applications. Typically implemented using neural networks, deep learning powers image recognition, voice processing, language translation, and many other web services in large data centers. It is an essential technology in self-driving cars, providing both object recognition and decision making. It is even starting to move into client devices such as smartphones and embedded (IoT) systems.
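To ground the terminology, here is a minimal sketch of the artificial neuron these networks are built from: a weighted sum of inputs passed through a nonlinear transfer function such as the sigmoid (the report's first chapter covers these basics). All values below are illustrative.

```python
import math

def sigmoid(x):
    # Transfer function: squashes any input into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, bias):
    # An artificial neuron: a weighted sum of its inputs plus a bias,
    # passed through a nonlinear transfer function.
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return sigmoid(total)

# Illustrative values only: three inputs feeding a single neuron.
print(neuron([0.5, -1.2, 3.0], [0.8, 0.1, -0.4], bias=0.2))
```

Deep networks stack thousands of such neurons into many layers, which is why their arithmetic demands grow so quickly.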

Even the fastest CPUs are inadequate for running the highly complex neural networks needed to address these advanced problems. Boosting performance requires more specialized hardware architectures. Graphics chips (GPUs), which are more powerful and more efficient than CPUs for deep learning, have become popular, particularly for the compute-intensive initial training phase. Many other hardware approaches have recently emerged, including DSPs, FPGAs, and dedicated ASICs. Although these newer solutions promise order-of-magnitude improvements, GPU vendors are racing to tune their designs to better support deep learning.

Autonomous vehicles are an important application for deep learning. Vehicles don't implement training but instead focus on the simpler inference tasks. Even so, they require very powerful processors, yet they are more constrained in cost and power than data-center servers, forcing different tradeoffs. Several chip vendors are delivering products specifically for this application; some automakers are developing their own ASICs instead.
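For readers new to the training/inference distinction, a toy sketch may help (illustrative code, not drawn from the report): training repeatedly adjusts a model's weights against labeled examples, whereas inference runs the trained model forward, with fixed weights, on new inputs.

```python
def predict(w, x):
    # Inference: a single forward pass with fixed weights.
    return w * x

# Training: repeatedly nudge the weight to reduce squared error
# on labeled examples (here, the model must learn y = 2x).
w, lr = 0.0, 0.1
examples = [(1.0, 2.0), (2.0, 4.0)]
for _ in range(100):
    for x, target in examples:
        error = predict(w, x) - target
        w -= lr * error * x          # gradient step on 0.5 * error**2

# Deployment (e.g., in a vehicle): weights are frozen; only the
# cheap forward pass runs.
print(predict(w, 3.0))               # ~6.0
```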

Large chip vendors such as Intel and Nvidia are leading the way, investing heavily in new processors for deep learning. Many startups, some well funded, have emerged to develop new, more customized architectures for deep learning. Some focus on training, others on inference. Eschewing these options, leading data-center operators such as Google and Microsoft have developed their own hardware accelerators. In addition, several IP vendors offer specialized cores for deep learning, mainly for inference in autonomous vehicles and other client devices.

We Sort Out the Market and the Products

A Guide to Processors for Deep Learning covers hardware technologies and products. The report provides deep technology analysis and head-to-head product comparisons, as well as analysis of company prospects in this rapidly developing market segment. Which products will win designs, and why? The Linley Group’s unique technology analysis provides a forward-looking view, helping sort through competing claims and products.

The guide begins with a detailed overview of the market. We explain the basics of deep learning, the types of hardware acceleration, and the end markets, including a forecast for both automotive and data-center adoption. The heart of the report provides detailed technical coverage of announced chip products from AMD, Intel (including former Altera, Mobileye, Movidius, and Nervana technologies), NXP, Nvidia (including Tegra and Tesla), Qualcomm, Wave Computing, and Xilinx. It also covers IP cores from AImotive, ARM, Cadence, Ceva, Imagination, Synopsys, and VeriSilicon. A special chapter covers Google’s TPU and TPU2 ASICs. Finally, we bring it all together with technical comparisons in each product category and our analysis and conclusions about this emerging market.

Make Informed Decisions

As the leading vendor of technology analysis for processors, The Linley Group has the expertise to deliver a comprehensive look at the full range of chips designed for a broad range of deep-learning applications. Principal analyst Linley Gwennap and senior analysts Mike Demler and Loyd Case use their experience to deliver the deep technical analysis and strategic information you need to make informed business decisions.

Whether you are looking for the right processor or IP for an automotive application or a data-center accelerator, or seeking to partner with or invest in one of these vendors, this report will cut your research time and save you money. Make the smart decision: order A Guide to Processors for Deep Learning today.

This report is written for:

  • Engineers designing chips or systems for deep learning or autonomous vehicles
  • Marketing and engineering staff at companies selling related chips who need more information on processors for deep learning or autonomous vehicles
  • Technology professionals who want an introduction to deep learning, vision processing, or autonomous-driving systems
  • Financial analysts seeking a hype-free analysis of deep-learning processors and of the chip suppliers most likely to succeed
  • Press and public-relations professionals who need to get up to speed on this emerging technology

This market is developing rapidly — don't be left behind!

What's New in This Edition

The first edition of A Guide to Processors for Deep Learning is completely new. Highlights include:

  • Nvidia’s new Tesla V100 (Volta) accelerator for deep learning
  • Cadence’s first IP core optimized for neural networks, the Vision C5
  • How Intel’s acquisition of Mobileye affects its autonomous-driving roadmap
  • Nvidia’s new Xavier chip for autonomous cars
  • Intel’s newest Xeon Phi processor (code-named Knights Landing)
  • AMD’s new Radeon Instinct accelerators for deep learning
  • Intel’s integration of Nervana technology into its deep-learning roadmap
  • Applying Xilinx Virtex FPGAs to neural networks
  • Ceva’s next-generation XM6 core for deep learning
  • How Intel’s Stratix 10 FPGA can execute deep-learning algorithms
  • Wave Computing’s DPU, designed specifically for deep learning
  • VeriSilicon’s VIP8000-O, its first IP core for neural networks

Executive Summary

Deep-learning technology is being deployed or evaluated in nearly every industry in the world. This report focuses on the hardware that supports this deep-learning revolution. As demand for the technology grows rapidly, we see opportunities for deep-learning accelerators (DLAs) in three general areas: the data center, automobiles, and client devices.

Large cloud-service providers (CSPs) can apply deep learning to improve web search, language translation, email filtering, product recommendations, and voice assistants such as Alexa, Cortana, and Siri. Data-center DLAs are already a billion-dollar market and in just five years will rival server processors in size. By 2022, we expect about half of all new servers (and most cloud servers) to include a DLA.

Deep learning is a critical technology in the development of self-driving cars, solving many vision-processing challenges and enabling the high-level decision making required for autonomous operation. Although the first Level 3 cars are now available, demand will be low for the next few years. Once the technology is proven and reasonably priced, we expect rapid growth, reaching widespread adoption by 2030. In that year, annual automotive DLA revenue could reach $14 billion.

To improve latency and reliability for voice and other cloud services, we see a trend toward implementing neural networks in clients such as PCs, smartphones, wearables, drones, and Internet of Things (IoT) devices. Smartphone makers have rapidly embraced this technology. Apple, Huawei, MediaTek, and Qualcomm are already shipping processors that integrate a DLA, which Apple calls a neural engine. We expect more than 1.7 billion client devices to ship with DLAs in 2022.

In the brief history of deep learning, users have tried several hardware architectures to increase performance. General-purpose CPUs are easy to program, but GPUs and DSPs offer much greater performance. FPGAs offer a flexible approach, and ASICs are the most efficient. Only large companies and well-funded startups can afford ASIC design, however.

Nvidia became the early leader in the data center with its standard GPU products and Cuda software-development tools. Since then, the company has rapidly advanced the performance of its GPUs; its new Volta design adds special “tensor” cores to reach an industry-leading 119Tflop/s for training and inferencing. The Volta card costs thousands of dollars, but many CSPs will find it worth the expense. Nvidia also leads in the drive to develop autonomous vehicles; its next-generation Xavier processor is the industry’s first single-chip solution for Level 3 and above.
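As a rough sanity check on that headline figure, the arithmetic below uses publicly reported Volta parameters that are assumptions here, not numbers from this report: 640 tensor cores, each executing a 4×4×4 FP16 matrix multiply-accumulate (64 MACs, or 128 floating-point operations) per clock, at a boost clock near 1.455GHz.

```python
# Back-of-the-envelope peak throughput for Volta's tensor cores.
# Assumed parameters (not from this report): 640 tensor cores, each
# performing a 4x4x4 matrix multiply-accumulate (64 MACs) per clock.
tensor_cores = 640
flops_per_core_per_clock = 64 * 2      # 64 MACs x 2 ops (multiply + add)
boost_clock_hz = 1.455e9               # ~1.455GHz boost clock

peak = tensor_cores * flops_per_core_per_clock * boost_clock_hz / 1e12
print(f"{peak:.1f} Tflop/s")           # ~119.2 Tflop/s
```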

Intel offers several DLA architectures. Its Xeon Phi 7200 (Knights Landing) processor can handle neural-network training; the new Knights Mill version improves the chip’s inferencing performance. The company acquired Nervana and plans to offer the startup’s custom DLA architecture in 2018. Through its Altera acquisition, Intel offers a range of FPGAs for companies that wish to design their own DLA architecture. The Movidius acquisition yielded low-power Myriad chips for drones and other clients. Finally, the recent acquisition of Mobileye makes Intel a leader in Level 1 and 2 automotive systems and provides an important component for Level 3 and above. Intel must rationalize these different architectures as it attempts to displace Nvidia from the data center and from self-driving cars.

AMD competes against Nvidia in the GPU market and also in DLAs for the data center. Its new Radeon Instinct products deliver strong FP32 performance for training but can't touch the performance of Volta's tensor cores for applications that use FP16 data. In addition, AMD's software support is much more limited than Nvidia's. Like Intel, Xilinx offers FPGAs for the data center, but neither FPGA vendor has truly optimized its designs for deep neural networks (DNNs). These FPGAs are best suited to large CSPs that want a flexible platform for gate-level DLA design.

Google has developed a series of ASICs, known as TPU and TPU2, that employ a unique architecture that delivers the best performance per watt for inference and training. The CSP has deployed hundreds of thousands of these chips in its data centers but doesn’t sell them to other companies. Several startups are developing similar chips; some plan to sell them to OEMs, whereas others plan to sell complete systems to CSPs.

NXP is the leading supplier of automotive semiconductors, but it's far behind in autonomous driving. Its S32V234 competes for Level 1 and 2 designs but is inadequate for higher levels of autonomy. Although the company plans to release a second-generation vision processor in 2018, it needs a huge upgrade to compete against Nvidia and Mobileye (Intel). Qualcomm may be able to accelerate that progress once it completes its acquisition of NXP.

Several intellectual-property (IP) vendors have modified their CPU and DSP designs to accelerate deep learning, mainly for inferencing in automobiles and clients. Synopsys offers the EV64, which can deliver up to 4.5TMAC/s per core. This performance is adequate for autonomous driving, and the design can scale down for lower-cost applications. Ceva’s XM6 lacks the necessary performance for Level 3 driving but is well suited to lower-level automotive duties as well as to drones and other client applications.

At 1.1TMAC/s, the Cadence C5 is similar in performance to Ceva’s XM6 and targets the same applications. MediaTek has licensed Cadence IP to serve as the neural engine in its latest smartphone processors, providing a high-volume design win. AImotive is a new IP vendor targeting self-driving cars with its AIware core. Leading IP vendors ARM, Imagination, and VeriSilicon have only recently introduced deep-learning products, hoping to catch up with the earlier suppliers in this market.

Table of Contents

List of Figures
List of Tables
About the Authors
About the Publisher
Preface
Executive Summary
1 Deep-Learning Technology
  Artificial Neurons
  Deep Neural Networks
  Neural-Network Training
  Neural-Network Inference
  Software Frameworks
    Caffe
    TensorFlow
    Torch
    Other Popular Frameworks
2 Deep-Learning Applications
  Cloud-Based Deep Learning
  Advanced Driver-Assistance Systems
  Smart Cameras
  Financial Technology
  Health Care and Medicine
  Manufacturing
  Robotics
  Voice Assistants
3 Deep-Learning Accelerators
  Processor Design
    VLIW Instruction Issue
    Computation Units
  CPUs
  GPUs
  DSPs
  Custom Architectures
  FPGAs
  Performance Benchmarks
4 Market Forecast
  Market Overview
  Data Center and HPC
    Market Size
    Market Forecast
  Automotive
    Market Size
    Market Forecast
    Autonomous Forecast
  Client and IoT
    Market Size
    Market Forecast
5 Cadence
  Company Background
  Key Features and Performance
  Design Details
  Development Tools
  Conclusions
6 Ceva
  Company Background
  Key Features and Performance
  Design Details
  Development Tools
  Product Roadmap
  Conclusions
7 Google
  Company Background
  Key Features and Performance
  Internal Architecture
  System Design
  Development Tools
  Product Roadmap
  Conclusions
8 Intel
  Company Background
  Xeon Phi
    Key Features and Performance
    Internal Architecture
    System Design
    Development Tools
    Product Roadmap
  Stratix FPGAs
    Key Features and Performance
    Design Details
    Development Tools
    Product Roadmap
  Myriad
    Key Features and Performance
    Design Details
    Development Tools
    Product Roadmap
  Conclusions
9 Mobileye (Intel)
  Company Background
  Key Features and Performance
  Internal Architecture
  System Design
  Development Tools
  Product Roadmap
  Conclusions
10 Nvidia Tegra
  Company Background
  Key Features and Performance
  Internal Architecture
  System Design
  Development Tools
  Product Roadmap
  Conclusions
11 Nvidia Tesla
  Company Background
  Key Features and Performance
  Internal Architecture
    GPU Core
    Chip Design
  System Design
  Development Tools
  Product Roadmap
  Conclusions
12 Synopsys
  Company Background
  Key Features and Performance
  Design Details
  Development Tools
  Product Roadmap
  Conclusions
13 Wave Computing
  Company Background
  Key Features and Performance
  Internal Architecture
  System Design
  Development Tools
  Product Roadmap
  Conclusions
14 Other Chip Vendors
  AMD
    Company Background
    Key Features and Performance
    Conclusions
  Cerebras
  Fujitsu
  Graphcore
  Groq
  IBM
  KnuEdge
  Mythic
  NovuMind
  NXP
    Company Background
    Key Features and Performance
    Conclusions
  Qualcomm
    Company Background
    Key Features and Performance
    Conclusions
  Xilinx
    Company Background
    Key Features and Performance
    Conclusions
15 Other IP Vendors
  AImotive
    Company Background
    Key Features and Performance
    Conclusions
  ARM
  Imagination Technologies
    Company Background
    Key Features and Performance
    Conclusions
  VeriSilicon
    Company Background
    Key Features and Performance
    Conclusions
16 Processor Comparisons
  How to Read the Tables
  Data-Center Accelerators
    Floating-Point Accelerators
    Integer Accelerators
  FPGAs for Deep Learning
  Deep-Learning IP
  Automotive Processors
    CPU Subsystem
    Neural-Network Processing
    Interfaces
17 Conclusions
  Market Summary
    Data Center
    Automotive
    Client
  Technology Trends
    Neural Networks
    Hardware Options
    Performance Metrics
  Vendor Summary
    Data Center
    Automotive
    IP Vendors
  Closing Thoughts
Appendix: Further Reading
Index

List of Figures

Figure 1-1. Neuron connections in a human brain.
Figure 1-2. Model of a neural-network processing node.
Figure 1-3. Transfer function for a sigmoid neuron.
Figure 1-4. Model of a four-layer neural network.
Figure 1-5. Mapping from floating-point (FP32) format to integer (INT8) format.
Figure 2-1. Deep learning for autonomous vehicles.
Figure 2-2. NHTSA and SAE autonomous-driving levels.
Figure 2-3. Waymo autonomous test vehicle.
Figure 2-4. A smart surveillance camera.
Figure 2-5. Processing steps in a computer-vision neural network.
Figure 2-6. Example biopsy images used to diagnose breast cancer.
Figure 2-7. Robotic arms use deep learning.
Figure 2-8. Various smart speakers.
Figure 3-1. VLIW instruction bundle.
Figure 3-2. Example data types.
Figure 4-1. Revenue forecast for deep-learning chips, 2015–2022.
Figure 4-2. Unit forecast for deep-learning chips, 2015–2022.
Figure 4-3. Unit forecast for ADAS vehicles, 2014–2022.
Figure 4-4. Revenue forecast for ADAS processors, 2014–2022.
Figure 4-5. Unit forecast for client deep-learning chips, 2015–2022.
Figure 5-1. Block diagram of Cadence Vision C5 architecture.
Figure 6-1. Block diagram of Ceva XM6 DSP core.
Figure 7-1. Block diagram of Google TPU.
Figure 7-2. Google TPU add-in card.
Figure 7-3. Google TPU2 board.
Figure 8-1. Microarchitecture of Intel Knights Landing CPU.
Figure 8-2. Block diagram of Intel Xeon Phi 7200 processor.
Figure 8-3. Block diagram of Stratix 10 DSP block.
Figure 8-4. Block diagram of Myriad 2 Shave architecture.
Figure 9-1. Block diagram of Mobileye EyeQ4 ADAS processor.
Figure 9-2. Mobileye trifocal system.
Figure 10-1. Block diagram of Nvidia Tegra X1 processor.
Figure 10-2. High-level block diagram of Nvidia Xavier processor.
Figure 10-3. Block diagram of Xavier’s deep-learning accelerator.
Figure 10-4. Circuit board for Nvidia Drive PX2 AutoChauffeur system.
Figure 10-5. Block diagram of Nvidia Drive PX2 AutoChauffeur system.
Figure 11-1. Block diagram of Nvidia Volta core.
Figure 11-2. Block diagram of Nvidia Tesla V100 chip.
Figure 12-1. Block diagram of Synopsys DesignWare EV6x core.
Figure 13-1. Wave DPU processing element.
Figure 13-2. Wave DPU processing cluster.
Figure 13-3. Wave DPU processor array.
Figure 15-1. Block diagram of VeriSilicon VIP8000-O architecture.
Figure 17-1. ImageNet LSVRC trends, 2010–2016.
Figure 17-2. Deep-learning accelerators.

List of Tables

Table 5-1. Key parameters for Cadence Vision IP cores.
Table 6-1. Key parameters for Ceva XM4 and XM6 cores.
Table 7-1. Key parameters for Google TPU accelerator.
Table 8-1. Key parameters for Intel Xeon Phi 7200 processors.
Table 8-2. Key parameters for selected Intel Stratix 10 GX FPGAs.
Table 8-3. Key parameters for Intel Myriad 2 processor.
Table 9-1. Key parameters for Mobileye EyeQ processors.
Table 10-1. Key parameters for Nvidia Tegra automotive processors.
Table 11-1. Key parameters for Nvidia Tesla processors.
Table 12-1. Key parameters for selected Synopsys ARC cores.
Table 13-1. Key parameters for Wave DPU accelerator.
Table 14-1. Key parameters for AMD Radeon Instinct processors.
Table 14-2. Key parameters for NXP vision processors.
Table 14-3. Key parameters for selected Xilinx UltraScale+ FPGAs.
Table 15-1. Key parameters for AImotive AIware IP.
Table 15-2. Key parameters for Imagination PowerVR 2NX IP.
Table 15-3. Key parameters for VeriSilicon VIP8000-O IP.
Table 16-1. Comparison of selected floating-point DLAs for data centers.
Table 16-2. Comparison of integer DLAs for data centers.
Table 16-3. Comparison of FPGAs for deep learning.
Table 16-4. Comparison of selected deep-learning IP cores.
Table 16-5. Comparison of ADAS processors.
