
A Guide to Processors for Deep Learning

Second Edition

Published February 2019

Authors: Linley Gwennap and Mike Demler

Single License: $4,495 (single copy, one user)
Corporate License: $5,995

Pages: 209

Ordering Information



Take a Deep Dive into Deep Learning

Deep learning, a branch of artificial intelligence (AI), has seen rapid changes and improvements over the past few years and is now being applied to a wide variety of applications. Typically implemented using neural networks, deep learning powers image recognition, voice processing, language translation, and many other web services in large data centers. It is an essential technology in self-driving cars, providing both object recognition and decision making. It is even moving into client devices such as smartphones and embedded (IoT) systems.
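
At their core, these neural networks reduce to layers of multiply-accumulate arithmetic followed by a simple nonlinearity. The Python sketch below illustrates one such layer; the function names and layer sizes are ours, chosen purely for illustration.

```python
# A minimal sketch of one neural-network layer: multiply the inputs by
# learned weights, add a bias, and apply a nonlinear activation.
# All names and sizes here are illustrative, not from the report.
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)             # a common activation function

def dense_layer(inputs, weights, bias):
    return relu(weights @ inputs + bias)  # dot products dominate the work

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)             # e.g., features from an image
w = rng.standard_normal((256, 1024))      # learned weights
b = rng.standard_normal(256)
y = dense_layer(x, w, b)                  # one layer of a deep network
```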

Even the fastest CPUs are inadequate to efficiently execute the highly complex neural networks needed to address these advanced problems. Boosting performance requires more specialized hardware architectures. Graphics chips (GPUs) have become popular, particularly for the initial training function. Many other hardware approaches have recently emerged, including DSPs, FPGAs, and dedicated ASICs. Although these solutions promise order-of-magnitude improvements, GPU vendors are tuning their designs to better support deep learning.
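
Some rough arithmetic shows why specialized hardware helps: even one modest convolutional layer (with hypothetical dimensions, chosen only for illustration) requires hundreds of millions of multiply-accumulate (MAC) operations per image, and a full network stacks dozens of such layers.

```python
# Back-of-the-envelope MAC count for one hypothetical convolutional layer:
# output pixels x output channels x (kernel area x input channels).
out_h, out_w  = 56, 56     # output feature-map size
in_ch, out_ch = 64, 128    # input and output channels
k = 3                      # 3x3 convolution kernel

macs = out_h * out_w * out_ch * (k * k * in_ch)
print(f"{macs / 1e6:.0f} million MACs for this one layer")  # ~231 million
```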

Autonomous vehicles are an important application for deep learning. Vehicles don't implement training but instead focus on the simpler inference tasks. Even so, these vehicles require very powerful processors, yet they are more constrained in cost and power than data-center servers, so they demand different tradeoffs. Several chip vendors are delivering products specifically for this application; some automakers are developing their own ASICs instead.

Large chip vendors such as Intel and Nvidia currently generate the most revenue from deep-learning processors. But many startups, some well funded, have emerged to develop new, more customized architectures for deep learning; Graphcore, Habana, and Wave are among the first to deliver products. Eschewing these options, leading data-center operators such as Amazon, Google, and Microsoft have developed their own hardware accelerators. In addition, several IP vendors offer specialized cores for deep learning, mainly for inference in autonomous vehicles and other client devices.

We Sort Out the Market and the Products

A Guide to Processors for Deep Learning covers hardware technologies and products. The report provides deep technology analysis and head-to-head product comparisons, as well as analysis of company prospects in this rapidly developing market segment. Which products will win designs, and why? The Linley Group’s unique technology analysis provides a forward-looking view, helping sort through competing claims and products.

The guide begins with a detailed overview of the market. We explain the basics of deep learning, the types of hardware acceleration, and the end markets, including a forecast for both automotive and data-center adoption. The heart of the report provides detailed technical coverage of announced chip products from AMD, Eta Compute, Graphcore, GreenWaves, Gyrfalcon, Intel (including former Altera, Mobileye, Movidius, and Nervana technologies), Mythic, NXP, Nvidia (including Tegra and Tesla), Qualcomm, Wave Computing, and Xilinx. It also covers IP cores from AImotive, Arm, Cadence, Cambricon, Ceva, Imagination, Synopsys, Videantis, and the open-source NVDLA. Other chapters cover Google’s TPU family of ASICs and Microsoft’s Brainwave. Finally, we bring it all together with technical comparisons in each product category and our analysis and conclusions about this emerging market.

Make Informed Decisions

As the leading vendor of technology analysis for processors, The Linley Group has the expertise to deliver a comprehensive look at the full range of chips designed for a broad range of deep-learning applications. Principal analyst Linley Gwennap and senior analyst Mike Demler use their experience to deliver the deep technical analysis and strategic information you need to make informed business decisions.

Whether you are looking for the right processor or IP for an automotive application or a data-center accelerator, or seeking to partner with or invest in one of these vendors, this report will cut your research time and save you money. Make the smart decision: order A Guide to Processors for Deep Learning today.

This report is written for:

  • Engineers designing chips or systems for deep learning or autonomous vehicles
  • Marketing and engineering staff at companies that sell related chips and need more information on processors for deep learning or autonomous vehicles
  • Technology professionals who want an introduction to deep learning, vision processing, or autonomous-driving systems
  • Financial analysts who desire a hype-free analysis of deep-learning processors and of which chip suppliers are most likely to succeed
  • Press and public-relations professionals who need to get up to speed on this emerging technology

This market is developing rapidly — don't be left behind!

What's New in This Edition

The second edition of A Guide to Processors for Deep Learning covers dozens of new products and technologies announced in the past year, including:

  • Nvidia’s new Tesla T4 (Turing) accelerator for inference
  • Arm’s first machine-learning acceleration IP
  • Intel’s Myriad X chip, with a new neural engine, for embedded systems
  • Ceva’s NeuPro, a customized IP core for deep learning
  • Intel’s VNNI instruction-set extensions for accelerating AI inference
  • AMD’s MI60 Radeon Instinct accelerator based on the 7nm Vega chip
  • Imagination’s new PowerVR 3NX deep-learning accelerators
  • A detailed analysis of Microsoft’s FPGA-based Brainwave accelerator
  • Graphcore’s first product, the C2 accelerator card based on its GC2 processor
  • Eta Compute’s spiking neural-network accelerator
  • Details on Mythic’s analog-compute technology for neural networks
  • Cadence’s Vision Q6 and DNA 100 AI cores
  • AImotive’s third-generation AI core for autonomous vehicles
  • Qualcomm’s Hexagon 690, which adds a neural engine for a 3x gain in AI performance
  • The open-source NVDLA, based on Nvidia’s proven Xavier design
  • Products from Cambricon, China’s leading AI-acceleration startup
  • Videantis’s new v-MP6000UDX computer-vision IP
  • Details on Google’s TPUv2 and TPUv3
  • Other new vendors such as BrainChip, Cornami, GreenWaves, Habana, NovuMind, and SambaNova
  • The new AI-Benchmark and MLPerf tests

Deep-learning technology is being deployed or evaluated in nearly every industry in the world. This report focuses on the hardware that supports this AI revolution. As demand for the technology grows rapidly, we see opportunities for deep-learning accelerators (DLAs) in three general areas: the data center, automobiles, and client devices.

Large cloud-service providers (CSPs) can apply deep learning to improve web search, language translation, email filtering, product recommendations, and voice assistants such as Alexa, Cortana, and Siri. Data-center DLA revenue exceeded $3 billion in 2018 and will approach $10 billion within five years. By 2023, we expect nearly half of all new servers (and most cloud servers) to include a DLA.
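
Those endpoints imply a compound annual growth rate of roughly 27%, as this quick check (using the forecast figures above) confirms.

```python
# Implied compound annual growth rate (CAGR) from the forecast above.
start, end, years = 3e9, 10e9, 5
cagr = (end / start) ** (1 / years) - 1
print(f"{cagr:.0%}")  # roughly 27% per year
```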

Deep learning is a critical technology in the development of self-driving cars, overcoming many vision-processing challenges and enabling the high-level decision making required for autonomous operation. Although the first Level 3 cars are now available, demand will be low for the next few years. Once the technology is proven and reasonably priced, we expect rapid growth, reaching widespread adoption by 2030. In that year, annual automotive DLA revenue could reach $14 billion.

To improve latency and reliability for voice and other cloud services, clients such as PCs, smartphones, wearables, drones, and Internet of Things (IoT) devices are starting to implement neural networks. Every premium smartphone today ships with a DLA as part of its main processor, and this technology is already trickling down into midrange phones. We expect 1.9 billion client devices to ship with DLAs in 2022.

In the brief history of deep learning, users have tried several hardware architectures to increase performance. General-purpose CPUs are easy to program, while GPUs and DSPs offer greater performance. Although CPU and GPU vendors have added AI-specific features to their designs, some new architectures offer superior performance and efficiency.

Nvidia became the early leader in the data center with standard GPU products, but its newer Volta and Turing designs add "tensor cores" that greatly improve performance for neural-network training and inference. These cards cost thousands of dollars, but many CSPs find them worth the expense. Nvidia also leads the drive to develop autonomous vehicles; its Xavier processor is the industry's first single-chip solution for Level 3 autonomy, and its Drive AGX cards deliver even greater capabilities.
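
Tensor cores accelerate matrix math by multiplying reduced-precision (FP16) values while accumulating the results at higher (FP32) precision. The numpy fragment below emulates that idea conceptually; it is a sketch of the technique, not Nvidia's programming interface.

```python
import numpy as np

# Emulate the tensor-core approach: FP16 inputs, FP32 accumulation.
rng = np.random.default_rng(1)
a = rng.standard_normal((4, 4)).astype(np.float16)  # half-precision inputs
b = rng.standard_normal((4, 4)).astype(np.float16)

# Accumulating in float32 limits the rounding error that pure FP16
# arithmetic would introduce during training.
c = a.astype(np.float32) @ b.astype(np.float32)
```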

Intel offers several DLA architectures. Its standard Xeon CPUs are often used for inference, and the new Cascade Lake models triple inference throughput. The company acquired AI startup Nervana in 2016 but has struggled to bring its DLA technology to market. Intel also offers a range of FPGAs for customers that wish to design their own DLA architecture. The Movidius acquisition yielded low-power Myriad chips for drones and other camera-based devices. Finally, the acquisition of Mobileye makes Intel a leader in Level 1 and 2 automotive systems and provides an important component for Level 3 and above.
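
The Cascade Lake inference gain comes largely from the VNNI extensions noted earlier, which replace 32-bit floating-point math with 8-bit integers. The sketch below shows the basic quantization step using made-up weights and a simple symmetric scale; production tools calibrate these values from the trained model.

```python
import numpy as np

# Symmetric 8-bit quantization for inference (illustrative values only).
def quantize(x, scale):
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8)

w = np.array([0.52, -1.30, 0.07, 0.95], dtype=np.float32)
scale = np.abs(w).max() / 127  # map the largest weight to 127
w_q = quantize(w, scale)       # int8 weights enable fast integer dot products

print(w_q * scale)             # dequantized values approximate the originals
```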

Several other companies offer data-center DLAs. AMD recently upgraded its Radeon Instinct accelerators, but their performance lags far behind Nvidia's. Xilinx introduced FPGA-based Alveo cards, which can outperform Nvidia for applications that require small batch sizes. Habana is the first startup to deliver a production-ready DLA with significantly better inference throughput than Nvidia's. Startup Graphcore delivered its first DLA, which shows promise for neural-network training. Several other startups aim to deliver data-center DLAs in 2019 and beyond. These vendors must also compete against in-house DLA projects at most of the top CSPs, including Google's TPU and Microsoft's Brainwave, and at Huawei, which is developing a DLA that it plans to sell in servers.

Similarly, several automakers are developing their own chips for future autonomous vehicles. These ASICs often use licensed DLA cores from vendors such as AImotive and Videantis. Ceva and Synopsys also license DLA cores for automotive applications, more typically for less capable Level 1–3 designs. These cores can serve in other types of systems as well.

Counting design wins in Huawei and MediaTek smartphone processors, Cadence is the leading supplier of DLA cores by volume. It recently added a new compute pipeline with AI-specific features to its product line. Cambricon, the leading Chinese vendor, also supplies DLA cores to Huawei. In-house DLA cores are popular in smartphones: Apple, MediaTek, Qualcomm, and Samsung have all developed their own.

Focusing on their successful CPU and GPU businesses, both Arm and Imagination were late in introducing DLA cores. We expect these vendors to initially find customers in the IoT market, particularly for vision-based applications. These and other IP vendors must contend with the NVDLA, an open-source core that Nvidia created. Although the NVDLA offers competitive performance per watt and is available without license fees, it lacks customer support and a roadmap.

Several chip vendors, mainly startups, target IoT applications. The smart-camera market is particularly popular, given China's ambition to deploy 200 million surveillance cameras over the next few years. Gyrfalcon and NovuMind offer small chips to accelerate inference in existing designs. Bitmain and Intel provide more-complete offerings that can serve as the main system-on-a-chip (SoC) in a drone or security camera. For small battery-powered devices, such as remote sensors and wearables, GreenWaves and Eta Compute offer microcontrollers that integrate tiny DLAs to stretch battery life when running neural networks for light audio or image recognition.

List of Figures
List of Tables
About the Authors
About the Publisher
Preface
Executive Summary
1 Deep-Learning Applications
What Is Deep Learning?
Cloud-Based Deep Learning
Advanced Driver-Assistance Systems
Smart Cameras
Financial Technology
Health Care and Medicine
Manufacturing
Robotics
Voice Assistants
2 Deep-Learning Technology
Artificial Neurons
Deep Neural Networks
Spiking Neural Networks
Neural-Network Training
Training Spiking Neural Networks
Pruning and Compression
Neural-Network Inference
Quantization
3 Deep-Learning Accelerators
Accelerator Design
Data Formats
Computation Units
Dot Products
Systolic Arrays
Handling Sparsity
Other Common Functions
Processor Architectures
CPUs
GPUs
DSPs
Custom Architectures
FPGAs
Performance Measurement
Peak Operations
Neural-Network Performance
MLPerf Benchmark
AI-Benchmark
4 Market Forecast
Market Overview
Data Center and HPC
Market Size
Market Forecast
Automotive
Market Size
Market Forecast
Autonomous Forecast
Client and IoT
Market Size
Market Forecast
5 AImotive
Company Background
Key Features and Performance
Conclusions
6 AMD
Company Background
Key Features and Performance
Conclusions
7 Arm
Company Background
Key Features and Performance
Product Roadmap
Conclusions
8 Cadence
Company Background
Key Features and Performance
Product Roadmap
Conclusions
9 Cambricon
Company Background
Key Features and Performance
Cambricon IP Cores
MLU Accelerator Card
Product Roadmap
Conclusions
10 Ceva
Company Background
Key Features and Performance
Product Roadmap
Conclusions
11 Google
Company Background
Key Features and Performance
Product Roadmap
Conclusions
12 Graphcore
Company Background
Key Features and Performance
Conclusions
13 Gyrfalcon
Company Background
Key Features and Performance
Product Roadmap
Conclusions
14 Imagination
Company Background
Key Features and Performance
Conclusions
15 Intel
Company Background
Xeon Scalable
Key Features and Performance
Product Roadmap
Stratix FPGAs
Key Features and Performance
Product Roadmap
Movidius Myriad
Key Features and Performance
Product Roadmap
Conclusions
16 Intel (Mobileye)
Company Background
Key Features and Performance
Product Roadmap
Conclusions
17 Microsoft
Company Background
Key Features and Performance
Conclusions
18 Mythic
Company Background
Key Features and Performance
Conclusions
19 NVDLA
Company Background
Key Features and Performance
Conclusions
20 Nvidia AGX
Company Background
Key Features and Performance
Product Roadmap
Conclusions
21 Nvidia Tesla
Company Background
Key Features and Performance
Product Roadmap
Conclusions
22 NXP
Company Background
Key Features and Performance
Conclusions
23 Qualcomm
Company Background
Key Features and Performance
Conclusions
24 Synopsys
Company Background
Key Features and Performance
Conclusions
25 Videantis
Company Background
Key Features and Performance
Conclusions
26 Wave Computing
Company Background
Key Features and Performance
Product Roadmap
Conclusions
27 Xilinx
Company Background
Key Features and Performance
UltraScale+
Alveo
Versal
Conclusions
28 Other Vendors
Amazon
Bitmain
Key Features and Performance
Conclusions
BrainChip
Cerebras
Cornami
eSilicon
Key Features and Performance
Conclusions
Eta Compute
Key Features and Performance
Conclusions
General Processor
Key Features and Performance
Conclusions
GreenWaves
Key Features and Performance
Conclusions
Groq
Habana
Key Features and Performance
Product Roadmaps
Conclusions
Huami (Abee)
Key Features and Performance
Conclusions
Huawei
Key Features and Performance
Conclusions
NovuMind
Key Features and Performance
Conclusions
SambaNova
29 Processor Comparisons
How to Read the Tables
Data-Center Training
Architecture
Performance
Interfaces
Summary
Data-Center Inference
Architecture
Performance
Interfaces
Summary
Automotive Processors
CPU Subsystem
Vision Processing
Interfaces
Summary
Embedded Processors
Performance
Interfaces
Summary
Microcontrollers
Performance
Interfaces
Summary
Deep-Learning IP
Architecture
Summary
30 Conclusions
Market Summary
Data Center
Automotive
Client
Technology Trends
Neural Networks
Hardware Options
Performance Metrics
Vendor Summary
Data Center
Automotive
Embedded and IoT
IP Vendors
Closing Thoughts
Appendix: Further Reading
Index
Figure 1‑1. Deep learning for autonomous vehicles
Figure 1‑2. NHTSA and SAE autonomous-driving levels
Figure 1‑3. Waymo autonomous test vehicle
Figure 1‑4. A smart surveillance camera
Figure 1‑5. Processing steps in a computer-vision neural network
Figure 1‑6. Example biopsy images used to diagnose breast cancer
Figure 1‑7. Robotic arms use deep learning
Figure 1‑8. Various smart speakers
Figure 2‑1. Neuron connections in a biological brain
Figure 2‑2. Model of a neural-network processing node
Figure 2‑3. Common activation functions
Figure 2‑4. Model of a four-layer neural network
Figure 2‑5. Spiking effect in biological neurons
Figure 2‑6. Spiking-neural-network pattern
Figure 2‑7. Pruning a neural network
Figure 2‑8. Mapping from floating-point format to integer format
Figure 3‑1. Common AI data types and approximate data ranges
Figure 3‑2. Arm dot-product operation
Figure 3‑3. A systolic array
Figure 3‑4. Performance versus batch size
Figure 4‑1. Revenue forecast for deep-learning chips, 2016–2023
Figure 4‑2. Unit forecast for deep-learning chips, 2016–2023
Figure 4‑3. Unit forecast for ADAS-equipped vehicles, 2015–2023
Figure 4‑4. Revenue forecast for ADAS processors, 2015–2023
Figure 4‑5. Unit forecast for client deep-learning chips, 2015–2023
Figure 11‑1. Google TPUv2 board
Figure 12‑1. Graphcore C2 card
Figure 18‑1. Mythic's flash-based neural-network tile
Figure 23‑1. Snapdragon AI performance
Figure 28‑1. Block diagram of GreenWaves GAP8
Figure 29‑1. ResNet-50 inference throughput
Figure 30‑1. ImageNet LSVRC trends, 2010–2017
Figure 30‑2. Deep-learning accelerators
Table 4‑1. Data-center DLA units and revenue, 2017–2023
Table 5‑1. Key parameters for AImotive AIware CNN accelerator
Table 6‑1. Key parameters for AMD Radeon Instinct accelerators
Table 7‑1. Key parameters for Arm machine-learning core
Table 8‑1. Key parameters for Cadence deep-learning accelerators
Table 9‑1. Key parameters for Cambricon deep-learning accelerators
Table 9‑2. Key parameters for Cambricon MLU100 accelerator card
Table 10‑1. Key parameters for Ceva deep-learning accelerators
Table 11‑1. Key parameters for Google TPU accelerators
Table 12‑1. Key parameters for Graphcore GC2 processor
Table 13‑1. Key parameters for Gyrfalcon Lightspeeur coprocessors
Table 14‑1. Key parameters for Imagination PowerVR AI accelerators
Table 15‑1. Key parameters for selected Intel Cascade Lake processors
Table 15‑2. Key parameters for selected Intel Stratix 10 GX FPGAs
Table 15‑3. Key parameters for Intel Movidius processors
Table 16‑1. Key parameters for Mobileye EyeQ processors
Table 17‑1. Key parameters for Microsoft Brainwave accelerator
Table 19‑1. NVDLA sample configurations
Table 20‑1. Key parameters for Nvidia automotive processors
Table 21‑1. Key parameters for Nvidia Tesla processors
Table 22‑1. Key parameters for NXP S32V234 processor
Table 24‑1. Key parameters for selected Synopsys ARC cores
Table 25‑1. Key parameters for Videantis v-MP6000UDX DLA IP core
Table 26‑1. Key parameters for Wave DPU accelerator
Table 27‑1. Key parameters for selected Xilinx FPGAs
Table 28‑1. Key parameters for Habana Goya accelerator card
Table 29‑1. Comparison of floating-point DLAs for data centers
Table 29‑2. Comparison of integer DLAs for data centers
Table 29‑3. Comparison of automotive processors for deep learning
Table 29‑4. Comparison of embedded processors for deep learning
Table 29‑5. Comparison of microcontrollers for deep learning
Table 29‑6. Comparison of deep-learning IP cores (part one)
Table 29‑7. Comparison of deep-learning IP cores (part two)
