
Linley Newsletter

Arm Dot Products Accelerate CNNs

July 10, 2018

Author: Mike Demler

Arm's new dot-product instructions deliver up to a 4x performance boost to convolutional-neural-network (CNN) operations running on 64-bit Cortex-A CPUs. These networks are the most popular deep-learning architecture for image recognition and other machine-learning applications.

Analyzing an image can require billions of dot products, using multiply-accumulators (MACs) to process the pixel data through feature-extraction filters. For example, the popular ResNet-50 CNN requires 3.9 billion MAC operations per image. Designers can run CNNs on Cortex-A and Cortex-M CPUs, as well as Mali GPUs, using the neural-network libraries and APIs in Arm's Project Trillium software stack. Although large network models such as ResNet-50 are better suited to dedicated deep-learning accelerators (DLAs), the new dot-product instructions make a general-purpose CPU with SIMD capabilities sufficient for small networks.
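To see where MAC counts of that size come from, the cost of a single convolution layer can be sketched as below. The function name `conv_macs` and the example layer shape are illustrative assumptions; they are not taken from the article or from ResNet-50's actual layer list.

```python
def conv_macs(out_h, out_w, out_channels, in_channels, kernel_h, kernel_w):
    """Count multiply-accumulates for one standard (dense) convolution layer.

    Each output value is a dot product over a window of
    kernel_h x kernel_w x in_channels inputs, so it costs that many MACs.
    """
    macs_per_output = kernel_h * kernel_w * in_channels
    num_outputs = out_h * out_w * out_channels
    return macs_per_output * num_outputs


# Hypothetical layer: 56x56 output, 64 input and 64 output channels, 3x3 kernel.
example = conv_macs(56, 56, 64, 64, 3, 3)
print(f"{example:,} MACs")  # 115,605,504 MACs
```

At roughly 116 million MACs for this one hypothetical layer, a few dozen layers of similar size add up to billions of MACs per image.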

Running some CNN operations on the CPU also allows it to work in parallel with a GPU or DLA, distributing the workload in a heterogeneous SoC. Each of the new Neon instructions calculates four dot products at once, each over four pairs of 8-bit values, and accumulates the results in a 128-bit vector comprising the four 32-bit totals. These instructions were officially introduced as part of Arm v8.4, first implemented in the Cortex-A76 CPU. However, Arm also "accelerated" them into the older Cortex-A55 and Cortex-A75, which are Arm v8.2 compliant.
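The behavior of one such instruction can be modeled in a few lines of scalar code. This is a minimal sketch, not Arm's definition: the function name `dot_product_accumulate` is hypothetical, the inputs are assumed to be signed 8-bit values, and the model ignores 32-bit overflow of the accumulators.

```python
def dot_product_accumulate(acc, a, b):
    """Scalar model of a four-lane, 8-bit dot-product-accumulate.

    acc:  four 32-bit accumulators (together one 128-bit vector)
    a, b: sixteen signed 8-bit values each (one 128-bit vector each)
    Lane i gains the dot product of elements 4*i .. 4*i+3 of a and b.
    Assumption: accumulator wraparound at 32 bits is not modeled.
    """
    assert len(acc) == 4 and len(a) == 16 and len(b) == 16
    assert all(-128 <= x <= 127 for x in list(a) + list(b)), "inputs must fit in int8"
    result = []
    for lane in range(4):
        total = acc[lane]
        for j in range(4):
            total += a[4 * lane + j] * b[4 * lane + j]
        result.append(total)
    return result


acc = [0, 10, 0, 0]
a = [1, 2, 3, 4,  1, 1, 1, 1,  -1, -2, -3, -4,  127, 127, 127, 127]
b = [1, 1, 1, 1,  5, 6, 7, 8,   1,  2,  3,  4,  127, 127, 127, 127]
print(dot_product_accumulate(acc, a, b))  # [10, 36, -30, 64516]
```

Each call performs 16 multiply-accumulates. The last lane shows why the totals are 32 bits wide: four products of the largest 8-bit values already exceed the 16-bit range.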

These instructions allow Arm's newest CPUs to deliver impressive neural-network performance. Running at a 2.4GHz clock frequency, for example, Cortex-A76 can achieve 614GOP/s (or 307GMAC/s), which is twice the peak performance of a small DLA such as Cadence's P6 or Ceva's XM4.
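The relationship between these figures can be checked with simple arithmetic. The sketch below uses only numbers quoted in this article; the variable names are illustrative, and the images-per-second value is a theoretical ceiling at peak rate, not a measured result. The excerpt does not say how many cores or Neon pipelines the peak figure assumes, so the per-cycle number is left as an aggregate.

```python
CLOCK_HZ = 2.4e9         # clock frequency quoted in the article
PEAK_MACS_PER_S = 307e9  # 307 GMAC/s quoted in the article
RESNET50_MACS = 3.9e9    # MACs per image quoted in the article

# Each MAC counts as two operations: one multiply and one add.
ops_per_s = 2 * PEAK_MACS_PER_S

# Aggregate MACs completed per clock cycle implied by the peak figure.
macs_per_cycle = PEAK_MACS_PER_S / CLOCK_HZ

# Upper bound on ResNet-50 throughput if every cycle ran at peak rate.
images_per_s_bound = PEAK_MACS_PER_S / RESNET50_MACS

print(f"{ops_per_s / 1e9:.0f} GOP/s")
print(f"{macs_per_cycle:.0f} MACs per cycle")
print(f"{images_per_s_bound:.1f} images/s upper bound")
```

The 614GOP/s figure is therefore the 307GMAC/s figure counted as separate multiplies and adds, and it corresponds to about 128 MACs per cycle at 2.4GHz.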

Subscribers can view the full article in the Microprocessor Report.
