Linley Newsletter

IBM Demonstrates New AI Data Types

April 6, 2021

Author: Linley Gwennap

IBM researchers appear to have solved a fundamental problem for neural networks: how to train them and run inference using smaller, more efficient data types. At the recent ISSCC, the company presented a test chip that supports hybrid 8-bit floating-point (HFP8) calculations that operate twice as fast as the FP16 calculations typical of AI training but yield nearly identical training results.

Similarly, the chip’s 4-bit integer (INT4) calculations double efficiency for AI inference with almost no loss of model accuracy. The smaller data types also double effective memory capacity and bandwidth, improving the effectiveness of on-chip storage. These smaller data types could serve throughout the industry to double the performance of next-generation AI accelerators.
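The capacity claim is simple arithmetic: halving the bits per value doubles how many values fit in the same storage and how many move per byte transferred. A minimal sketch follows, assuming a hypothetical 1 MiB on-chip buffer (the article does not give the chip's actual memory size) and an illustrative helper named values_per_buffer.

```python
def values_per_buffer(buffer_bytes: int, bits_per_value: int) -> int:
    """Number of values of the given width that fit in a buffer."""
    return (buffer_bytes * 8) // bits_per_value


# Hypothetical buffer size, chosen only for illustration.
BUFFER_BYTES = 1024 * 1024

fp16_count = values_per_buffer(BUFFER_BYTES, 16)
hfp8_count = values_per_buffer(BUFFER_BYTES, 8)
int8_count = values_per_buffer(BUFFER_BYTES, 8)
int4_count = values_per_buffer(BUFFER_BYTES, 4)

print(f"FP16 values: {fp16_count}")
print(f"HFP8 values: {hfp8_count} ({hfp8_count / fp16_count:.0f}x FP16)")
print(f"INT4 values: {int4_count} ({int4_count / int8_count:.0f}x INT8)")
```

The same ratio applies to bandwidth, since each byte moved carries twice as many values.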

IBM’s HFP8 format employs two arrangements of exponent and mantissa bits, providing extra resolution in the forward pass and greater range for back propagation. As a result, the model’s output accuracy is nearly identical when using FP32 or HFP8 across a range of model types.
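The tradeoff between the two arrangements can be illustrated with a short calculation. The sketch below assumes a 1-4-3 split (sign, exponent, mantissa) for the forward pass and a 1-5-2 split for back propagation; the article does not state these splits, and the code uses IEEE-style conventions (default bias, top exponent code reserved) that may differ from IBM's actual encoding. The function name minifloat_stats is illustrative.

```python
def minifloat_stats(exp_bits: int, man_bits: int):
    """Return (max_normal, min_normal, relative_step) for a small float format.

    Assumes IEEE-style conventions: bias of 2**(exp_bits - 1) - 1, an implicit
    leading 1, and the top exponent code reserved for inf/NaN. These are
    assumptions for illustration, not IBM's documented encoding.
    """
    bias = 2 ** (exp_bits - 1) - 1
    max_exp = (2 ** exp_bits - 2) - bias   # largest usable exponent
    min_exp = 1 - bias                     # smallest normal exponent
    max_normal = (2 - 2.0 ** -man_bits) * 2.0 ** max_exp
    min_normal = 2.0 ** min_exp
    relative_step = 2.0 ** -man_bits       # spacing between values, relative to magnitude
    return max_normal, min_normal, relative_step


# Assumed bit splits (not given in the article).
forward = minifloat_stats(exp_bits=4, man_bits=3)
backward = minifloat_stats(exp_bits=5, man_bits=2)

for name, (hi, lo, step) in (("forward 1-4-3", forward), ("backward 1-5-2", backward)):
    print(f"{name}: max={hi}, min normal={lo}, relative step={step}")
```

Under these assumptions, the forward format has a finer relative step (more resolution), while the backward format spans a much wider range of magnitudes, which suits gradients that vary widely in scale.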

The new test chip demonstrates the value of this innovation. At its peak speed, the small 7nm design can achieve 1.9 teraflops per watt (TF/W) using HFP8, more than twice the efficiency of Nvidia’s Ampere design. In a low-power mode, the chip has an even better rating: 3.5TF/W. When using INT4 for inference, it delivers 16.5 TOPS/W, more than double the rating of Qualcomm’s low-power Cloud AI 100 module. It achieves these results through unique multiply-accumulate (MAC) units that perform multiple operations per cycle on the smaller data types.

