Microprocessor Report (MPR)

eSilicon 7nm Serdes Hits 56Gbps

NeuASIC Platform Includes AI Accelerators for 2.5D/3D ICs

July 16, 2018

By Mike Demler


ASIC provider eSilicon specializes in high-performance devices for communications infrastructure, networking, and other data-center applications. Using 7nm TSMC technology, it has developed ASIC-design platforms under the NeuASIC brand. Each includes hard and soft macros for networking applications along with a new architecture and intellectual-property (IP) library for building AI accelerators.

The NeuASIC platforms give designers a variety of power-optimized memory compilers, serdes, and 2.5D-IC interposers. The 7nm library includes a 56Gbps serdes, High Bandwidth Memory 2 (HBM2) PHY, and ternary-content-addressable-memory (TCAM) compiler, as well as networking-optimized I/Os and other components. The eSilicon design team is also working on a 112Gbps serdes, which it plans to tape out in 1Q19 on a 7nm test chip.

To maximize memory bandwidth, the company manufactures networking products in 2.5D packages using silicon interposers to combine the ASIC die with stacked DRAM chips, as Figure 1 shows. For AI accelerators, NeuASIC will enable designers to integrate a custom deep-learning accelerator (DLA) in an ASIC chassis comprising a CPU, scratchpad RAM, and HBM2 interfaces.

Figure 1. Substrate for eSilicon 2.5D networking ASIC. The company builds ASICs using its own memory compilers and serdes along with third-party IP. By assembling the ASICs in 2.5D packages, it can integrate stacks of up to eight HBM2 chips for high-performance data-center systems.

Adding Serdes IP to the Mix

Along with its ASIC design and development services, eSilicon manages manufacturing for its Tier One customers. Cisco is among its investors, and we believe that company uses eSilicon-designed ASICs in some of its switch products. According to several reports, eSilicon produced Intel’s Nervana AI accelerator as well. It also offers ASIC-design platforms for Samsung 14LPP and TSMC 16FF processes, along with 28nm memory compilers for multiple foundries.

CEO Jack Harding founded eSilicon in 1999 after holding the same position at Cadence. The company achieved early success in 2002, supplying ASICs for the first Apple iPods. Including strategic investment by Cisco, eSilicon has received more than $100 million from Catamount, Crescendo, Crosspoint, Fremont Ventures, IGC, and other investors. It currently has about 600 employees, with headquarters in Silicon Valley and design centers in Vietnam and elsewhere.

From 2008 to 2015, eSilicon licensed Avago serdes IP for its ASICs. That IP supports 28nm and older planar processes. After the Avago agreement expired, the company worked with Rambus to port its 28Gbps serdes to Samsung’s 14nm FinFET process. In 2017, after Marvell shut down most of its European operations, eSilicon acquired the Italian engineering team that had developed a 56Gbps Marvell serdes for manufacture in 28nm technology. That group used the same ADC/DSP-based architecture to develop a 7nm serdes, which now appears in the NeuASIC platforms and is available separately as a licensable core. This new serdes enables PAM4 and NRZ coding (see MPR 9/8/14, “Tradeoffs Abound for 56Gbps I/O”), and its programmability allows designers to tune power/performance for long or short channels.

The design supports configurations of up to eight lanes and thus handles 400G Ethernet. Each lane integrates a DSP and ADC, so users can set the transmit and receive data rates independently on each lane. An MCU interface enables channel monitoring, such as bit-error-rate (BER) histograms. Design targets include a 10⁻⁶ BER and –38dB insertion loss in long-reach systems. The serdes is available now for customer design starts.
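
As a sanity check on those figures, the short Python sketch below (our arithmetic, not eSilicon data) computes the per-lane rate that 400G Ethernet demands of an eight-lane serdes, including the 256b/257b transcoding and RS(544,514) FEC overhead that 400GBASE-R specifies.

```python
# Back-of-the-envelope check: per-lane rate an 8-lane serdes must
# sustain for 400G Ethernet, after 256b/257b transcoding and
# RS(544,514) FEC overhead. Our arithmetic, not an eSilicon figure.

LANES = 8
MAC_RATE = 400e9                                  # 400GbE payload, bits/s

line_rate = MAC_RATE * (257 / 256) * (544 / 514)  # encoding + FEC overhead
per_lane = line_rate / LANES

print(f"aggregate line rate: {line_rate / 1e9:.1f} Gbps")  # 425.0
print(f"per-lane rate:       {per_lane / 1e9:.3f} Gbps")   # 53.125
# A 56Gbps PAM4 lane therefore carries 400GbE with margin to spare.
```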

Stacking Memories

NeuASIC includes a 7nm hard macro for an HBM2 PHY (see MPR 5/2/16, “HBM2 Doubles Bandwidth, Capacity”). For GlobalFoundries and TSMC 28nm HPC as well as Samsung 14LPP and TSMC 16FF+ technologies, the company offers an HBM2 PHY that complies with the DDR PHY Interface 4.0 (DFI 4.0) and IEEE 1500 specifications. In the older nodes, the PHYs allow up to 2.0Gbps DDR operation. The 7nm TSMC technology, however, increases the speed to 2.4Gbps, enabling the 8x128-bit channels to deliver 307GB/s of bandwidth per HBM interface. In addition, eSilicon is developing a second-generation HBM2 PHY, scheduled for release in 2019, that will boost the speed to 2.8Gbps. It’s also working on an HBM3 design that’s likely to begin production in 2019.
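
The quoted bandwidth follows directly from the channel arithmetic. The sketch below reproduces it (our calculation, not an eSilicon tool) for the 2.0Gbps, 2.4Gbps, and planned 2.8Gbps pin rates.

```python
# Peak bandwidth of one HBM2 interface: 8 independent 128-bit channels
# running at the given per-pin data rate. Our calculation for context.

CHANNELS = 8
WIDTH_BITS = 128

def hbm_bandwidth_gb_s(pin_rate_gbps: float) -> float:
    """Peak bandwidth of one HBM2 stack in GB/s."""
    return CHANNELS * WIDTH_BITS * pin_rate_gbps / 8   # bits -> bytes

print(hbm_bandwidth_gb_s(2.0))   # 256.0 GB/s (older-node PHYs)
print(hbm_bandwidth_gb_s(2.4))   # 307.2 GB/s (7nm PHY)
print(hbm_bandwidth_gb_s(2.8))   # 358.4 GB/s (planned second generation)
```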

To complete the HBM interface, the company couples its HBM2 PHY with a controller from Northwest Logic, a supplier of interface IP for various DRAM standards as well as for MIPI, PCIe, and other I/Os. This HBM2 controller includes a data-bus-inversion (DBI) function that reduces noise from simultaneous switching and a data-masking function for write operations. It also provides reliability, availability, and serviceability (RAS) features and allows designers to employ ECC and parity checks.
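
Northwest Logic hasn’t published the controller’s internal logic, but the snippet below sketches the common DBI-AC scheme such functions typically implement: if a byte transfer would toggle more than half of the eight data wires, the controller sends the complement and asserts the DBI bit instead, capping simultaneous switching at four data wires per byte.

```python
# Minimal sketch of data-bus inversion (DBI-AC), assuming the common
# toggle-count rule; the actual controller logic is not public.

def dbi_ac_encode(prev_wires: int, data: int) -> tuple[int, int]:
    """Return (wire_value, dbi_flag) for one byte, given the prior wire state."""
    toggles = bin((prev_wires ^ data) & 0xFF).count("1")
    if toggles > 4:                     # inverting flips 8 - toggles < 4 wires
        return (~data) & 0xFF, 1
    return data, 0

wires, dbi = dbi_ac_encode(0x00, 0xFE)  # 7 toggles -> send 0x01, DBI=1
print(hex(wires), dbi)                  # 0x1 1
```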

The PHY connects to the HBM stack through a silicon interposer using TSMC’s chip-on-wafer-on-substrate (CoWoS) packaging (see MPR 10/19/15, “iPhone 7 Dials 411 for InFo”). The eSilicon IP includes the interposer and SiP designs for two-, four-, or eight-high HBM2 stacks. A proprietary routing scheme minimizes crosstalk and skew in the interposer signals. The company works with Amkor and TSMC for the multichip packaging, and it provides a design kit and EDA-tool setup for signal- and power-integrity analysis, chip optimization for through-silicon-via (TSV) interfaces, and reliability/failure analysis.

Performance CAMs Boost Search Engines

Content-addressable memories (CAMs) accelerate network searches by scanning an entire table in one clock cycle. The eSilicon IP portfolio includes binary- and ternary-CAM (BCAM and TCAM) compilers for 7nm to 180nm technologies. The TCAMs add the capability to store don’t-care (X) bits (see MPR 4/25/11, “Search Coprocessors Route Packets”). The CAM IP targets lookup functions in Ethernet switches, network processors, and other networking systems.

Designers can configure the TCAM depth and width for up to 1K entries and 160 bits per word, yielding a total macroblock size of 160Kb. To increase depth, designers can cascade multiple TCAM macros, up to 40Mb total. In 7nm TSMC technology at worst-case process corners, the TCAMs enable clock frequencies of up to 1.8GHz, equivalent to 1.8 billion searches per second. By comparison, the 16nm TCAM IP from Global Unichip runs at a maximum 1.0GHz clock frequency. NeuASIC’s memory compiler allows designers to select options including a priority encoder and redundancy as well as bit, group, and global masking.
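
To make the ternary-match rule concrete, the toy model below (ours, not output from eSilicon’s compiler) stores each entry as a value plus a care-mask and emulates the priority encoder. Real hardware compares every entry in parallel in a single cycle; the loop exists only for illustration.

```python
# Toy model of a ternary match: each TCAM entry stores a value plus a
# care-mask, where mask bits of 0 mark the don't-care (X) positions.
# Real hardware compares all entries (up to 1K words of 160 bits per
# macro) in parallel in one cycle.

table = [
    (0b1010, 0b1111),   # match exactly 1010
    (0b1000, 0b1100),   # match 10XX: top two bits care, bottom two don't
]

def tcam_search(key: int) -> int | None:
    for index, (value, mask) in enumerate(table):
        if (key & mask) == (value & mask):
            return index            # priority encoder: lowest-index hit wins
    return None

print(tcam_search(0b1011))          # 1: matches the 10XX entry
print(tcam_search(0b1010))          # 0: the exact entry has higher priority
```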

The TCAMs offer several power-saving options. The four-quadrant-architecture option enables users to reduce power by turning on just one quadrant at a time. A low-power mode conserves energy by reducing the hit-line signal swing. Multitable partitioning lets users minimize power consumption by limiting searches to a single variable-depth table rather than simultaneously searching all tables. Designers can also choose a vertically partitioned organization, which supports multiwidth searches in quarter-, half-, single-, and doubleword modes.

The vertically partitioned design pipelines the table search between primary and secondary partitions. Entries that match in the primary partition limit searches on the next cycle to entries in the secondary partition. The tradeoff for the extra cycle of latency is up to 60% power savings, depending on the number of matching entries in the primary partition. The duo-CAM mode operates the two vertical partitions independently with common peripheral read/write circuits. This option allows each array to have its own match keys and match-out flags for each entry, but the shared circuitry reduces area by up to 30% compared with two independent TCAMs. A dual-port search option lets users increase performance by doing two searches in one clock cycle. It includes two hit lines and two match data-in signals per bit cell.
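
A simplified model of our own shows why the pipelined organization saves power: the secondary halves of the array are compared only for the entries that survive the primary-partition search, so most of the array never switches.

```python
# Sketch of a vertically partitioned TCAM search under our simplified
# model: each stored word is split into a primary and a secondary half,
# searched on consecutive cycles.

entries = [(0xABCD, 0x1234), (0xABCD, 0x5678), (0x9999, 0x1234)]

def partitioned_search(key_hi: int, key_lo: int) -> list[int]:
    # Cycle 1: search the primary partition only.
    survivors = [i for i, (hi, _) in enumerate(entries) if hi == key_hi]
    # Cycle 2: search the secondary partition, restricted to survivors.
    return [i for i in survivors if entries[i][1] == key_lo]

print(partitioned_search(0xABCD, 0x5678))  # [1]: only two of the three
                                           # secondary halves were examined
```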

Plug-and-Play AI

To address the memory bottlenecks that often limit neural-network-engine performance, eSilicon applied its 2.5D design expertise to develop a new machine-learning ASIC platform (MLAP). It has assigned 100 engineers to the NeuASIC MLAP, with most probably working in its Vietnam design center. We expect the company is building on the experience it gained by creating the Intel/Nervana AI accelerator, which combines Nervana’s custom chip with 32GB of HBM2 DRAM in a 2.5D package (see MPR 6/27/16, “Learning Chips Hit the Market”).

As Figure 2 shows, the NeuASIC MLAP has interchangeable AI tiles in an ASIC chassis, which connects to HBM memories on a 2.5D substrate. The company is also planning to develop a 3D version that mounts the AI chip directly onto the chassis. The chassis includes separate high-speed NoCs for the control and data paths. It has up to 256MB of scratchpad memory along with I/Os for external memory and standard system interfaces.


Figure 2. eSilicon machine-learning ASIC. The design integrates configurable AI cores comprising neural-network tiles in an ASIC chassis. The chassis includes high-speed NoCs, interfaces to external memory, and a CPU core that controls neural-network operations.

The MLAP enables designers to assemble components from eSilicon 7nm AI libraries into a customizable accelerator, as Figure 3 shows. The company withheld architectural details, but the “megacell” library comprises convolution engines employing MAC arrays with tightly coupled memory subsystems as well as function blocks for other standard convolutional-neural-network (CNN) layers. The library’s memory subsystems include a transpose block for CNN upscaling and the HBM2 controller/PHY. The gigacells are complete AI-processor subsystems, including an (undisclosed) choice of CPU cores, NoCs, and I/O subsystems such as DDR, HBM2, and serdes. The NeuASIC MLAP is scheduled for production availability in 2H19.


Figure 3. Configurable AI cores. The eSilicon AI tiles allow designers to configure custom cores integrating functions for convolution, pooling, and other typical neural-network layers, along with memories that are physically matched to the tiles.

In its AI-tile-based architecture, eSilicon aims to support the rapid pace of neural-network algorithm development by enabling designers to easily swap out tiles as their requirements change. The AI tile’s physical layout allows plug-and-play placement for connecting directly to memories generated by the NeuASIC compilers. The tightly coupled compute and memory blocks optimize power efficiency. Several multiport memory architectures are available to handle the large variety of AI-compute memory-access requirements. The company supports the NeuASIC MLAP with the AI Engine Explorer, Chassis Builder, and Design Profiler tools, which let designers configure AI tiles along with third-party IP to evaluate the power, performance, and area (PPA) of candidate architectures.
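
eSilicon hasn’t disclosed how these tools work internally. Purely to illustrate the kind of PPA roll-up such architecture-exploration flows perform, the sketch below tallies invented area, power, and throughput numbers for a hypothetical tile list; none of the figures are eSilicon data.

```python
# Toy PPA roll-up for a candidate tile configuration. All tile names
# and numbers are invented for illustration.

from dataclasses import dataclass

@dataclass
class Tile:
    name: str
    area_mm2: float
    power_w: float
    tops: float                     # peak throughput, tera-ops/s

def evaluate(tiles: list[Tile]) -> dict[str, float]:
    area = sum(t.area_mm2 for t in tiles)
    power = sum(t.power_w for t in tiles)
    perf = sum(t.tops for t in tiles)
    return {"area_mm2": area, "power_w": power,
            "tops": perf, "tops_per_w": perf / power}

candidate = [Tile("conv_engine", 4.0, 2.5, 8.0),
             Tile("pooling", 0.5, 0.3, 0.5),
             Tile("scratchpad_64mb", 12.0, 1.2, 0.0)]
print(evaluate(candidate))
```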

Custom ASICs Suit the Super 7

Taking a 7nm ASIC to production is an expensive endeavor, so the potential NeuASIC customers are a small but elite group. The company’s target list focuses on the “Super 7” cloud-service providers: Alibaba, Amazon, Baidu, Google, Facebook, Microsoft, and Tencent. Its closest competitor is Broadcom, which announced a similar 7nm ASIC platform in November. That alternative also uses TSMC’s FinFET process and CoWoS packaging. It includes Broadcom’s 58Gbps/112Gbps PAM4 serdes, HBM2/HBM3 PHYs, and TCAMs, as well as a selection of Arm CPUs. Although Broadcom positions its 7nm ASICs for deep learning and networking, it has yet to announce a machine-learning core or a design method to compete with the NeuASIC MLAP.

GlobalFoundries has announced a 7nm ASIC platform as well, and it previously developed a 56Gbps serdes for its 14nm process (see MPR 1/9/17, “GlobalFoundries Offers 56Gbps Serdes”). It inherited IBM’s ASIC and networking expertise, but it lacks machine-learning IP. Whereas TSMC began 7nm volume production in 2Q18, GlobalFoundries is a year behind (see MPR 4/16/18, “TSMC 7nm Approaches Intel’s Prowess”).

eSilicon’s 7nm networking platform will appeal to Internet giants such as Facebook and Google, which will likely find it easier to work with a small vendor such as eSilicon than with a notoriously aggressive negotiator such as Broadcom. The platform is also a good fit for cellular-infrastructure suppliers Ericsson and Nokia as well as other networking-equipment OEMs. The company offers a competitive technology portfolio along with an attractive resume that includes building high-performance ASICs for Apple and Cisco. The serdes will be available as a separately licensable core, so it can compete with products from Credo, Kandou, and other IP suppliers (see MPR 2/8/16, “Credo Goes ASSP With PAM4 PHY,” and MPR 9/11/17, “Kandou 500Gbps Serdes IP Targets 2.5D”).

The NeuASIC MLAP may differentiate eSilicon from its ASIC competitors, but the company has revealed few details. A 2.5D reconfigurable ASIC is more expensive to develop than an FPGA-based design, and incorporating modifications takes much longer (see MPR 11/20/17, “FPGAs Accelerate Deep Learning”). Intel and Xilinx both integrate their programmable chips in multichip packages with HBM.

Companies such as Baidu and Google are developing custom AI accelerators using in-house IP (see MPR 5/8/17, “Google TPU Boosts Machine Learning”), so they’re unlikely to employ the AI-tile method. The Super 7 can afford to mix and match the best IP cores to build custom ASICs. For example, Synopsys provides AI accelerators as well as CPU and HBM IP along with an extensive library of analog- and standard-interface IP, offering a more complete platform than the NeuASIC MLAP.

The list of AI-IP competitors is formidable. Beyond Synopsys, it includes Arm, Cadence, Cambricon, Ceva, and others. To win Super 7 customers, eSilicon must demonstrate that its AI technology can outperform licensable accelerators from a rapidly growing list of competitors.

Price and Availability

The eSilicon NeuASIC platform is available now. The company withheld pricing. For more information on the NeuASIC networking platform, access www.esilicon.com/products/technologies/finfet-class-7nm-ip-platform. More information on the NeuASIC machine-learning platform is at www.esilicon.com/capabilities/neuasic-7nm-platform-machine-learning-asic-design.
