Microprocessor Report (MPR)

Pensando, Xilinx Debut Smart NICs

New Entrants Seek Broader Market Beyond Hyperscale Data Centers

June 15, 2020

By Bob Wheeler


For merchant vendors, the smart-NIC market has fallen short of the hype. Most vendors hoped that adoption of captive designs at Amazon and Microsoft would create pull for smart NICs at other leading cloud-service providers (CSPs). Of the seven largest CSPs, however, we believe only Baidu and Tencent purchased merchant smart NICs last year, whereas the remainder used a mix of standard NICs and captive smart NICs. Given limited opportunity at the highest-volume customers, merchant vendors are looking to other segments for growth: second-tier CSPs, enterprise data centers (private clouds), and telecommunications-service providers.

To address this broader market, two new entrants are developing more software than incumbent smart-NIC vendors, which typically deliver little more than drivers and a software-development kit. If their solutions provide higher-value services, they can charge higher prices that more than offset the lower volumes available. This reasoning explains how startup Pensando raised a whopping $278 million and why Xilinx paid $400 million for Solarflare.

Shipping since 3Q19, Pensando’s initial offering includes a choice of dual-port 25G Ethernet or 100G Ethernet adapters, which the company calls distributed services cards (DSCs). To centrally manage up to 1,000 DSCs, policy- and services-manager (PSM) software runs on a separate high-availability server cluster. In addition to network and storage virtualization, the startup delivers networking, security, and telemetry services that traditionally required specialized appliances.

The first smart NIC from Xilinx, scheduled for general availability in 3Q20, is the Alveo U25, a dual-port 25G Ethernet adapter combining an FPGA with a Solarflare ASIC. Initial services will include Open vSwitch (OVS), IPSec, TLS, and firewall offloads, and the card works with Solarflare’s low-latency TCP/IP stack. The U25 has dual PCIe interfaces, enabling the host to directly access the FPGA to accelerate compute workloads such as machine learning and video transcoding.

In the enterprise segment, financial-services companies are among the technology leaders; Goldman Sachs was a founding member of the Open Compute Project, and it’s also Pensando’s first disclosed customer. Solarflare focused on financial customers that valued its technology for equity trading. As early adopters of private clouds that mimic larger public clouds, financial companies are natural targets for smart-NIC vendors. For the merchant smart-NIC market to grow beyond a niche, however, vendors must deliver off-the-shelf solutions fit for a broader customer base.

MPLS Thinks Big, Again

Pensando is the latest startup from the famed ex-Cisco team of MPLS—Mario Mazzola, Prem Jain, Luca Cafiero, and Soni Jiandani—which previously founded Crescendo, Andiamo, Nuova, and Insieme. Former Cisco CEO John Chambers is the startup’s chairman, while Jain and Jiandani are CEO and chief business officer, respectively. CTO Vipin Jain is also a Cisco veteran through the acquisition of Nuova Systems. Founded in 2017, Pensando has grown to about 250 employees, the vast majority of whom are in software development.

The startup emerged from stealth mode in October 2019, simultaneously announcing a massive third funding round led by Hewlett Packard Enterprise (HPE) and Lightspeed Venture Partners. Other investors include Equinix (a mid-tier CSP), Goldman Sachs, GV (formerly Google Ventures), NetApp, and Oracle. We believe Oracle is also the startup’s first public-cloud customer, whereas HPE provides a channel for its products.

Using a veteran ASIC team, Pensando developed a unique chip that combines a modest Arm CPU complex with a programmable data plane, as Figure 1 shows. The 16nm Capri ASIC has 8GB of in-package High Bandwidth Memory (HBM) to store large tables. It has a PCIe Gen4 x16 host interface and eight 25Gbps serdes to provide as many as 2x100GbE or 8x25GbE ports. The company’s adapters, branded Naples, require only the ASIC, eMMC flash memory for storage, and a 1000Base-T PHY for the out-of-band-management ports. The Naples DSC-100 card dissipates up to 36W including two QSFP28 optical modules. In addition to network and NVMe virtualization, the DSCs handle complex services including load balancing, stateful firewall, micro-segmentation, and VPN/TLS termination.

 

Figure 1. Pensando’s Capri ASIC. The 16nm SoC includes a 400Gbps programmable data plane, acceleration engines, High Bandwidth Memory (HBM), and a quad-core processor.

In Pensando’s architecture, the quad Cortex-A72 CPUs handle only control-plane processing, whereas the programmable pipelines handle all packet processing. The data plane divides 400Gbps of aggregate bandwidth across eight programmable pipelines, which enable up to four services to operate simultaneously at up to 100Gbps and 40 million packets per second (Mpps). The pipelines can access acceleration engines for encryption and compression. They can do all processing for established flows, involving the CPU complex only for new flows.
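
As a rough sanity check on these figures, the arithmetic can be sketched in Python. The sketch assumes the eight pipelines share the 400Gbps evenly and that the 40Mpps rating applies to a single 100Gbps service; neither detail is stated above, and the function names are illustrative. The 20 bytes of per-frame overhead (preamble, start delimiter, and inter-frame gap) is standard for Ethernet.

```python
# Back-of-the-envelope check of the quoted data-plane figures.
# Assumptions (not from the source): pipelines split bandwidth evenly,
# and the 40Mpps rating applies to one 100Gbps service.

ETH_OVERHEAD_BYTES = 20  # preamble (7) + start delimiter (1) + inter-frame gap (12)


def per_pipeline_gbps(aggregate_gbps: float, pipelines: int) -> float:
    """Bandwidth per pipeline if the aggregate is divided evenly."""
    return aggregate_gbps / pipelines


def frame_bytes_at_line_rate(link_gbps: float, mpps: float) -> float:
    """Frame size at which `mpps` packets per second exactly fill the link."""
    wire_bytes = (link_gbps * 1e9) / (mpps * 1e6) / 8
    return wire_bytes - ETH_OVERHEAD_BYTES


def line_rate_mpps(link_gbps: float, frame_bytes: int) -> float:
    """Packet rate needed to fill the link with frames of a given size."""
    bits_on_wire = (frame_bytes + ETH_OVERHEAD_BYTES) * 8
    return link_gbps * 1e9 / bits_on_wire / 1e6


if __name__ == "__main__":
    print(per_pipeline_gbps(400, 8))          # Gbps per pipeline
    print(frame_bytes_at_line_rate(100, 40))  # average frame size in bytes
    print(line_rate_mpps(100, 64))            # Mpps for minimum-size frames
```

Under these assumptions, each pipeline gets 50Gbps, and 40Mpps fills a 100Gbps link only when frames average about 290 bytes; line rate with minimum-size 64-byte frames would require roughly 149Mpps.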

To program its pipeline, the startup began with the P4 language pioneered by Barefoot Networks (see MPR 8/8/16, “Barefoot’s Tofino Gives P4 a Test Spin”). Some of its services require memory transactions and stateful processing, however, necessitating language extensions. Pensando is proposing extensions in the P4 project, which is now under Open Networking Foundation management. TCP connection tracking is a common stateful application, but host adapters also offload TCP segmentation and reassembly. Pensando’s design uses DMA to move packets to and from memory and the acceleration engines, and the existing P4 language lacks DMA constructs.
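
To illustrate why connection tracking needs state that a purely match-action program lacks, the following Python sketch models a flow table in which established flows stay on the fast path and unknown flows are punted to the control plane. It is a toy model rather than P4 code or Pensando’s implementation: the FlowTable class, the state names, and the drop policy are hypothetical, and the TCP handshake is collapsed into a single SYN.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Hypothetical flow key: source IP, source port, destination IP, destination port.
FlowKey = Tuple[str, int, str, int]


@dataclass
class FlowTable:
    """Toy connection tracker: a dict stands in for the hardware flow table."""
    flows: Dict[FlowKey, str] = field(default_factory=dict)
    fast_path_hits: int = 0
    slow_path_hits: int = 0

    def process(self, key: FlowKey, syn: bool = False, fin: bool = False) -> str:
        state = self.flows.get(key)
        if state is None:
            # Unknown flow: hand the packet to the control-plane CPU.
            self.slow_path_hits += 1
            if not syn:
                # Illustrative stateful-firewall policy: no mid-stream pickup.
                return "drop"
            # Simplification: a real tracker would follow the full handshake.
            self.flows[key] = "ESTABLISHED"
            return "slow-path"
        # Known flow: handled entirely in the data plane.
        self.fast_path_hits += 1
        if fin:
            del self.flows[key]  # release state when the connection closes
        return "fast-path"


if __name__ == "__main__":
    table = FlowTable()
    key = ("10.0.0.1", 1234, "10.0.0.2", 80)
    for flags in ({"syn": True}, {}, {}, {"fin": True}, {}):
        print(table.process(key, **flags))
```

The point of the sketch is that every packet both reads and may update per-flow state, which is the kind of memory transaction the base P4 language was not designed to express.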

Xilinx Moves Up the Stack

The Alveo U25 is similar to the Innova-2 from Mellanox (now Nvidia), a bump-in-the-wire design that combines an Ethernet controller with a Xilinx FPGA (see MPR 11/20/17, “Mellanox Brings More Smarts to NICs”). In the U25, the Solarflare SFC9250 controller connects to one PCIe Gen3 x8 interface on the host side and to a Zynq UltraScale+ FPGA on the network side, as Figure 2 shows. The company designates the custom FPGA variant as the XCU25, but we believe it derives from the standard ZU19EG. The bump-in-the-wire FPGA then connects with a pair of SFP28 ports, but it also connects directly to an optional second PCIe Gen3 x8 host interface.

 

Figure 2. Alveo U25 smart NIC. The FPGA is a bump in the wire for network traffic, but it also connects directly to the host through a second PCIe interface.

The controller ASIC handles basic Ethernet functions, whereas the FPGA handles compute and storage offloads as well as advanced networking. The ASIC provides backward compatibility with Solarflare drivers and the Onload user-space TCP stack. Onload bypasses the kernel, however, making it and OVS mutually exclusive. The FPGA integrates a quad Cortex-A53 CPU complex that operates at up to 1.5GHz and can handle control-plane functions. This CPU complex should deliver roughly half the performance of Pensando’s integrated control-plane processor. The U25 card includes three DDR4-2666 channels: dual 72-bit with 4GB plus one 40-bit with 2GB.

Xilinx demonstrated its FPGA-based OVS offload using the shipping Alveo U200 accelerator card (see MPR 2/18/19, “Xilinx Delivers Server Acceleration”). That 100Gbps design consumed about 350,000 LUTs, whereas a 100Gbps third-party IPSec block consumes about 145,000 LUTs. Although 50Gbps versions of these blocks would use fewer resources, they would still require much of the XCU25’s 520,000 LUTs, leaving few resources for storage or compute offloads. Xilinx claims the ability to converge network, storage, and compute acceleration, but it has yet to show how all three can coexist in the U25.
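
The LUT budget behind this concern can be estimated with a short Python sketch. It assumes, optimistically, that logic area scales linearly with throughput, which the source does not claim; fixed control logic rarely shrinks that well, so the result is best read as a lower bound on utilization. The function names are illustrative.

```python
# LUT counts quoted in the text.
XCU25_LUTS = 520_000
OVS_100G_LUTS = 350_000
IPSEC_100G_LUTS = 145_000


def scaled_luts(luts_at_100g: int, target_gbps: float) -> float:
    """ASSUMPTION: LUT usage scales linearly with throughput (optimistic)."""
    return luts_at_100g * target_gbps / 100.0


def lut_budget(target_gbps: float):
    """Return (LUTs used, LUTs remaining, fraction used) for OVS plus IPSec."""
    used = (scaled_luts(OVS_100G_LUTS, target_gbps)
            + scaled_luts(IPSEC_100G_LUTS, target_gbps))
    return used, XCU25_LUTS - used, used / XCU25_LUTS


if __name__ == "__main__":
    used, remaining, fraction = lut_budget(50)
    print(f"used={used:,.0f} remaining={remaining:,.0f} fraction={fraction:.1%}")
```

Even with linear scaling, the two 50Gbps blocks alone would occupy nearly half of the device before the TLS and firewall offloads or any compute kernel is added.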

The smart-NIC roadmap should be more compelling, as the company plans to integrate the Solarflare controller IP as a hard core in a future 7nm Versal device (see MPR 3/30/20, “Versal Premium Targets Core Networks”). It withheld details, but its integrated smart-NIC chip should include many of the upgrades found in announced Versal products, such as PCIe Gen5, PAM4 serdes, and dual Cortex-A72 CPUs. Just as important, however, is the company’s roadmap for turnkey services, which it hasn’t detailed. On the plus side, Xilinx offers a P4-SDNet compiler, enabling customers to program the FPGA using P4 code.

Smarter 25GbE for Enterprise

Architecturally, the Pensando and Xilinx designs stand apart from recent SoCs created for smart NICs, including Broadcom’s Stingray and Nvidia’s BlueField-2 (see MPR 9/9/19, “Mellanox Right-Sizes Smart-NIC SoC”). Conceptually, those SoCs combine a standard-NIC block with a high-performance octa-core Arm complex that handles advanced services. By comparison, the Pensando and Xilinx approach handles all data-plane processing in programmable-hardware pipelines to maximize small-packet performance and minimize power dissipation.

Table 1 shows the Pensando DSC-25 and Xilinx U25 adapters, both of which offer two SFP28 ports that support 25GbE as well as 10GbE. Both designs handle line-rate throughput for dual 25GbE ports. They integrate enough DRAM for tables holding about one million entries, but Pensando’s in-package HBM delivers nearly four times the bandwidth of the less costly DDR4 memory Xilinx uses. This difference should make the DSC-25’s performance more deterministic, as lookups that miss on-chip caches will be faster.
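
The DDR4 side of this comparison can be estimated from the U25’s stated memory configuration of three DDR4-2666 channels. The Python sketch below assumes the 72-bit channels carry 64 data bits plus ECC and the 40-bit channel carries 32 data bits plus ECC, and it reports theoretical peak rates rather than measured bandwidth.

```python
def ddr_peak_gbytes(mega_transfers: int, data_bits: int) -> float:
    """Theoretical peak bandwidth of one DDR channel in GB/s."""
    return mega_transfers * 1e6 * (data_bits / 8) / 1e9


# ASSUMPTION: ECC bits are excluded, leaving 64, 64, and 32 data bits.
U25_CHANNELS = [(2666, 64), (2666, 64), (2666, 32)]

u25_total = sum(ddr_peak_gbytes(rate, bits) for rate, bits in U25_CHANNELS)

# "Nearly four times" implies a ceiling of 4x the DDR4 figure.
hbm_ceiling = 4 * u25_total

if __name__ == "__main__":
    print(f"U25 DDR4 peak: {u25_total:.2f} GB/s")
    print(f"Implied HBM ceiling: {hbm_ceiling:.2f} GB/s")
```

Under these assumptions the three channels peak at about 53GB/s combined, so a roughly fourfold advantage would put the HBM somewhere under about 213GB/s. Whether the comparison counts all three DDR4 channels is not stated.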

 

Table 1. Comparison of 25GbE smart NICs. The Pensando SoC delivers high performance for advanced network services in a modest power envelope. The two-chip Alveo design enables network and compute acceleration in a mainstream form factor. (Source: vendors, except *The Linley Group estimate)

Although both half-height half-length cards implement PCIe Gen3, Pensando multiplexes all traffic over a single x8 interface, whereas Xilinx optionally splits network traffic from other traffic using two x8 interfaces. In this latter configuration, compute acceleration avoids competing with network traffic for PCIe bandwidth. Pensando rates the DSC-25 at 20W (typical), which is about half the power of the Broadcom Stingray-based PS225 adapter. Xilinx has yet to disclose the U25’s power dissipation, but we estimate it’s about three times the DSC-25’s.
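
A quick Python calculation shows why a dedicated second interface matters. It uses the standard PCIe Gen3 parameters of 8GT/s per lane with 128b/130b encoding and ignores packet-header and flow-control overhead, so usable bandwidth would be somewhat lower than shown. The function name is illustrative.

```python
def pcie_gen3_gbps(lanes: int) -> float:
    """Per-direction PCIe Gen3 bandwidth after 128b/130b line encoding.

    Ignores transaction-layer overhead, so this is an upper bound.
    """
    return 8.0 * lanes * 128 / 130


NETWORK_GBPS = 2 * 25  # dual 25GbE ports at line rate

x8_gbps = pcie_gen3_gbps(8)
headroom_gbps = x8_gbps - NETWORK_GBPS
network_share = NETWORK_GBPS / x8_gbps

if __name__ == "__main__":
    print(f"x8 link: {x8_gbps:.2f} Gbps per direction")
    print(f"headroom after network traffic: {headroom_gbps:.2f} Gbps")
    print(f"network share of link: {network_share:.0%}")
```

With both ports at line rate, network traffic consumes roughly four-fifths of a single Gen3 x8 link, leaving about 13Gbps per direction for anything else that shares it.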

Pensando is delivering the broadest set of turnkey services yet seen in a smart NIC. Some vendors have offered OVS offload or IPSec, but other services require customer programming. Although the initial Xilinx product appears similar for networking, it uniquely includes machine-learning (inference) and video-transcoding acceleration. Until the company specifies network-plus-compute performance, however, the value of convergence remains unclear.

High IQs Can Spark Growth

Through multiple generations of its Nitro system, Amazon has shown how network and storage resources can be virtualized for bare-metal compute instances. Only the biggest hyperscalers, however, have the resources to develop similar ASICs and software. Most merchant smart-NIC vendors provide only low-level software, leaving customers to develop applications and services.

Pensando is the first vendor to break this mold by delivering turnkey solutions that match—and perhaps exceed—Nitro’s capabilities. By doing so, it can serve second-tier CSPs in addition to enterprise customers, creating a large serviceable market for its products. In fact, if the startup is successful, it will expand the Ethernet-adapter market as a whole by capturing value that appliances previously delivered. This vision explains how it raised up to $145 million in its latest funding round.

With its first smart NIC, Xilinx opted to reach the market quickly rather than offer a complete and optimum solution. The Alveo U25 is a work in progress that combines existing Solarflare and Xilinx intellectual property but lacks the completeness of the Pensando solution. Furthermore, we doubt the U25 can deliver network, storage, and compute convergence at compelling power and performance levels. Instead, it will likely serve as a placeholder and proof of technology as Xilinx develops its integrated 7nm smart NIC.

Over the last half-decade, the merchant smart-NIC market has been unable to sustain growth, as the largest customers have adopted internally developed solutions. Serving a broader customer base requires a level of software investment that chip vendors have been unwilling to meet. Pensando and Xilinx represent the next wave of merchant vendors, and both differentiate from incumbents. Rather than chase the lure of hyperscale volume, the new entrants want higher gross margins by delivering greater value to more customers. If they succeed, the smart-NIC market could within a few years rival the traditional-NIC market in size.

Price and Availability

Pensando’s DSC-25 and DSC-100 cards are in production; more information is at pensando.io/our-platform. The Alveo U25 from Xilinx should reach general availability next quarter; product information is at www.xilinx.com/products/boards-and-kits/alveo/u25.html. Neither vendor disclosed pricing.
