Power8 Hits the Merchant Market
Memory Bandwidth Helps IBM Server Processor Ace Big Benchmarks
IBM is making good on its plan to sell Power8 processors to third parties, with Tyan already offering rack-mount development systems. Newly disclosed scores show Power8 beating Intel’s most powerful server processor, the 18-core Xeon E5-2699v3 (Haswell-EP), on important benchmark tests. Both processors deliver outstanding performance on the SPEC CPU benchmarks, but IBM’s huge advantages in multithreading and memory bandwidth favor Power8 when running larger test suites that more closely reflect real-world enterprise applications.
Overall, the results show that IBM offers a viable high-end alternative to Intel’s market-leading products. Equally important to Big Blue, Power8’s performance is energizing the OpenPower Foundation, an IBM-led alliance that rallies other companies to create a larger hardware and software ecosystem around the processor. IBM is offering Power8 chips to system builders in the merchant semiconductor market and is even licensing the architecture to other processor vendors. So far, the alliance has more than 80 members, including software, system, and semiconductor vendors.
Power8 is IBM’s most powerful microprocessor yet. On the merchant market, it’s available with 8, 10, or 12 CPU cores at maximum clock frequencies of 3.126GHz to 3.758GHz, as Table 1 shows. Compared with its Power7+ predecessor, which is not a merchant product, Power8 offers twice the threads and L2 cache per core, up to 20% more L3 cache, a new L4 cache, up to four times the peak DRAM bandwidth, and twice the per-core SPEC CPU throughput.
Table 1. IBM Power8 processors for the merchant market. IBM may announce additional models; some nonmerchant models have higher specifications. The SPEC scores are IBM’s unaudited and unpublished estimates. *MBC: memory buffer chip—required to attach DRAM; ‡SPEC_rate_base2006 per socket. (Source: IBM, except †The Linley Group estimate)
Power8 began production in 2Q14 and is already available in systems, such as IBM’s Power S824 server and Tyan’s GN70-BP010 “Palmetto” reference system. Whereas the Power S824 is a production server, Tyan’s offering is a rack-mount 2U chassis intended for development; it uses a quad-core Power8 processor. Taiwanese OEM Wistron is showing another 2U chassis scheduled for production in mid-2015.
Memory Buffers Boost Bandwidth
IBM has yet to publish chip pricing, but on the basis of the company’s guidance, we estimate Power8 processors will cost $850 to $2,500 in 1,000-unit lots. These prices range toward the high end of Intel’s list prices for the newest Xeon E5-2600v3 server processors, which use the new Haswell-EP microarchitecture. Intel offers a greater variety that spans a wider performance range, however. When comparing processors with similar performance, our analysis shows that Power8 is much more affordable. Intel’s list prices for the 8-, 10-, and 12-core models range from $612 to $2,145. We estimate the most powerful Xeon (the 18-core E5-2699v3) sells for a whopping $4,100.
The Power8 processor price does not include the required “Centaur” memory-buffer chips (MBCs), which we estimate cost $90 each. Each of those chips also adds 20W TDP (thermal design power, a near-maximum rating), or about 16.5W typical. Each MBC contains 16MB of L4 cache and connects to four 64-bit DDR3-1600 DRAM channels (see MPR 9/2/13, “Power8 Muscles Up for Servers”). By moving the DRAM interfaces off the processor, the MBCs greatly reduce Power8’s pin count while boosting the memory bandwidth that is so critical to its exceptional performance. With four MBCs, Power8 supports up to 1TB of DRAM per processor.
The merchant products are based on the same 22nm SOI Power8 chip that IBM is shipping in its own systems, but the company has trimmed down the feature set to better meet the needs of its merchant customers. For example, IBM’s merchant-market Power8 products handle up to four buffer chips and 205GB/s of peak DRAM bandwidth; the nonmerchant versions handle up to eight buffers and 410GB/s. The merchant parts also offer half as many PCI Express ports (24 lanes), and enable cache-coherent systems that are half as large (up to 24 sockets total).
These reductions help the merchant parts fit into a 50mm LGA package. The limitations on I/O and CPU speed also keep the merchant chips within a 190W TDP. These designs are more appropriate for mainstream server applications. Nevertheless, IBM is willing to sell other configurations to meet specific customer needs.
IBM Calibrates the Clock
Because IBM has never before sold Power processors on the merchant market, it has only now published detailed specifications. To attract more system vendors and other allies, IBM must make a case that its server processors are better than Intel’s, which have a dominant 96% market share. IBM has made much progress since announcing Power8 and founding the OpenPower Foundation more than a year ago (see MPR 8/26/13, “IBM to Sell Power Processors”).
The latest disclosures provide more product details and some benchmark scores that enable useful comparisons. One detail is that the maximum clock frequency of the merchant chips is 18% lower than we initially estimated, which cuts power consumption to more-manageable levels. IBM is currently selling Power8 systems running at 4.35GHz and says the processor can run even faster, but the TDP rises considerably at this speed, reducing the performance per watt.
Note that the maximum clock frequency varies across the Power8 line. The 8-core chip is the fastest, and the 12-core chip is the slowest. IBM calibrates the clock speeds to keep power consumption consistent at about 190W TDP for all the merchant chips. That consistency eases the burden on system designers, because they can build a box to a single thermal standard and populate the motherboard with any Power8 processor. This TDP is considerably higher than that of Intel’s hottest Xeon chips, particularly when the MBC power is added. A single-socket Power8 system with four MBCs will require about as much cooling as a dual-socket Xeon server.
On the SPEC CPU benchmarks, Power8 crushes its predecessor, even when comparing the eight-core chips. According to IBM’s preliminary results with the integer tests, Power8 scores 627 SPECint_rate_base2006, which is slightly more than twice what Power7+ achieves (308). On the floating-point tests, Power8 scores 485 SPECfp_rate_base2006, twice the Power7+ score (242). These scores validate the new design’s superior microarchitecture, despite the older design’s clock-frequency advantage. And because Power8 consumes about the same wattage, its performance per watt is approximately twice as high.
Power8 Duels Haswell-EP
One expects a new processor to outshine its immediate predecessor, but the real test is whether it beats the competition. And the leading competitor is Intel’s Xeon E5-2600v3 family, which is based on the latest Haswell-EP microarchitecture and includes the most powerful x86 server processors. The largest family member is the 18-core E5-2699v3 (see MPR 9/22/14, “Intel Crams 18 Cores Into Xeon E5”).
Both IBM and Intel have superscalar CPUs that can execute up to eight instructions or micro-ops per clock cycle with extensive out-of-order execution. Power8’s global reorder buffers can juggle up to 224 instructions in flight; Xeon’s Haswell CPUs can juggle up to 192. Power8’s merchant-market clock speeds range from 3.126GHz to 3.758GHz; the Xeon E5v3 ranges from 1.8GHz to 3.7GHz. (In both cases, the processors can briefly reach higher clock frequencies in turbo mode.)
IBM’s greatest advantages are multithreading, internal cache, and external-memory bandwidth. Whereas no Intel processor has ever ventured beyond two threads per CPU, each Power8 CPU can run eight threads. Power8 also has twice the L2 cache per core (512KB versus 256KB), and its unified L3 cache is more than twice as large (64–96MB versus 10–45MB). Note that IBM’s L3 cache uses embedded DRAM (eDRAM) with single-transistor memory cells, which are more compact but slower than Intel’s conventional six-transistor SRAM cells.
As described above, Power8 provides much more external-memory bandwidth by moving the 64-bit DRAM interfaces into separate memory-buffer chips. Each of those chips has 16MB of L4 cache and provides 51.2GB/s of peak bandwidth to external DRAM. Thus, a configuration with four memory-buffer chips delivers 205GB/s of peak bandwidth per processor. By contrast, even Intel’s biggest 18-core E5-2699v3 has only four DDR4-2133 DRAM interfaces, which provide only 68GB/s. And it has no L4 cache.
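The peak-bandwidth figures above follow from channel count, bus width, and transfer rate. The short Python sketch below reproduces that arithmetic; the function and variable names are our own illustrative choices, and the calculation assumes nominal DDR3-1600 and DDR4-2133 transfer rates with 1GB = 10^9 bytes.

```python
# Peak DRAM bandwidth = channels x transfers/second x bytes per transfer.
# Assumption: nominal transfer rates (1600 and 2133 MT/s), 64-bit channels.

def peak_bandwidth_gbs(channels: int, mega_transfers: float, bus_bits: int = 64) -> float:
    """Return theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_bits / 8
    return channels * mega_transfers * 1e6 * bytes_per_transfer / 1e9

# One memory-buffer chip drives four DDR3-1600 channels.
per_mbc = peak_bandwidth_gbs(channels=4, mega_transfers=1600)

# Merchant Power8 parts attach up to four buffers; nonmerchant parts up to eight.
power8_merchant = 4 * per_mbc
power8_nonmerchant = 8 * per_mbc

# The Xeon E5-2699v3 has four DDR4-2133 channels on the processor itself.
xeon_e5 = peak_bandwidth_gbs(channels=4, mega_transfers=2133)

print(f"Per buffer chip:       {per_mbc:6.1f} GB/s")
print(f"Power8, four buffers:  {power8_merchant:6.1f} GB/s")
print(f"Power8, eight buffers: {power8_nonmerchant:6.1f} GB/s")
print(f"Xeon E5-2699v3:        {xeon_e5:6.1f} GB/s")
print(f"Merchant Power8 / Xeon ratio: {power8_merchant / xeon_e5:.1f}x")
```

These are theoretical peaks; sustained bandwidth on real workloads will be lower on both platforms.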
For greater memory bandwidth and capacity, Intel’s Xeon E7 server processors use external memory-buffer chips to boost their peak DRAM bandwidth as high as 85.3GB/s, still far less than Power8’s bandwidth. Furthermore, the high-end Xeon E7 models cost much more (ranging from $4,394 to $6,841), and they currently use the older Ivy Bridge CPU microarchitecture (see MPR 2/24/14, “Ivy Bridge-EX Updates Xeon E7 Line”).
IBM’s 8 CPUs Beat Intel’s 14 CPUs
Table 2 shows two comparisons of IBM Power8 processors and Intel Xeon E5v3 Haswell-EP processors. The first comparison matches the 8-core Power8 and the 14-core Xeon E5-2697v3. The second matches IBM’s 12-core Power8 (the most powerful merchant model) and Intel’s 18-core E5-2699v3 (the most powerful Haswell-EP model).
Table 2. Power8 versus Haswell-EP. IBM’s SPEC scores are unaudited and unpublished estimates. *With four memory-buffer chips; ‡SPEC_rate_base2006 per socket. (Source: vendors, except †The Linley Group estimate)
In the first comparison, the 8-core Power8 is slightly faster than the 14-core Xeon E5-2697v3 on both the SPECint and SPECfp benchmarks. Specifically, Power8 is 2.5% faster on SPECint and 13.5% faster on SPECfp. These results are SPEC_rate_base scores, which means the vendors compiled SPEC’s benchmark code using standard optimization flags and employed all the available CPUs and threads. The IBM scores, however, are unaudited and unpublished preliminary results.
The Xeon E5-2697v3 has 75% more CPUs (14 versus 8), but Power8 has more than twice as many threads (64 versus 28). Another factor in Power8’s favor is that it runs at 3.758GHz versus Xeon’s 2.6GHz—a 45% advantage. Despite all these differences, the two processors are evenly matched in raw integer performance.
Power consumption is a different story. At 145W TDP, the E5-2697v3 is more power efficient than the 190W Power8, even before adding IBM’s required memory-buffer chips—two or four of them at 20W TDP each, depending on the amount of DRAM attached. Overall, the E5-2697v3 boasts a much better power/performance ratio on CPU-intensive tasks.
On the other hand, the price comparison favors IBM. We estimate the eight-core Power8 will list for $850 and the memory-buffer chips will cost about $90 each. Even with four buffer chips to attach the maximum amount of DRAM, the total cost is about $1,210. By contrast, Intel’s list price for the E5-2697v3 is $2,702. (Facing virtually no competition from AMD in high-performance server processors, Intel can ask high prices.)
Thus, the 8-core Power8 delivers about the same CPU performance as the 14-core Xeon for less than half the price, but it consumes 59% to 86% more power, depending on the number of buffer chips. Spend the $1,500 savings on a bigger fan and the higher electric bills.
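The price and power tradeoff in this first comparison can be checked with a few lines of Python. This is only a sketch of the arithmetic: the constants are the estimates and list prices quoted above, and the helper names are hypothetical.

```python
# Estimates and list prices quoted in the text (USD and watts TDP).
POWER8_PRICE_USD = 850    # 8-core Power8, estimated
MBC_PRICE_USD = 90        # per memory-buffer chip, estimated
POWER8_TDP_W = 190
MBC_TDP_W = 20
XEON_PRICE_USD = 2702     # E5-2697v3 list price
XEON_TDP_W = 145

def power8_platform(num_mbcs: int) -> tuple[int, int]:
    """Return (price, TDP) for one Power8 plus its memory-buffer chips."""
    price = POWER8_PRICE_USD + num_mbcs * MBC_PRICE_USD
    tdp = POWER8_TDP_W + num_mbcs * MBC_TDP_W
    return price, tdp

def percent_more(value: float, baseline: float) -> float:
    """Return how much larger value is than baseline, in percent."""
    return (value / baseline - 1) * 100

for num_mbcs in (2, 4):
    price, tdp = power8_platform(num_mbcs)
    print(
        f"{num_mbcs} MBCs: ${price:,} and {tdp}W, "
        f"{percent_more(tdp, XEON_TDP_W):.0f}% more power than Xeon, "
        f"${XEON_PRICE_USD - price:,} cheaper"
    )
```

The comparison covers only processor and buffer-chip TDP; it does not model DRAM power or the cost of electricity over a server's lifetime.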
Power8’s 12 CPUs Versus Xeon’s 18 CPUs
The second comparison in Table 2 matches IBM’s biggest Power8 processor and Intel’s biggest Xeon server processor. In this Godzilla versus Rodan battle, the 18-core Xeon E5-2699v3 and the 12-core Power8 are nearly even in the SPECint tests. On SPECfp, Power8 is 11% faster. Note that the Xeon E5-2699v3 has 50% more CPUs (18 versus 12), but Power8 has many more threads (96 versus 36). Another factor in Power8’s favor is that it runs at 3.126GHz versus Xeon’s 2.3GHz—a 36% advantage.
Per socket, these two processors are closely matched in raw integer performance. Per watt, the 145W Xeon is much more power efficient than the 190W Power8, even before adding IBM’s memory-buffer chips. The Power8 processor and its memory buffers will consume 59% to 86% more power.
Pricing is no contest. We estimate that IBM’s 12-core Power8 will list for $2,500; add $180 or $360 for two or four buffer chips. Intel hasn’t published a list price for the Xeon E5-2699v3, but after surveying some Internet resellers, we estimate it lists for about $4,100. As with the previous comparison, customers must balance the substantially lower price of the IBM processor against the additional power and cooling it requires—important considerations for data-center operators.
Bigger Benchmarks Favor Power8
The SPEC CPU benchmarks are low-level measurements of integer and floating-point performance. Although they are useful, real-world application software often makes greater demands on the whole system, especially the memory subsystem. IBM has disclosed some scores from larger benchmark suites that play to Power8’s strengths in memory bandwidth and multithreading.
Even on the SPEC CPU suites, these strengths are telling—Power8’s performance scales at a superlinear rate as its clock frequency rises. Using a nonmerchant 12-core Power8 chip running at 3.5GHz, IBM’s estimated SPECint (875) and SPECfp (685) scores are 30–34% better than those of the 3.126GHz merchant product, even though the clock frequency is only 12% higher. The reason is that the nonmerchant processor provides twice the DRAM bandwidth (410GB/s versus 205GB/s). Nevertheless, the Xeon E5-2699v3 retains its power-efficiency advantage.
But Power8 outperforms Xeon on the larger benchmark suites such as SAP 2-Tier SD (Sales and Distribution database), SPECjbb (Java server), SPECjEnterprise (Java enterprise), and Oracle E-Business Standard R12 Payroll (online transaction processing). Power8’s enormous advantage in peak memory bandwidth tilts the table in its favor when running memory-bound applications. For smaller applications, Xeon looks better.
For these tests, IBM used the nonmerchant 3.5GHz 12-core Power8 in a dual-socket Power S824 server and compared the results with those from a dual-socket system that uses the 18-core Xeon E5-2699v3. On the SAP 2-Tier SD test, the IBM server can handle 21,212 database users versus 16,500 users for the Xeon system, despite having 12 fewer cores. That’s nearly twice the users per CPU (883 versus 458), which is a huge advantage for customers paying per-core licensing fees for their enterprise software. Again, however, the Intel processors are more power efficient, handling 57 users per watt versus IBM’s 36, by our estimates.
Our power-efficiency comparison assumes the dual-socket Power8 system includes eight memory-buffer chips and that the processors consume 12% more power than the 3.126GHz merchant-market models, yielding a total of 586W TDP. That estimate is probably low, because increasing the processors’ clock frequency by 12% almost certainly requires a higher voltage, which would increase power consumption quadratically.
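The per-core and per-watt SAP figures can be reproduced as follows. The Power8 power assumptions are the ones stated above; the Xeon figure of two 145W processors with no external buffer chips is our assumption for this sketch, chosen because it is consistent with the quoted per-watt result. Variable names are illustrative.

```python
# SAP 2-Tier SD users, as quoted in the text.
IBM_USERS = 21_212
XEON_USERS = 16_500

# Dual-socket core counts.
ibm_cores = 2 * 12
xeon_cores = 2 * 18

# Power8 system: two processors at 12% above the 190W merchant TDP,
# plus eight memory-buffer chips at 20W each (assumptions stated in the text).
ibm_tdp_w = 2 * 190 * 1.12 + 8 * 20

# Xeon system: two 145W processors (assumption: no buffer chips needed).
xeon_tdp_w = 2 * 145

def per_unit(users: int, units: float) -> float:
    """Return users supported per core or per watt."""
    return users / units

print(f"Power8 system TDP estimate: {ibm_tdp_w:.0f}W")
print(f"Users per core: Power8 {per_unit(IBM_USERS, ibm_cores):.0f}, "
      f"Xeon {per_unit(XEON_USERS, xeon_cores):.0f}")
print(f"Users per watt: Power8 {per_unit(IBM_USERS, ibm_tdp_w):.0f}, "
      f"Xeon {per_unit(XEON_USERS, xeon_tdp_w):.0f}")
```

Because TDP is a near-maximum rating rather than measured draw, the per-watt numbers are best read as a rough ranking, not a measurement.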
Downshifting the clock speed 12% to the merchant-market maximum for a 12-core Power8 chip would still leave IBM with a big lead in the SAP 2-Tier SD benchmark, but the Xeon E5-2699v3 system would remain more power efficient. In the 20 years or so since the RISC-CISC wars erupted, Intel has done a remarkable job of matching RISC performance while controlling power consumption.
Power8 Duels Ivy Bridge-EP
IBM disclosed additional benchmarks comparing single- and dual-socket Power8 systems with an older dual-socket Intel Xeon E5-2697v2 (Ivy Bridge-EP) system, because comparable scores are unavailable for the newer Xeon E5-2699v3 (Haswell-EP) processor. The Haswell microarchitecture is faster than Ivy Bridge, but not by enough to change the results.
For these enterprise benchmarks, IBM used the same dual-socket Power S824 system from the SAP tests: two 12-core Power8 processors running at 3.5GHz. Server vendors have published scores for various Ivy Bridge systems, including some with dual 12-core Xeon E5-2697v2 processors running at 2.7GHz. In these matchups, the IBM system was about 2.7x faster on the SPECjbb2013 tests (167,958 versus 63,079) and was 2.0x faster on the SPECjEnterprise2010 tests (22,543 versus 11,260). Power8’s clock speed was only 1.3x faster, so these results again suggest that its superior multithreading and memory bandwidth were the decisive factors.
For the Oracle EBS R12 Payroll benchmarks, IBM matched a single-socket Power8 system against the same dual-socket Xeon E5-2697v2 system. The Power8 system had one 12-core processor running at 3.5GHz. Despite having only half as many CPUs, it was 7% faster (1,090,909 versus 1,017,639). As mentioned above, customers paying per-core licensing fees for their enterprise software can save money by purchasing a system that has fewer CPUs but doesn’t compromise on performance. In data centers that run their own software, however, this factor doesn’t matter.
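To see how far the enterprise results exceed what clock speed alone would predict, the ratios can be computed directly from the quoted scores. The sketch below uses only numbers from the text; the dictionary layout and names are our own.

```python
# (Power8 score, Xeon E5-2697v2 score) for the dual-socket matchups.
dual_socket_scores = {
    "SPECjbb2013": (167_958, 63_079),
    "SPECjEnterprise2010": (22_543, 11_260),
}

# Clock frequencies of the compared processors, in GHz.
POWER8_GHZ, XEON_GHZ = 3.5, 2.7
clock_ratio = POWER8_GHZ / XEON_GHZ

def speedup(power8_score: float, xeon_score: float) -> float:
    """Return the Power8 score as a multiple of the Xeon score."""
    return power8_score / xeon_score

print(f"Clock ratio: {clock_ratio:.2f}x")
for name, (power8_score, xeon_score) in dual_socket_scores.items():
    print(f"{name}: {speedup(power8_score, xeon_score):.2f}x")

# Oracle EBS R12 Payroll: one 12-core Power8 versus two 12-core Xeons.
oracle_power8, oracle_xeon = 1_090_909, 1_017_639
power8_cores, xeon_cores = 12, 24

oracle_speedup = speedup(oracle_power8, oracle_xeon)
oracle_per_core = (oracle_power8 / power8_cores) / (oracle_xeon / xeon_cores)

print(f"Oracle Payroll, system level: {(oracle_speedup - 1) * 100:.0f}% faster")
print(f"Oracle Payroll, per core:     {oracle_per_core:.2f}x")
```

The per-core ratio is the figure that matters for software licensed by core count, which is the point the Oracle comparison makes.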
Historically, Big Blue’s big iron has been expensive, but our lower price estimates for the company’s merchant-market processors should make Power8 systems more competitive with Intel-based systems. Then, too, Intel can lean on its dominant market share to ask higher prices, whereas IBM is more motivated to negotiate.
For an IBM PowerLinux S824L server with two 12-core Power8 processors running at 3.02GHz, including the operating-system and virtualization licenses, IBM estimates a street price of about $32,000. The same system with two 3.42GHz 10-core Power8 processors would cost about $28,500. Those street prices are in the same ballpark as a Hewlett-Packard DL380P server with two 18-core Xeon-E5v3 (Haswell-EP) processors running at 2.3GHz, which sells for about $30,000.
OpenPower Alliance Gains Steam
So far, no other vendors are shipping Power8 production systems. Tyan is selling a reference system—the “Palmetto” GN70-BP010, a single-socket 2U chassis with a quad-core Power8 processor and a single MBC—for $2,753. To rally more support, IBM is rapidly recruiting companies to join the OpenPower Foundation, which now has more than 80 members. A related consortium, the China Power Technology Alliance, has 30 members.
Thirteen OpenPower members are system or hardware vendors. They include Chuanghe Telco Tech, Tyan, Wistron, and ZTE. To encourage more hardware development, IBM is publishing reference-design specifications for Power8 motherboards, has released Power8-firmware source code on GitHub, and is assembling a field-engineering team to help OEMs. It says 12 to 15 companies have OpenPower systems in development.
Wistron, a Taiwanese OEM, is developing a 2U rack-mount system that supports Power8 processors with 8, 10, or 12 cores. Some Wistron systems will carry the IBM brand. Rackspace, a Texas-based cloud provider, is building an OpenPower system for the Open Compute Project that will run OpenStack, the company’s cloud-services operating system.
In another bold move, IBM has licensed the Power architecture, Power8-related intellectual property, and chip-design tools to a Chinese company, Suzhou PowerCore. Suzhou plans to develop and sell Power-compatible server processors in China. The company is unlikely to design its own Power CPU, so a lower-cost version of Power8 is the probable goal. Another Chinese company, Chuanghe Telco Tech, plans to build Power cloud servers for China Mobile. Additional Chinese companies working on OpenPower projects include Inspur, Teamsun, Unisource, and Zoom Networks.
OpenPower supercomputers are another target. In November, IBM won contracts worth $325 million from the U.S. Department of Energy to build what are described as the world’s most advanced “data-centric” supercomputers for the Lawrence Livermore and Oak Ridge National Laboratories. To support this effort, Nvidia will add its NVLink interconnects to its future Pascal graphics processors, enabling faster communications between Power8 system memory and the GPU’s local memory (see MPR 10/6/14, “Maxwell Illuminates the Masses”). IBM and Nvidia expect the NVLink interconnects to be available in 2016; until then, Power8 processors will connect to the GPUs via PCI Express interfaces.
Although supercomputers are high-visibility design wins, the server market is the money pot. The rapid growth of data centers and cloud servers is consuming millions of processors. Google alone spent nearly $10 billion on infrastructure in 2014, mostly on servers. Google is a founding member of the OpenPower alliance and has prototyped a Power8 system for internal testing, but has not committed to deployment.
IBM’s Third Power Play
The OpenPower Foundation is off to a good start but needs OEM systems in production to offer a credible alternative to Intel-based systems, which dominate the server market. A second source for OpenPower processors would be welcome, too—if nothing else, to assure continued availability.
For years, IBM has steadily moved away from hardware to focus on software and services. The company sold its PC business to Lenovo in 2005 and is currently selling its x86-based server business to the Chinese vendor. In October, IBM agreed to give its fabs to GlobalFoundries along with $1.5 billion to cover the expense of absorbing them. Customers may reasonably wonder if a software-centric IBM will remain committed to the costly business of designing high-performance microprocessors. Although IBM has licensed the architecture and Power8-related IP to a potential second source (Suzhou), the Chinese company probably lacks the resources to design its own Power-compatible CPUs and processors.
Nor is OpenPower the first time IBM has tried to build industrywide support for its Power Architecture. The first attempt was the AIM alliance that united Apple, IBM, and Motorola in 1991 to develop the first PowerPC processors by combining elements of IBM’s server chips with Motorola’s workstation chips (see MPR 7/24/91, “Apple/IBM Deal Catapults RS/6000 to Prominence”). Descendants of those RISC products survive today, mainly as embedded processors under the flag of Motorola spinoff Freescale.
After the AIM alliance fell apart in the early 2000s, IBM formed Power.org in 2004 (see MPR 12/27/04, “Bringing Power to the People”). Although Power.org remains alive and helps IBM guide the architecture’s evolution, OpenPower focuses on high-performance servers, data-center systems, and supercomputers.
Meanwhile, Intel’s x86 architecture has grown to 64 bits and has adopted some RISC concepts and other techniques to improve performance without the higher power consumption that previously handicapped CISC processors. In addition, the company has leapfrogged the industry with superior fabrication technology, such as the first high-volume FinFETs (see MPR 5/23/11, “Intel Sprouts Fins at 22nm”).
IBM Still Flies the RISC Flag
These developments enable today’s x86 server processors to deliver high performance and superb power efficiency. Indeed, x86 is so hard to beat that another rival RISC architecture, SPARC, has all but retreated from the merchant market; Fujitsu and Oracle mainly sell their SPARC processors in their own systems. ARM is posing a new challenge, but it’s initially targeting lower-performance microservers.
IBM can beat Intel’s fastest Xeon E5-2600v3 on application workloads that benefit from high memory bandwidth. Although Xeon may score a little better on some narrow CPU-level benchmarks, those differences are insignificant. In power efficiency, however, the latest Xeon processors remain well ahead of Power8, even on large applications. One of Power8’s handicaps is the additional power (40–80W) required for its memory-buffer chips, but those external controllers enable the 3x greater DRAM bandwidth (205GB/s versus 68GB/s with four buffers) that makes Power8 so impressive when shouldering heavy workloads.
Another Power8 drawback is that Intel’s OEMs currently offer a much greater variety of systems than IBM and its OpenPower allies. And some of the OEMs that IBM recruits are selling Intel-based systems as well, which could relegate the Power8 systems to a few niche markets. In addition, Power8 customers that aren’t currently using earlier Power-based systems will have to port their software to IBM’s architecture, yet another obstacle that deters Intel customers from defecting.
As OpenPower members bring their new projects to fruition, IBM should be able to make a stronger case that Power Architecture servers, blades, and supercomputers offer a credible alternative to the dominant x86-based systems. As things stand now, Power8 is a viable and cost-effective product for high-end servers, particularly those running memory-intensive workloads.
Even if OpenPower systems never break out of this niche, their high performance can support the high margins that sustain a profitable business. And those margins should remain high as long as Intel keeps its prices and margins high. OpenPower can generate extra revenue that helps IBM defray the costs of designing new Power processors for its own systems and for OEMs. The cost of offering Power processors as merchant products is relatively small, and any incremental profit is good for IBM’s server business.
Price and Availability
IBM’s Power8 server processors are available now as merchant-market products. We estimate that volume prices (1,000 units) range from $850 for a 3.7GHz 8-core processor to $2,500 for a 3.1GHz 12-core processor. For more information, access www.ibm.com/systems/power/index.html.
The first OpenPower Summit is scheduled for March 17–19, 2015, at the San Jose Convention Center. For more information, access openpowerfoundation.org/2015-summit/.