Intel Demos Sapphire Rapids Hardware Accelerator Blocks In Action At Innovation 2022

With Intel’s annual Innovation event taking place this week in San Jose, the company is looking to recapture a lot of technical momentum that has slowly been lost over the past couple of years. While Intel has remained hard at work releasing new products over that time, the combination of schedule slips and an inability to show off their wares to in-person audiences has taken some of the luster off the company and its products. So for their biggest in-person technical event since before the pandemic, the company is showing off as much silicon as they can, to convince press, partners, and customers alike that CEO Pat Gelsinger’s efforts have put the company back on track.

Of all of Intel’s struggles over the past couple of years, there is no better poster child than their Sapphire Rapids server/workstation CPU. A true next-generation product from Intel that brings everything from PCIe 5 and DDR5 to CXL and a slew of hardware accelerators, there’s really nothing to write about Sapphire Rapids’ delays that hasn’t already been said – it’s going to end up over a year late.

But Sapphire Rapids is coming. And Intel is finally able to see the light at the end of the tunnel on those development efforts. With general availability slated for Q1 of 2023, just over a quarter from now, Intel is finally in a position to show off Sapphire Rapids to a wider audience – or at least, members of the press. Or to take a more pragmatic read on matters, Intel now needs to start seriously promoting Sapphire Rapids ahead of its launch, and that of its competition.

For this year’s show, Intel invited members of the press to see a live demo of pre-production Sapphire Rapids silicon in action. The purpose of the demos, besides giving the press the ability to say “we saw it; it exists!”, is to start showing off one of the more unique features of Sapphire Rapids: its collection of dedicated accelerator blocks.

Along with delivering a much-needed update to the CPU’s processor cores, Sapphire Rapids is also integrating dedicated accelerator blocks for several common CPU-critical server/workstation workloads. The idea, simply put, is that fixed function silicon can do the task as quickly as, or better than, CPU cores for a fraction of the power, and for only a fractional increase in die size. And with hyperscalers and other server operators looking for big improvements in compute density and energy efficiency, domain specific accelerators such as these are a good way for Intel to deliver that kind of edge to their customers. And it doesn’t hurt either that rival AMD isn’t expected to have similar accelerator blocks.

A Quick Look At Sapphire Rapids Silicon

Before we get any further, here’s a very quick look at the Sapphire Rapids silicon.

For their demos (and eventual reviewer use), Intel has assembled some dual socket Sapphire Rapids systems using pre-production silicon. And for photo purposes, they’ve popped open one system and popped out the CPU.

There’s not much we can say about the silicon at this point beyond the fact that it works. Since it’s still pre-production, Intel isn’t disclosing clockspeeds or model numbers – or what errata have resulted in it being non-final silicon. But what we do know is that these chips have 60 CPU cores up and running, as well as the accelerator blocks that were the subject of today’s demonstrations.

Sapphire Rapids’ Accelerators: AMX, DLB, DSA, IAA, and QAT

Not counting the AVX-512 units on the Sapphire Rapids CPU cores, the server CPUs will be shipping with 4 dedicated accelerators within each CPU tile.

These are Intel Dynamic Load Balancer (DLB), Intel Data Streaming Accelerator (DSA), Intel In-Memory Analytics Accelerator (IAA), and Intel QuickAssist Technology (QAT). All of these hang off of the chip mesh as dedicated devices, and essentially function as PCIe accelerators that have been integrated into the CPU silicon itself. This means the accelerators don’t consume CPU core resources (memory and I/O are another matter), but it also means the number of accelerator cores available doesn’t directly scale up with the number of CPU cores.

Of these, everything but QAT is new to Intel. QAT is the exception as the previous generation of that technology was implemented in the PCH (chipset) used for 3rd generation Xeon (Ice Lake-SP) processors, and as of Sapphire Rapids is being integrated into the CPU silicon itself. Consequently, while Intel implementing domain specific accelerators is not a new phenomenon, the company is going all-out on the idea for Sapphire Rapids.

All of these dedicated accelerator blocks are designed to offload a specific set of high-throughput workloads. DSA, for example, accelerates data copies and simple computations such as calculating CRC32s. Meanwhile QAT is a crypto acceleration block as well as a data compression/decompression block. And IAA is similar, offering on-the-fly data compression and decompression to allow for large databases (i.e. Big Data) to be held in memory in a compressed form. Finally, DLB, which Intel did not demo today, is a block for accelerating load balancing between servers.

Finally, there is Advanced Matrix Extensions (AMX), Intel’s previously-announced matrix math execution block. Similar to tensor cores and other types of matrix accelerators, these are ultra-high-density blocks for efficiently executing matrix math. And unlike the other accelerator types, AMX isn’t a dedicated accelerator, rather it’s a part of the CPU cores, with each core getting a block.

AMX is Intel’s play for the deep learning market, going above and beyond the throughput they can achieve today with AVX-512 by using even denser data structures. While Intel will have GPUs that go beyond even this, for Sapphire Rapids Intel is looking to address the customer segment that needs AI inference taking place very close to CPU cores, rather than in a less flexible, more dedicated accelerator.

The Demos

For today’s press demo, Intel brought out their test team to set up and showcase a series of real-world demos that leverage the new accelerators and can be benchmarked to showcase their performance. For this Intel was looking to demonstrate the advantages over unaccelerated (CPU) operation on their own Sapphire Rapids hardware – i.e. why you should use their accelerators in these styles of workloads – as well as to showcase the performance advantage versus executing the same workloads on arch rival AMD’s EPYC (Milan) CPUs.

Intel, of course, has already run the data internally. So the purpose of these demos was, besides revealing these performance numbers, to showcase that the numbers were real and how they were getting them. Make no mistake, this is Intel wanting to put its best foot forward. But it is doing so with real silicon and real servers, in workloads that (to me) seem like reasonable tasks for the test.

QuickAssist Technology Demo

First up was a demo for the QuickAssist Technology (QAT) accelerator. Intel started with an NGINX workload, measuring OpenSSL crypto performance.

Aiming for roughly iso-performance, Intel was able to achieve roughly 66K connections per second on their Sapphire Rapids server, using just the QAT accelerator and 11 of the 120 (2x60) CPU cores to handle the non-accelerated bits of the demo. This compares to needing 67 cores to achieve the same throughput on Sapphire Rapids without any kind of QAT acceleration, and 67 cores on a dual socket EPYC 7763 server.
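To put the core counts above in perspective, here is a small back-of-the-envelope sketch in Python. It uses only the figures quoted in the demo (roughly 66K connections per second, 11 cores with QAT, 67 cores without); the function and variable names are purely illustrative, and because the throughput figure is approximate, so are the per-core results.

```python
# Figures quoted in the QAT/NGINX demo. "Roughly 66K" is approximate,
# so the derived per-core numbers are approximate as well.
CONNECTIONS_PER_SEC = 66_000
CORES_WITH_QAT = 11
CORES_WITHOUT_QAT = 67


def cores_freed(baseline_cores: int, accelerated_cores: int) -> int:
    """CPU cores no longer needed once the work is offloaded."""
    return baseline_cores - accelerated_cores


def per_core_throughput(total: float, cores: int) -> float:
    """Connections per second handled per CPU core at iso-performance."""
    return total / cores


freed = cores_freed(CORES_WITHOUT_QAT, CORES_WITH_QAT)
reduction = freed / CORES_WITHOUT_QAT

print(f"cores freed: {freed}")
print(f"core reduction: {reduction:.1%}")
print(f"conn/s per core with QAT:    {per_core_throughput(CONNECTIONS_PER_SEC, CORES_WITH_QAT):.0f}")
print(f"conn/s per core without QAT: {per_core_throughput(CONNECTIONS_PER_SEC, CORES_WITHOUT_QAT):.0f}")
```

In other words, at the same throughput the accelerated configuration leaves 56 of the 67 cores free for other work.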

The second QAT demo was measuring compression/decompression performance on the same hardware. As you’d expect for a dedicated accelerator block, this benchmark was a blow-out. The QAT hardware accelerator blew past the CPUs, even coming in ahead of them when they used Intel’s highly optimized ISA-L library. Meanwhile this was an almost entirely-offloaded task, so it was consuming 4 CPU cores’ time versus all 120/128 CPU cores in the software workloads.

In-Memory Analytics Accelerator Demo

The second demo was of the In-Memory Analytics Accelerator. Which, despite the name, doesn’t actually accelerate the actual analyzing portion of the task. Rather it’s a compression/decompression accelerator primed for use with databases so that they can be operated on in memory without a massive CPU performance cost.
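As a rough illustration of the idea, the sketch below keeps a database column in memory as independently compressed blocks and decompresses only the block a lookup touches. This is a software-only toy in Python using the standard `zlib` module; the `CompressedColumn` class, its block size, and the sample data are all hypothetical, and it says nothing about how IAA or ClickHouse actually implement this. The decompression step in `read_block` is the kind of work an accelerator like IAA is meant to take off the CPU cores.

```python
import zlib


class CompressedColumn:
    """Hypothetical in-memory column stored as independently compressed blocks."""

    def __init__(self, values, block_size=1024):
        self.block_size = block_size
        self.length = len(values)
        self.blocks = []
        for start in range(0, len(values), block_size):
            chunk = values[start:start + block_size]
            # Values must not contain newlines, since "\n" is the separator.
            raw = "\n".join(chunk).encode("utf-8")
            self.blocks.append(zlib.compress(raw))

    def read_block(self, index):
        """Decompress one block; this is the step an accelerator would offload."""
        raw = zlib.decompress(self.blocks[index])
        return raw.decode("utf-8").split("\n")

    def get(self, row):
        """Fetch a single row by decompressing only the block that holds it."""
        block, offset = divmod(row, self.block_size)
        return self.read_block(block)[offset]

    def compressed_bytes(self):
        return sum(len(b) for b in self.blocks)


values = [f"status=ok region=us-west-{i % 4}" for i in range(10_000)]
column = CompressedColumn(values)
raw_bytes = sum(len(v) for v in values)

print(column.get(4321))
print(f"raw: {raw_bytes} bytes, compressed: {column.compressed_bytes()} bytes")
```

The trade-off is the one described above: memory footprint and bandwidth go down, but every read pays a decompression cost, which is why doing that step in fixed-function hardware is attractive.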

Running the demo on a ClickHouse DB, this scenario demonstrated the Sapphire Rapids system seeing a 59% queries-per-second performance advantage versus an AMD EPYC system (Intel did not run a software-only Intel setup), as well as reduced memory bandwidth usage and reduced memory usage overall.

The second IAA demo was set against RocksDB with the same Intel and AMD systems. Once again Intel demonstrated the IAA-accelerated SPR system coming out well ahead, with 1.9x higher performance and nearly half the latency.

Advanced Matrix Extensions Demo

The final demo station Intel had set up was configured for showcasing Advanced Matrix Extensions (AMX) and the Data Streaming Accelerator (DSA).

Starting with AMX, Intel ran an image classification benchmark using TensorFlow and the ResNet50 neural network. This test used unaccelerated FP32 operations on the CPUs, AVX-512 accelerated INT8 on Sapphire Rapids, and finally AMX-accelerated INT8 also on Sapphire Rapids.

This was another blow-out for the accelerators. Thanks to the AMX blocks on the CPU cores, the Sapphire Rapids system delivered just under a 2x performance increase over AVX-512 VNNI mode with a batch size of 1, and over 2x with a batch size of 16. And, of course, the scenario looks even more favorable for Intel compared to the EPYC CPUs since the current Milan processors don’t offer AVX-512 VNNI. The overall performance gains here aren’t as great as going from pure CPU to AVX-512, but then AVX-512 was already part-way to being a matrix acceleration block on its own (among other things).
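For readers unfamiliar with why INT8 helps here, the following pure-Python sketch shows the general shape of quantized matrix math: FP32 values are scaled to 8-bit integers, multiplied and accumulated as wider integers, then scaled back. It assumes a simple symmetric quantization scheme with a hand-picked scale; the function names are illustrative, and this is not how TensorFlow or the AMX hardware is actually programmed. It only shows the arithmetic that such hardware executes densely.

```python
def quantize(matrix, scale):
    """Map floats to signed 8-bit integers: q = round(x / scale), clamped to [-128, 127]."""
    return [[max(-128, min(127, round(x / scale))) for x in row] for row in matrix]


def int8_matmul(a_q, b_q):
    """Multiply two int8 matrices, accumulating products in wider integers."""
    rows, inner, cols = len(a_q), len(b_q), len(b_q[0])
    return [
        [sum(a_q[i][k] * b_q[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


def dequantize(acc, scale_a, scale_b):
    """Convert integer accumulators back to floats using both input scales."""
    return [[v * scale_a * scale_b for v in row] for row in acc]


# A power-of-two scale keeps this toy example exact; real models pick
# scales from the observed value ranges and accept a small rounding error.
SCALE = 1 / 64

a = [[0.5, -1.0], [0.25, 0.75]]
b = [[1.0, 0.0], [0.5, -0.5]]

a_q = quantize(a, SCALE)
b_q = quantize(b, SCALE)
acc = int8_matmul(a_q, b_q)
result = dequantize(acc, SCALE, SCALE)

print(acc)     # integer accumulators
print(result)  # matches the FP32 product for this example
```

Each INT8 value takes a quarter of the space of an FP32 value, so the same register or tile width holds four times as many operands per instruction, which is where much of the throughput gain over FP32 comes from.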

Data Streaming Accelerator Demo

Finally, Intel demoed the Data Streaming Accelerator (DSA) block, which is back to showcasing dedicated accelerator blocks on Sapphire Rapids. In this test, Intel set up a network transfer demo using FIO to have a client read data from a Sapphire Rapids server. DSA is used here to offload the CRC32 calculations used for the TCP packets, an operation that adds up quickly in terms of CPU requirements at the very high data rates Intel was testing – a 2x100GbE connection.
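For reference, this is what the software side of that work looks like in miniature: a checksum has to be computed over every byte that goes out on the wire. The sketch below uses Python's standard `zlib.crc32`, which supports feeding data in chunks. The `crc32_stream` helper is illustrative, and `zlib`'s CRC-32 variant is an assumption for demonstration only, since the exact polynomial used in Intel's demo was not stated.

```python
import zlib


def crc32_stream(chunks):
    """Compute one CRC32 over a sequence of byte chunks, as a CPU core would in software."""
    crc = 0
    for chunk in chunks:
        # zlib.crc32 accepts the running checksum as its second argument.
        crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


payload = b"123456789"
chunks = [payload[:4], payload[4:]]

# Chunked and one-shot computation agree.
assert crc32_stream(chunks) == zlib.crc32(payload)
print(hex(crc32_stream(chunks)))
```

Every byte passes through the CPU in this model, so the cost grows linearly with link speed; handing the loop to a block like DSA is what frees up the cores.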

Using a single CPU core here to showcase efficiency (and because a few CPU cores would be enough to saturate the link), the DSA block allowed Sapphire Rapids to deliver 76% more IOPS on a 128K QD64 sequential read as compared to just using Intel’s optimized ISA-L library on the same workload. The lead over the EPYC system was even greater, and the latency with DSA was brought well under 2000us.

A similar test was also done with a smaller 16K QD256 random read, running against 2 CPU cores. The performance advantage for DSA was not as great here – just 22% versus optimized software on Sapphire Rapids – but again the advantage over EPYC was greater, and latencies were lower.

First Thoughts

And there you have it: the first press demo of the dedicated accelerator blocks (and AMX) on Intel’s 4th Generation Xeon (Sapphire Rapids) CPU. We saw it, it exists, and it's the tip of the iceberg for everything that Sapphire Rapids is slated to bring to customers starting next year.

Given the nature of and the purpose for domain specific accelerators, there’s nothing here that I feel should come as a great surprise to regular technical readers. Domain specific accelerators exist precisely to accelerate specialized workloads, particularly those that would otherwise be CPU and/or energy intensive, and that’s what Intel has done here. And with the competition in the server market expected to be a hot one for general CPU performance, these accelerator blocks are a way for Intel to add further value to their Xeon processors, as well as stand out from AMD and other rivals that are pushing even larger numbers of CPU cores.

Expect to see more on Sapphire Rapids over the coming months, as Intel gets closer to finally shipping their next-generation server CPU.

FAQs

What accelerators are in Sapphire Rapids? ›

Sapphire Rapids-SP (Scalable Performance) models with a "+" suffix include one of each of the four accelerators: DSA, IAA, QAT, and DLB.

What is Intel Sapphire Rapids? ›

With the launch of the 4th generation of Intel Xeon Scalable processors and the Xeon CPU Max series, which were codenamed Sapphire Rapids, Intel is charting a new server architecture path for the future that will have a significant impact on Intel and the rest of the industry.

Is Sapphire Rapids good? ›

It has also been discussed, that the new Sapphire Rapids CPU is doing much better – that is, the clock speed is not lowered that drastically, or not at all, which can be seen in achieving 80% of the theoretical peak, as compared to 65% on the Ice Lake. The new AMD CPUs all achieve above 100% theoretical peak.

What is the next generation of Sapphire Rapids? ›

The follow-up to Intel's 4th Gen Sapphire Rapids CPU lineup comes in the form of the 5th Gen Xeon family, codenamed Emerald Rapids. The Emerald Rapids-SP CPUs are already sampling and are on schedule to deliver in Q4 2023. These chips will offer higher-quality silicon with volume validation in progress.

How much will Sapphire Rapids cost? ›

The Sapphire Rapids processors span from eight-core models to 60 cores, with pricing beginning at $415 and peaking at $17,000 for the flagship Xeon Scalable Platinum 8490H.

When did Sapphire Rapids launch? ›

​On Jan. 10, 2023, Intel officially launched its 4th Gen Intel® Xeon® Scalable processors (code-named Sapphire Rapids) for data center customers around the globe.

How many transistors are in the Intel Sapphire Rapids? ›

This figure describes Intel's Data Center GPU Max Series rather than the Sapphire Rapids CPUs: previously known by the code name Ponte Vecchio, these GPUs pack over 100 billion transistors, and can support up to 128GB of HBM2e memory, delivering up to 52 teraflops peak FP64 performance.

What is the lineup of Sapphire Rapids processors? ›

Products formerly Sapphire Rapids

Product Name | Launch Date | Total Cores
Intel® Xeon® w3-2425 Processor (15M Cache, 3.00 GHz) | Q1'23 | 6
Intel® Xeon® w7-3455 Processor (67.5M Cache, 2.50 GHz) | Q1'23 | 24
Intel® Xeon® w3-2435 Processor (22.5M Cache, 3.10 GHz) | Q1'23 | 8
Intel® Xeon® w5-2445 Processor (26.25M Cache, 3.10 GHz) | Q1'23 | 10
(55 more rows not shown)

How many memory channels does Sapphire Rapids have? ›

It provides eight channels of DDR5 per CPU with a maximum memory speed of 4800 MT/s. Compared to the 3rd generation, this results in up to 50% more aggregate bandwidth, as the Ice Lake generation supports eight channels of DDR4 at 3200 MT/s.

Which is better, the i9 or the Xeon? ›

In contrast with the Intel Xeon line of processors, the Core i9 processors exhibit a higher maximum speed, and come with integrated graphics making them ideal for gaming and playing 4K videos. With a faster clock speed, CPU calculations are performed more quickly which facilitates smoother application performance.

How big is the cache on Sapphire Rapids? ›

There are a number of improvements throughout the core, but one of the other major differences are the caches. Intel is moving from 1.25MB L2 cache per core to 2MB, and then from 1.5MB of L3 to 1.875MB of L3 per core. Some parts may have more cache than cores.

Is Threadripper better than Xeon? ›

The 32-core Threadripper 7975WX is price comparable to the Xeon w9-3475X 36-core, though the Xeon has 4 more cores. Not only did the Threadripper outperform the similarly priced Xeon by a wide margin, but it is even faster than the top-of-the-line 56-core w9-3495X.

How many watts is the Intel Sapphire Rapids? ›

The Sapphire Rapids family includes 52 SKUs (see chart) grouped across 10 segments, inclusive of the Max series: 11 are optimized for 2-socket performance (8 to 56 cores, 150-350 watts), 7 for 2-socket mainline performance (12 to 36 cores, 150-300 watts), 10 target four- and eight- socket (8 to 60 cores, 195-350 watts) ...

How fast is Sapphire Rapids DDR5? ›

In addition to the increased cache memory, it now supports the latest DDR5 memory. While the maximum memory transfer rate of the previous generation was 3,200 MT/s, the new Sapphire Rapids generation supports 4,800 MT/s. With these improvements, the theoretical memory bandwidth reaches up to 307 GB/s per processor.
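The 307 GB/s figure can be reproduced with simple arithmetic. The sketch below assumes eight memory channels per processor (as stated elsewhere in this FAQ) and a 64-bit (8-byte) data path per channel, which is the conventional DDR channel width and is an assumption here rather than something the answer above states. The function name is illustrative.

```python
def peak_bandwidth_gbs(channels: int, transfers_mt_s: int, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s: channels x transfers per second x bytes per transfer."""
    return channels * transfers_mt_s * 1e6 * bus_bytes / 1e9


ddr5_4800 = peak_bandwidth_gbs(channels=8, transfers_mt_s=4800)  # Sapphire Rapids
ddr4_3200 = peak_bandwidth_gbs(channels=8, transfers_mt_s=3200)  # previous generation

print(f"DDR5-4800, 8 channels: {ddr5_4800:.1f} GB/s")
print(f"DDR4-3200, 8 channels: {ddr4_3200:.1f} GB/s")
print(f"generational uplift:   {ddr5_4800 / ddr4_3200 - 1:.0%}")
```

This gives 307.2 GB/s for DDR5-4800 against 204.8 GB/s for DDR4-3200, a 50% increase. These are theoretical peaks; sustained bandwidth in practice is lower.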

Who is the successor of Intel Ice Lake? ›

Ice Lake-SP was succeeded by Sapphire Rapids, powered by Golden Cove cores.

What is the difference between Sapphire Rapids and emerald rapids? ›

Emerald Rapids includes 5600MT/s of DDR5 memory bandwidth, an improvement from 4800MT/s in Sapphire Rapids. The chip will also make possible for the first time some real world implementations of the Compute Express Link 1.1 interface (CXL), Singhal said.

What is Intel Rapid Storage and do I need it? ›

What can Intel® Rapid Storage Technology (Intel® RST) give you? Intel® RST offers you new levels of protection, performance, and expandability for desktop and mobile platforms. Whether using one or multiple hard drives, you can take advantage of enhanced performance and power consumption that is lower.

Is Intel Sapphire Rapids 7nm? ›

More About Intel's New 7nm Chip

This chip will be the successor of the previous Sapphire Rapids 10nm chip, offering Optane DC DIMMs and greater overall performance.

Article information

Author: Margart Wisoky
