Samsung Announces 'Shinebolt' HBM3E Memory: HBM Hits 36GB Stacks at 9.8 Gbps

Original Link: https://www.anandtech.com/show/21104/samsung-announces-shinebolt-hbm3e-memory-hbm-hits-36gb-stacks-at-98-gbps

by Ryan Smith on October 20, 2023 2:00 PM EST

Posted in: Memory, Samsung, GPUs, HBM, GDDR7, HBM3E

Samsung’s annual Memory Tech Day is taking place in San Jose this morning, and as part of the event, the company is making a couple of notable memory technology announcements/disclosures. The highlight of Samsung’s event is the introduction of Shinebolt, Samsung’s HBM3E memory that will set new marks for both memory bandwidth and memory capacity for high-end processors. The company is also disclosing a bit more on their GDDR7 memory, which will mark a significant technological update to the GDDR family of memory standards.

Starting things off, we have today’s marquee announcement: Shinebolt HBM3E memory. Like the rest of the memory industry, Samsung is preparing a successor to the current generation of HBM3 memory that’s being used with high-end/HPC-grade processors, with the industry settling around the upcoming HBM3E standard. HBM3E is slated to offer both significantly higher capacities and greater memory bandwidth than HBM3, helping the high-bandwidth memory technology keep up with ever-growing workloads on high-end processors.

Samsung HBM Memory Generations
	HBM3E (Shinebolt)	HBM3 (Icebolt)	HBM2E (Flashbolt)	HBM2 (Aquabolt)
Max Capacity	36 GB	24 GB	16 GB	8 GB
Max Bandwidth Per Pin	9.8 Gb/s	6.4 Gb/s	3.6 Gb/s	2.0 Gb/s
Number of DRAM ICs per Stack	12	12	8	8
Effective Bus Width	1024-bit	1024-bit	1024-bit	1024-bit
Voltage	?	1.1 V	1.2 V	1.2 V
Bandwidth per Stack	1.225 TB/s	819.2 GB/s	460.8 GB/s	256 GB/s

The basis of Shinebolt is a new 24Gbit HBM memory die, which Samsung will be producing on their D1a process, the company’s EUV-based 4th-generation 10nm-class (14nm) node. Samsung will be producing both 8Hi and eventually 12Hi stacks based on this new die, allowing for total stack capacities of 24GB and 36GB respectively, 50% more capacity than their HBM3 (Icebolt) equivalents.

According to Samsung, Shinebolt will be able to hit memory clockspeeds as high as 9.8Gbps/pin, more than 50% faster than their HBM3 products. Given some of Samsung’s previous memory clockspeed claims, though, there’s a good chance this is a semi-overclocked state. Shinebolt development isn’t far enough along for Samsung to list individual SKUs, but even at the conservative end of things, Samsung is promoting data rates of at least 8Gbps/pin in their event press release. And if Samsung’s ambitious memory frequencies do come to fruition, then that would put Samsung ahead of their competition as well; to date, SK hynix and Micron have announced plans for 9Gbps/pin and 9.2Gbps/pin memory respectively, so Samsung’s claims are certainly the most aggressive.

Overall, these clockspeeds would give a single HBM3E stack a minimum bandwidth of 1TB/sec, and a maximum bandwidth of 1.225TB/sec, well ahead of the 819GB/sec data rate of HBM3. Or to frame things in terms of a high-end processor (e.g. NVIDIA H100), a 6-stack chip would be able to access as much as 216GB of memory with an aggregate memory bandwidth as high as 7.35TB/sec.
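For readers who want to check the arithmetic, here is a minimal sketch of how the per-stack figures fall out of the per-pin data rate and the 1024-bit bus width. The function name is ours, and the conversion of 1 TB = 1024 GB is an assumption on our part: it is the conversion that reproduces the 1.225TB/sec and 7.35TB/sec figures quoted above.

```python
def stack_bandwidth_gb_s(pin_rate_gbps: float, bus_width_bits: int = 1024) -> float:
    """Peak bandwidth of one HBM stack in GB/s: pins x Gbit/s per pin / 8 bits per byte."""
    return pin_rate_gbps * bus_width_bits / 8


GB_PER_TB = 1024  # assumption: the conversion that matches the article's TB/sec figures

for label, rate in [("HBM3 @ 6.4 Gbps", 6.4),
                    ("HBM3E @ 8.0 Gbps", 8.0),
                    ("HBM3E @ 9.8 Gbps", 9.8)]:
    gb_s = stack_bandwidth_gb_s(rate)
    print(f"{label}: {gb_s:.1f} GB/s ({gb_s / GB_PER_TB:.3f} TB/s)")

# Six 36 GB stacks at the top 9.8 Gbps/pin data rate
stacks = 6
total_capacity_gb = stacks * 36
total_bandwidth_tb_s = stacks * stack_bandwidth_gb_s(9.8) / GB_PER_TB
print(f"{stacks} stacks: {total_capacity_gb} GB, {total_bandwidth_tb_s:.2f} TB/s")
```

With a strictly decimal conversion (1 TB = 1000 GB), the 9.8Gbps figure would instead work out to roughly 1.25TB/sec per stack.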

As for power efficiency, things look to be a bit of a mixed bag. On a relative basis, Samsung says that Shinebolt will be 10% more efficient than Icebolt – in other words, consuming 10% less power per bit transferred (pJ/bit). However, a 25%+ clockspeed improvement will more than wipe out those gains due to the significant increase in bits transferred. So while Shinebolt will be more efficient overall, on an absolute basis it seems that total power consumption for HBM memory will continue to grow with the next generation.
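To put rough numbers on that trade-off, here is a back-of-the-envelope sketch. It assumes that interface power scales as energy per bit times bits per second, that Samsung's 10% efficiency gain holds at every data rate, and that the bus is fully utilized in both cases; none of these assumptions come from Samsung, and the function name is ours.

```python
def relative_power(energy_per_bit_ratio: float, data_rate_ratio: float) -> float:
    """Power relative to a baseline, assuming power = (energy per bit) x (bits per second)."""
    return energy_per_bit_ratio * data_rate_ratio


HBM3_RATE_GBPS = 6.4   # Icebolt per-pin data rate
ENERGY_RATIO = 0.90    # Samsung's claim: 10% less energy per bit than Icebolt

for rate in (8.0, 9.8):
    ratio = relative_power(ENERGY_RATIO, rate / HBM3_RATE_GBPS)
    print(f"{rate} Gbps/pin: {ratio:.3f}x Icebolt power ({(ratio - 1) * 100:+.1f}%)")
```

Under these assumptions, a stack running flat-out would draw about 12.5% more power than Icebolt at 8Gbps/pin, and close to 38% more at 9.8Gbps/pin.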

Either way, for the high-end processor market that Samsung is targeting with Shinebolt, chipmakers are unlikely to be fazed by the power increase. Like the rest of the high-end processor space, Samsung has the AI market squarely in its sights – a market segment where both memory bandwidth and memory capacity are limiting factors, especially with massive large language models (LLMs). Along with the traditional supercomputer and networking market segments, Samsung should have little trouble selling faster HBM in the middle of a booming AI market.

Like the other major memory vendors, Samsung expects to ship Shinebolt at some point in 2024. Given that the company just started sampling the memory – and that HBM3 Icebolt itself just hit mass production – Shinebolt’s likely not shipping until the later part of the year.

A Brief Teaser on HBM4: FinFETs & Copper-to-Copper Bonding

Finally, looking even farther into the future, Samsung is briefly talking about their plans for HBM4 memory. While that technology is still a few years off (there’s not even an approved specification for it yet), we know from previous disclosures that the memory industry is aiming to move to a wider, 2048-bit memory interface. Which, as Samsung likes to frame things, is the only practical choice when further HBM clockspeed increases would blow out power consumption.

For HBM4, Samsung is looking at employing more advanced fab and packaging technologies that are currently the domain of logic chips. On the fab side of matters, the company wants to move to using FinFET transistors for their memory, as opposed to the planar transistors still used there. As with logic, FinFETs would reduce the drive current required, which would help to improve DRAM energy efficiency. Meanwhile on the packaging side of matters, Samsung is looking at moving from micro-bump bonding to bumpless (direct copper-to-copper) bonding, a packaging technique that’s still on the cutting-edge of development even in the logic space. Embracing cutting-edge technologies will be critical to keeping HBM bandwidth growing as it has over the last decade, but the costs and complexities of doing so also underscore why HBM remains an exclusively niche high-end memory technology.

GDDR7 Update: 50% Lower Stand-By Power Than GDDR6

Besides HBM3E, Samsung’s other big bandwidth memory update of the day is a brief status update on their GDDR7 memory.

Back in July of this year, Samsung announced that they completed initial development on their GDDR7 memory. The next generation of GDDR memory, GDDR7 brings with it several major changes versus today’s GDDR6, the most significant of which is a switch to PAM3 encoding. PAM3 allows for 1.5 bits to be transferred per cycle (or rather 3 bits over two cycles), opening the door to improving memory transfer rates without employing more costly means of further improving the frequency of the memory bus.
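As an illustration of where the "3 bits over two cycles" figure comes from: two three-level symbols have 3 x 3 = 9 possible combinations, which is enough to carry the 8 values of a 3-bit group. The sketch below uses a hypothetical mapping that we made up for demonstration purposes; it is not the actual GDDR7 encoding table, and the function names are ours.

```python
from itertools import product

LEVELS = (-1, 0, 1)  # the three PAM3 signal levels

# Two symbols give 9 combinations; 3 bits need only 8, so one goes unused.
# Hypothetical mapping for illustration only, not the real GDDR7 table.
SYMBOL_PAIRS = list(product(LEVELS, repeat=2))
ENCODE = {value: SYMBOL_PAIRS[value] for value in range(8)}
DECODE = {pair: value for value, pair in ENCODE.items()}


def encode_bits(bits: str) -> list:
    """Encode a bit string (length divisible by 3) into PAM3 symbols."""
    if len(bits) % 3 != 0:
        raise ValueError("bit string length must be a multiple of 3")
    symbols = []
    for i in range(0, len(bits), 3):
        symbols.extend(ENCODE[int(bits[i:i + 3], 2)])
    return symbols


def decode_symbols(symbols: list) -> str:
    """Decode PAM3 symbols (an even count) back into a bit string."""
    if len(symbols) % 2 != 0:
        raise ValueError("symbol count must be even")
    return "".join(
        format(DECODE[(symbols[i], symbols[i + 1])], "03b")
        for i in range(0, len(symbols), 2)
    )


bits = "101110"
symbols = encode_bits(bits)
print(symbols, len(bits) / len(symbols), "bits per symbol")
```

Six bits go out as four symbols, which is the 1.5 bits per cycle quoted above, versus six symbols for the same data with conventional two-level signaling.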

GDDR Memory Generations
	GDDR7	GDDR6X	GDDR6
B/W Per Pin	32 Gbps (Projected)	24 Gbps (Shipping)	24 Gbps (Sampling)
Chip Density	2 GB (16 Gb)	2 GB (16 Gb)	2 GB (16 Gb)
Total B/W (256-bit bus)	1024 GB/sec	768 GB/sec	768 GB/sec
DRAM Voltage	1.2 V	1.35 V	1.35 V
Data Rate	QDR	QDR	QDR
Signaling	PAM-3	PAM-4	NRZ (Binary)
Packaging	266 FBGA	180 FBGA	180 FBGA

As a quick recap from Samsung’s July announcement, Samsung will be rolling out 16Gbit (2GB) modules, which will be able to run at up to 32Gbps/pin. That’s a 33% improvement in bandwidth per pin over current GDDR6 memory, and would bring the aggregate bandwidth of a 256-bit memory bus to a cool 1TB/second. GDDR7 should also deliver a 20% improvement in power efficiency over Samsung’s GDDR6 (in terms of pJ/bit), thanks in part to the use of Samsung’s 3rd-generation D1z (10nm-class) fab node.

Today’s event from Samsung is largely a recap of July’s announcement, but in the process we have learned a couple of new technical details on GDDR7 that Samsung hasn’t previously disclosed. First off, GDDR7 isn’t just improving active power consumption, but the tech will also improve on stand-by power consumption to a significant degree. Thanks to additional clock controls, GDDR7 will consume 50% less stand-by power than GDDR6.

Second, in discussing why Samsung (and the industry as a whole) went with PAM3 encoding for GDDR7 instead of even denser PAM4, the company confirmed some of our technical suppositions on the new technology. In short, PAM3 has a lower average bit error rate (BER) than PAM4, largely thanks to the wider margins on the eye window. None of which makes PAM4 unworkable (as Micron has already proven), but Samsung and the rest of the memory industry are favoring the relative simplicity of PAM3, given the trade-offs.
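The margin argument can be sketched with an idealized model: for a fixed total voltage swing and evenly spaced levels, an N-level signal has N-1 stacked eyes, each 1/(N-1) of the swing tall. This ignores noise, jitter, and equalization, so it is a rough intuition rather than a BER calculation, and the function name is ours.

```python
import math


def pam_properties(levels: int) -> tuple:
    """Return (ideal bits per symbol, eye height as a fraction of total swing)."""
    if levels < 2:
        raise ValueError("need at least two signal levels")
    bits_per_symbol = math.log2(levels)
    eye_height = 1 / (levels - 1)  # N levels leave N-1 eyes sharing the same swing
    return bits_per_symbol, eye_height


for name, levels in [("NRZ", 2), ("PAM3", 3), ("PAM4", 4)]:
    bits, eye = pam_properties(levels)
    print(f"{name}: up to {bits:.2f} bits/symbol, eye height {eye:.2f} of swing")
```

In this simplified picture, PAM3 keeps eyes half as tall as NRZ while PAM4 is down to a third, which is the kind of margin difference behind the BER comparison. Note that the ideal PAM3 figure of about 1.58 bits per symbol is an upper bound; the scheme described earlier carries 1.5.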

Besides the usual video card/gaming customers, Samsung is expecting GDDR7 to be adopted by AI chip makers, and perhaps a bit more surprisingly, the automotive industry. In fact some of these non-traditional customers may be the first to adopt the memory; since the traditional GPU vendors are still mid-cycle on their current generation of products, it will still be quite some time before they ship any GDDR7-capable silicon.

At this point Samsung has not announced a projected date for when their GDDR7 memory will go into mass production. But the company is still expecting that they will be the first vendor to ship the next-generation memory, presumably in 2024.

Source: Samsung