Samsung, MemVerge, and H3 Build 2TB CXL Memory Pool
by Anton Shilov on August 14, 2023 10:06 AM EST

Samsung, MemVerge, H3 Platform, and XConn have jointly unveiled their 2 TB Pooled CXL Memory System at the Flash Memory Summit. The device can be connected to up to eight hosts, allowing them to use its memory when needed. The 2 TB Pooled CXL Memory system has software enabling it to visualize, pool, tier, and dynamically allocate memory to connected hosts.
The 2 TB Pooled CXL Memory system is a 2U rack-mountable machine built by H3 that houses eight 256 GB Samsung CXL memory modules connected through XConn's XC50256 CXL 2.0 switch, which supports 256 PCIe Gen5 lanes across 32 ports. The system's firmware lets it serve up to eight hosts, which can dynamically draw on the CXL memory as they need it, orchestrated by software from MemVerge.
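To put the headline numbers in context, here is a minimal back-of-envelope sketch in Python. It assumes an all-x16 port layout for both hosts and memory modules, which the article does not specify, so the lane accounting is illustrative only.

```python
# Rough model of the demo system's topology. Port/lane assignments are
# assumptions for illustration, not details from the article.

MODULE_CAPACITY_GB = 256          # per Samsung CXL memory module
NUM_MODULES = 8                   # modules behind the XConn XC50256 switch
MAX_HOSTS = 8                     # hosts the pool can serve concurrently
SWITCH_LANES = 256                # PCIe Gen5 lanes on the XC50256
SWITCH_PORTS = 32                 # ports on the XC50256

pool_capacity_gb = MODULE_CAPACITY_GB * NUM_MODULES
print(f"Total pooled capacity: {pool_capacity_gb} GB ({pool_capacity_gb / 1024:.0f} TB)")

# If every host and module attached over x16 (an assumption), 16 of the 32
# ports and all 256 lanes would be consumed: (8 hosts + 8 modules) * 16 lanes.
lanes_used = (MAX_HOSTS + NUM_MODULES) * 16
print(f"Lanes used under an all-x16 layout: {lanes_used} of {SWITCH_LANES}")

# Memory is allocated dynamically, so at any moment one host could hold
# anywhere from 0 GB up to the full pool; an even split is just a reference point.
print(f"Even split across {MAX_HOSTS} hosts: {pool_capacity_gb / MAX_HOSTS:.0f} GB each")
```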
The Pooled CXL Memory system was developed to overcome limitations in memory capacity and composability in today's system architecture, which tightly couples CPU and DRAM. Such an architecture creates performance problems for highly distributed AI/ML applications: memory spilled to slow storage, excessive memory copying, storage I/O, serialization/deserialization overhead, and out-of-memory errors that can crash applications.
Attaching 2 TB of fast, low-latency memory to eight host systems over a PCIe 5.0 interface running the CXL 2.0 protocol, and letting those hosts share it dynamically, promises substantial cost savings alongside the performance benefits. According to the companies, the initiative represents a significant step towards a more robust and flexible memory-centric data infrastructure for modern AI applications.
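For a rough sense of what "fast" means across that link, a short hedged calculation follows. It assumes each host attaches over a PCIe 5.0 x16 port (per-host lane counts are not given in the article) and ignores CXL protocol overhead.

```python
# Back-of-envelope per-host link bandwidth for a PCIe 5.0 x16 attachment.
# The x16 width per host is an assumption, not a detail from the article.

GT_PER_S_PER_LANE = 32            # PCIe 5.0 signaling rate per lane
ENCODING_EFFICIENCY = 128 / 130   # 128b/130b line encoding
LANES = 16

# Payload bandwidth per direction, before protocol/CXL overheads.
gbytes_per_s = GT_PER_S_PER_LANE * ENCODING_EFFICIENCY * LANES / 8
print(f"PCIe 5.0 x16, per direction: ~{gbytes_per_s:.0f} GB/s")   # ~63 GB/s

# For comparison, one DDR5-4800 channel peaks at ~38.4 GB/s, so the CXL link
# is in the same ballpark as a local DRAM channel on bandwidth, though its
# latency is higher than direct-attached DRAM (switch hop plus CXL protocol).
ddr5_channel_gbs = 4800 * 8 / 1000
print(f"One DDR5-4800 channel: ~{ddr5_channel_gbs:.1f} GB/s")
```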
"Modern AI applications require a new memory-centric data infrastructure that can meet the performance and cost requirements of its data pipeline," said Charles Fan, CEO and co-founder of MemVerge. "Hardware and software vendors in the CXL Community are co-engineering such memory-centric solutions that will deeply impact our future."
The jointly developed demonstration system can be pooled, tiered with main memory, and dynamically provisioned to applications using MemVerge's Memory Machine X software and its elastic memory service. The accompanying Memory Viewer service displays the system's physical layout and provides a heat map of memory capacity and bandwidth consumption per application.
"The concept system unveiled at Flash Memory Summit is an example of how we are aggressively expanding its usage in next-generation memory architectures," said JS Choi, Vice President of New Business Planning Team at Samsung Electronics. "Samsung will continue to collaborate across the industry to develop and standardize CXL memory solutions, while fostering an increasingly solid ecosystem."
Source: MemVerge
4 Comments
mode_13h - Monday, August 14, 2023
So, the external link is what? A CXL/PCIe x16 cable connector? Do the cables have the same PCIe edge connector as the cards?

TomWomack - Tuesday, August 15, 2023
This seems excessively aggressive and complicated engineering when a single dual-Sapphire-Rapids system can hold 2TB of DDR5 for about £7000 (32 64GB modules at £230) and access it much faster. Increase the capacity by a factor of eight and keep the same price per gigabyte and it becomes more interesting.

Eletriarnation - Wednesday, August 16, 2023
Still early days so yeah the cost isn't very competitive, but isn't a key differentiator the fact that all of the systems sharing the CXL memory bank get a coherent view of the contents at any given time? Eight servers sharing a common data set in storage would have to independently load that data set into memory and copy any changes back to the common storage, but eight servers on a CXL bank can work directly from the common memory. Still needs some kind of system for locking data that is being written, but it's probably not any more complicated than it would have been when working from disk.

sjkpublic@gmail.com - Tuesday, August 15, 2023
Another step towards separating memory from processing. Nice. But consider that HBM3 is now doing 1-2 TB/s whereas this solution is probably at least 1/10 of that speed. Maybe a PCIe 5.0 x16 connection can handle the throughput?