Pliops Demonstrates Over 5X Acceleration for LLM Inference at SC24

Innovative Solution Significantly Accelerates GPU Transactions, Addresses the Critical Power Budget Issue, and Reduces Carbon Emissions for Hyperscalers and Enterprises


SAN JOSE, Calif., Nov. 13, 2024 (GLOBE NEWSWIRE) -- Addressing the critical issue of constrained power budgets, Pliops is enabling AI-powered businesses and hyperscalers to achieve impressive performance by optimizing power usage, reducing costs, and shrinking their carbon footprint. Next week at SC24, Pliops will spotlight its innovative XDP LightningAI solution, which enables sustainable, high-efficiency AI operations when paired with GPU servers.

Organizations are increasingly concerned about constrained power budgets in data centers, particularly as AI infrastructure and emerging AI applications raise energy footprints and strain cooling systems. As they scale their AI operations and add GPU compute tiers, escalating power and cooling demands, coupled with significant capital investments in GPUs, are eroding margins. A monumental challenge looms as data centers struggle to secure essential power, creating significant pressure for companies striving to expand their AI capabilities.

Pliops knows that efficient infrastructure solutions are essential to addressing these issues. The company's newest Extreme Data Processor (XDP), the XDP-PRO ASIC, combined with a rich AI software stack and distributed XDP LightningAI nodes, addresses GenAI challenges by using a GPU-initiated Key-Value I/O interface as its foundation, creating a memory tier for GPUs below HBM. Pliops XDP LightningAI connects easily to GPU servers by leveraging the mature NVMe-oF storage ecosystem to provide a distributed Key-Value service. Pliops has focused on LLM inference, a crucial and rapidly evolving area of GenAI that demands significant efficiency improvements, and its demo at SC24 centers on accelerating LLM inference applications. The same memory tier applies seamlessly to other GenAI applications that Pliops plans to introduce over the next few months.

In LLM inference today, GPU prefill operations are heavily compute-bound and critically determine the batch size. While prefill can fully utilize GPU resources, increasing the batch size beyond a certain point only increases Time to First Token (TTFT) without improving the prefill rate. GPU decode operations, by contrast, are HBM bandwidth-bound and mainly influenced by model and KV-cache sizes; they benefit significantly from larger batch sizes through higher HBM bandwidth efficiency. Pliops' solution improves prefill time, allowing larger batch sizes without violating the user SLA on prefill. Because decode benefits greatly from the increased batch size, this enhancement directly improves decode performance as well. As a result, by improving prefill time, the system achieves nearly proportional improvements in end-to-end throughput.
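The trade-off described above can be sketched with a toy capacity model. All constants and scaling assumptions below are illustrative, not Pliops measurements: prefill latency is taken to scale linearly with batch size, and aggregate decode throughput to scale with batch size up to an HBM-efficiency factor.

```python
# Toy model of the prefill/decode trade-off.
# All constants are illustrative assumptions, not Pliops benchmarks.

TTFT_SLA_MS = 2000        # assumed user SLA on Time to First Token
PREFILL_MS_PER_REQ = 250  # assumed compute-bound prefill cost per request
DECODE_TOK_PER_S = 40     # assumed per-request decode rate


def max_batch(prefill_ms_per_req: int, sla_ms: int) -> int:
    """Largest batch whose total prefill time still meets the TTFT SLA.
    Prefill is compute-bound, so its latency scales with batch size."""
    return max(1, sla_ms // prefill_ms_per_req)


def decode_throughput(batch: int, hbm_efficiency: float = 0.9) -> float:
    """Decode is HBM-bandwidth-bound: a larger batch amortizes model
    weight reads from HBM, so aggregate tokens/s scales with batch."""
    return DECODE_TOK_PER_S * batch * hbm_efficiency


baseline = max_batch(PREFILL_MS_PER_REQ, TTFT_SLA_MS)          # -> 8
# Suppose faster prefill cuts per-request cost 5x (illustrative):
accelerated = max_batch(PREFILL_MS_PER_REQ // 5, TTFT_SLA_MS)  # -> 40

# Larger batch under the same SLA yields a nearly proportional
# end-to-end throughput gain in this model.
print(decode_throughput(accelerated) / decode_throughput(baseline))
```

Under these assumed numbers, a 5x prefill speedup lets the batch grow from 8 to 40 within the same TTFT budget, and decode throughput grows by the same factor, matching the "nearly proportional" claim in the text.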

“By leveraging our state-of-the-art technology, we deliver advanced GenAI and AI solutions that empower organizations to achieve unprecedented performance and efficiency in their AI-driven operations,” said Ido Bukspan, Pliops CEO. “As the industry’s leading HPC technical conference, SC24 is the ideal venue to showcase how our solutions redefine AI infrastructure, enabling faster, more sustainable innovation at scale.”

Highlights at the Pliops booth #1559 on the SC24 show floor of the Georgia World Congress Center include:

  • Pliops XDP LightningAI running with Dell PowerEdge servers
  • Pliops XDP enhancements for AI VectorDB

Pliops can also be found at the SC24 PetaFLOP reception at the College Football Hall of Fame on Tuesday, November 19 from 7:00 p.m. to 11:00 p.m. local time.

For more information about Pliops, please visit www.pliops.com.

Connect with Pliops
Read Blog
About Pliops
Visit Resource Center – XDP LightningAI Solution Brief
Connect on LinkedIn
Follow on X

About Pliops
A winner of the FMS 2024 most innovative AI solution award, Pliops is a technology innovator focused on making data centers run faster and more efficiently. The company’s Extreme Data Processor (XDP) radically simplifies the way data is processed and managed. Pliops overcomes I/O inefficiencies to massively accelerate performance and dramatically reduce overall infrastructure costs for data-hungry AI applications. Founded in 2017, Pliops has repeatedly been named one of the 10 hottest semiconductor startups. The company has raised over $200 million to date from leading investors including Koch Disruptive Technologies, State of Mind Ventures Momentum, Intel Capital, Viola Ventures, SoftBank Ventures Asia, Expon Capital, NVIDIA, AMD, Western Digital, SK hynix and Alicorn. For more information, visit www.pliops.com.

Media Contact:
Stephanie Olsen
Lages & Associates
(949) 453-8080
stephanie@lages.com

A photo accompanying this announcement is available at https://www.globenewswire.com/NewsRoom/AttachmentNg/72e84a2c-3a62-448b-9100-096eda890eb5

