Nvidia triples and Intel doubles generative AI inference performance on new MLPerf benchmark

MLCommons is out today with its MLPerf 4.0 benchmarks for inference, once again showing the relentless pace of software and hardware improvements.

As generative AI continues to develop and gain adoption, there is a clear need for a vendor-neutral set of performance benchmarks, which is what MLCommons provides with MLPerf. There are multiple MLPerf benchmarks, with training and inference among the most useful. The new MLPerf 4.0 Inference results are the first update on inference benchmarks since the MLPerf 3.1 results were released in September 2023.

Needless to say, a lot has happened in the AI world over the last six months, and the big hardware vendors, including Nvidia and Intel, have been busy improving both hardware and software to further optimize inference. The MLPerf 4.0 inference results show marked improvements for both Nvidia’s and Intel’s technologies.

The MLPerf inference benchmark has also changed. MLPerf 3.1 introduced large language models (LLMs) with the 6 billion parameter GPT-J model, used to perform text summarization. The new MLPerf 4.0 benchmark adds the popular Llama 2 70 billion parameter open model, benchmarked on question and answer (Q&A). MLPerf 4.0 also includes, for the first time, a benchmark for gen AI image generation with Stable Diffusion.

“MLPerf is really sort of the industry standard benchmark for helping to improve speed, efficiency and accuracy for AI,” MLCommons Founder and Executive Director David Kanter said in a press briefing.

Why AI benchmarks matter

There are more than 8,500 performance results in MLCommons’ latest benchmark, covering a wide range of combinations of hardware, software and AI inference use cases. Kanter emphasized that there is a real purpose to the MLPerf benchmarking process.

“To remind people of the principle behind benchmarks, really the goal is to set up good metrics for the performance of AI,” he said. “The whole point is that once we can measure these things, we can start improving them.”

Another goal for MLCommons is to help align the whole industry. The benchmark tests are all run with similar datasets and configuration parameters across different hardware and software. The results are visible to all the submitters to a given test, so that any questions raised by another submitter can be addressed.

Ultimately the standardized approach to measuring AI performance is about enabling enterprises to make informed decisions.

“This is helping to inform buyers, helping them make decisions and understand how systems, whether they’re on-premises systems, cloud systems or embedded systems, perform on relevant workloads,” Kanter said. “If you’re looking to buy a system to run large language model inference, you can use benchmarks to help guide you for what those systems should look like.”

Nvidia triples AI inference performance, with the same hardware

Once again, Nvidia dominates the MLPerf benchmarks with a series of impressive results.

While it’s to be expected that new hardware would yield better performance, Nvidia is also able to get better performance out of its existing hardware. Using its open-source TensorRT-LLM inference technology, Nvidia was able to nearly triple the inference performance for text summarization with the GPT-J LLM on its H100 Hopper GPU.

In a briefing with press and analysts, Dave Salvator, director of accelerated computing products at Nvidia, emphasized that the performance boost was achieved in only six months.

“We’ve gone in and been able to triple the amount of performance that we’re seeing and we’re very, very pleased with this result,” Salvator said. “Our engineering team just continues to do great work to find ways to extract more performance from the Hopper architecture.”

Nvidia announced its newest-generation Blackwell GPU, the successor to the Hopper architecture, last week at GTC. In response to a question from VentureBeat, Salvator said he wasn’t sure exactly when Blackwell-based GPUs would be benchmarked for MLPerf, but he hoped it would be as soon as possible.

Even before Blackwell is benchmarked, the MLPerf 4.0 results mark the debut of H200 GPU results, which further improve on the H100’s inference capabilities. The H200 results are up to 45% faster than the H100 when evaluated using Llama 2 for inference.

Intel reminds industry that CPUs still matter for inference too

Intel is also a very active participant in the MLPerf 4.0 benchmarks with both its Habana Gaudi AI accelerator and Xeon CPU technologies.

With Gaudi, Intel’s actual performance results trail the Nvidia H100, though the company claims it offers better price-performance. What is perhaps more interesting are the impressive gains coming from the 5th Gen Intel Xeon processor for inference.

In a briefing with press and analysts, Ronak Shah, AI product director for Xeon at Intel, commented that the 5th Gen Intel Xeon was 1.42 times faster for inference than the previous 4th Gen Intel Xeon across a range of MLPerf categories. Looking specifically at the GPT-J LLM text summarization use case, the 5th Gen Xeon was up to 1.9 times faster.

“We recognize that for many enterprise customers that are deploying their AI solutions, they’re going to be doing it in a mixed general purpose and AI environment,” Shah said. “So we designed CPUs that mesh together strong general purpose capabilities with strong AI capabilities with our AMX engine.”
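
The speedup figures quoted above come in two forms, "times faster" (1.42x, 1.9x) and "percent faster" (45%), which can be hard to compare at a glance. The short Python sketch below converts between the two and estimates how much less time the same amount of work would take. It is an illustration only: the helper names are hypothetical, the figures are the "up to" values reported in this article, and the time estimate assumes the figures describe throughput, so that time per unit of work scales as 1/speedup.

```python
def percent_gain(speedup: float) -> float:
    """Throughput gain in percent for a speedup factor (1.42x -> about 42%)."""
    return (speedup - 1.0) * 100.0


def speedup_from_percent(percent_faster: float) -> float:
    """Speedup factor for a 'percent faster' figure (45% -> 1.45x)."""
    return 1.0 + percent_faster / 100.0


def time_reduction(speedup: float) -> float:
    """Percent less time for the same work, assuming time scales as 1/speedup."""
    return (1.0 - 1.0 / speedup) * 100.0


# Figures as reported in the article; "up to" values are best cases.
figures = {
    "5th Gen Xeon vs 4th Gen (range of categories)": 1.42,
    "5th Gen Xeon vs 4th Gen (GPT-J summarization)": 1.9,
    "H200 vs H100 (Llama 2)": speedup_from_percent(45),
}

for label, speedup in figures.items():
    print(
        f"{label}: {speedup:.2f}x, "
        f"+{percent_gain(speedup):.0f}% throughput, "
        f"{time_reduction(speedup):.0f}% less time per unit of work"
    )
```

One thing the conversion makes visible is that a speedup just under 2x, such as the 1.9x GPT-J result, corresponds to slightly less than half the time for the same work, not a 90% reduction.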
