

How to maximize AI performance in the cloud and on edge devices

Until now, maximizing AI performance on cloud and edge devices has required specialized engineering skills and vast resources.
Against this backdrop, Arm announced the integration of its Arm Kleidi technology with PyTorch and ExecuTorch, enabling large language models (LLMs) to run on Arm CPUs.
This extends the benefits of accelerated AI performance from the edge to the cloud and opens the way for next-generation applications built on LLMs running on Arm CPUs.
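To make this concrete, here is a minimal sketch of what the developer experience looks like: loading a causal language model with Hugging Face Transformers and running generation with stock PyTorch on an Arm CPU. The model ID and prompt are illustrative, and the sketch assumes a Kleidi-enabled PyTorch build on aarch64, where the optimized kernels are picked up automatically with no Kleidi-specific code.

```python
# Minimal sketch: LLM inference with stock PyTorch on an Arm CPU.
# Assumes a PyTorch build with the Kleidi-accelerated kernels (used
# transparently on aarch64); the model ID below is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # illustrative; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.eval()

prompt = "Explain what Arm Kleidi does in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")

# Standard PyTorch generation; no Kleidi-specific API is involved.
with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=64)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```

The point of the design is that acceleration arrives through the framework itself: existing PyTorch code paths benefit without changes to application code.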

Developers can adopt the latest generative AI models across the stack and immediately benefit from significantly improved inference performance.
Collaboration with cloud service providers and leading companies in the ML ISV community provides additional support for AI developers.
Among the confirmed results, Arm's demo chatbot, running Meta's Llama 3 LLM on an AWS Graviton processor, delivered the first real-time chat responses on mainline PyTorch.

In this example, after Kleidi technology was integrated into the open-source PyTorch codebase, the time to first token measured on AWS Graviton4 was 2.5 times faster.
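For reference, time to first token (TTFT) can be measured with a few lines of instrumentation. The sketch below continues the earlier example (reusing `model`, `tokenizer`, and `inputs`) and uses Transformers' `TextIteratorStreamer` to time how long the first generated token takes to arrive; it is a minimal measurement sketch, not Arm's benchmark harness.

```python
# Minimal sketch of measuring time-to-first-token for the setup above.
# TextIteratorStreamer yields decoded text as soon as each token is ready.
import time
from threading import Thread
from transformers import TextIteratorStreamer

streamer = TextIteratorStreamer(tokenizer, skip_prompt=True,
                                skip_special_tokens=True)
start = time.perf_counter()
thread = Thread(target=model.generate,
                kwargs=dict(**inputs, streamer=streamer, max_new_tokens=64))
thread.start()

first_piece = next(iter(streamer))  # blocks until the first token arrives
ttft = time.perf_counter() - start
print(f"time to first token: {ttft:.2f}s, first output: {first_piece!r}")
thread.join()
```

Running the same measurement before and after switching to a Kleidi-enabled PyTorch build is one way to reproduce a comparison of this kind on your own hardware.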
Read on to learn more about these new technologies aimed at maximizing AI performance in the cloud and on edge devices, and how they are being used in the field.