Nvidia recently announced its newest processor, the Grace CPU, a custom Arm-based “superchip” designed to power the world’s most demanding AI workloads. Nvidia says it will be the world’s most sophisticated Arm-based server processor, with significant performance and efficiency gains over current Arm-based chips.
Let’s take a closer look at the features of this newest offering from Nvidia.
Nvidia’s Arm-based Grace CPU ‘Superchip’
Nvidia recently unveiled its new Arm-based Grace CPU, designed to deliver advanced computing power to high performance computing (HPC) applications like artificial intelligence (AI) and deep learning.
The Grace processor will support hardware virtualisation, and Nvidia claims it can perform up to 10 times faster than today’s fastest servers on the most demanding AI workloads. It is also built to power the most memory-hungry workloads.
The processor pairs next-generation Arm Neoverse cores with Nvidia’s NVLink interconnect technology. This high-bandwidth link allows quick data transfer between the CPU and attached GPUs, letting them rapidly communicate and scale up performance without sacrificing compute efficiency or scalability. Additionally, Nvidia’s GPUs can be tightly integrated with the Grace CPU for efficient massive-scale computing, making it an ideal processor for sophisticated workloads such as genomics research, drug discovery and more.
Grace is set to launch in 2023 and promises to shake up the HPC industry by providing unprecedented performance in AI applications and other computationally intensive tasks. With its combination of innovation and impressive processing power, Nvidia’s upcoming Grace CPU is positioned to become one of the world’s leading HPC solutions in the years ahead by delivering reliable performance across various domains.
Background
Nvidia has recently unveiled its new Arm-based Grace CPU, touted as a breakthrough processor for AI supercomputing, and it promises to revolutionise data centres with its power, efficiency, and speed.
The Grace CPU is built around Arm Neoverse cores and is designed as a more powerful alternative to traditional x86 server chips for AI workloads. This article provides an overview of the Grace CPU, explaining its features and how it can benefit businesses.
Nvidia’s history with Arm-based CPUs
Nvidia has a long history of utilising Arm-based CPU cores, stretching back to its early mobile processors. Over the years, Nvidia has integrated Arm technology into offerings including its high performance Tegra series processors and Jetson system-on-chip modules for autonomous machines.
Their latest Arm solution – the Grace CPU – is their most ambitious yet. Built on a leading-edge TSMC process with a custom design, this processor offers improved performance over standard Arm server CPUs while reducing energy consumption. In addition, it is optimised for training large deep learning models when paired with Nvidia’s GPUs over the NVLink interconnect.
The Grace CPU is expected to ship in 2023 and will be used in Nvidia’s DGX SuperPod data centre architecture, combining hundreds of GPUs with Grace CPUs to bring a new level of scalability for data centres looking to move into AI workloads. In addition, Nvidia plans to offer software solutions designed specifically for its supercomputer clusters powered by the Grace CPU, helping make AI deployment easier for enterprise customers.
This marks a milestone in the evolution of processors and signals Nvidia’s commitment to using advanced technology that provides improved efficiency, better performance, and cost-effectiveness for AI workloads going forward.
Nvidia’s previous attempt to enter the CPU market
Nvidia’s ventures into the CPU market trace back to its Tegra line of Arm-based system-on-chip (SoC) processors. The Tegra 4, announced in 2013, targeted low-power, portable devices like tablets, smartphones, and embedded systems, and attempted to “push the boundaries” of graphics performance by featuring 72 individual GPU cores. Unfortunately, the architecture failed to prove competitive due to its high power consumption, and Nvidia refocused its efforts on developing GPUs instead of CPUs.
Fast-forward to 2021: Nvidia announced “Grace,” an Arm processor built for AI workloads running in data centres. Coupled tightly to Nvidia’s GPUs through NVLink interconnects, Grace is designed to outperform today’s fastest server CPUs on AI workloads. Moreover, unlike the mobile-focused Tegra line, Grace aims to fill the gap between traditional CPUs for general computing tasks and GPUs for accelerated processing – making it an ideal candidate for powering an array of machine learning workloads.
Features of the Arm-based Grace CPU
Nvidia is set to launch the new Arm-based Grace CPU, a “superchip” for powering its data centres and artificial intelligence (AI) applications. Nvidia claims the new processor will substantially outperform today’s leading x86 server CPUs on AI and high performance computing workloads.
Let’s explore some of the features of the Arm-based Grace CPU and what makes it a powerful processor.
Design and architecture of the CPU
Nvidia’s Arm-based Grace CPU is a power-efficient, high-performance processor designed for data centres and large-scale AI systems. Several key choices within its design and architecture significantly impact the performance that can be achieved with the processor.
Grace’s design is highly modular, meaning components can be interconnected in different configurations to suit individual requirements or applications. This customisability makes Grace an attractive choice for system vendors with differing needs.
The foundation of any processor is its cores, and Nvidia’s Grace contains features that set it apart from competitors. Most notably, each core has a dedicated L2 cache, and the wide, out-of-order core design increases single-threaded performance while the large core count allows many tasks to run in parallel.
Grace is also among the first data-centre processors supporting Arm’s Scalable Vector Extension (SVE), meaning it can leverage vector processing to perform complex calculations quickly and efficiently. Additionally, Nvidia has included advanced energy-management features to conserve power when spare capacity isn’t needed. These improvements allow Grace processors to deliver superior performance compared to competing solutions without sacrificing efficiency or scalability.
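SVE itself is a hardware feature, but its distinctive style – vector-length-agnostic loops with a predicate mask covering the tail – can be sketched in plain Python. The lane count of 8 below is purely illustrative (SVE implementations range from 128 to 2048 bits), and the inner loop stands in for what the hardware does as a single vector operation:

```python
# Conceptual sketch of SVE-style vector-length-agnostic processing.
# VL is the number of vector lanes; real SVE hardware fixes this value,
# and the same loop works unchanged for any VL.
VL = 8  # illustrative lane count, not a Grace specification

def vector_add(a, b):
    """Add two equal-length lists in VL-sized chunks with a tail predicate."""
    n = len(a)
    out = [0] * n
    i = 0
    while i < n:
        # Predicate: only these lanes are active in the final iteration.
        active = min(VL, n - i)
        for lane in range(active):  # one hardware vector op in SVE
            out[i + lane] = a[i + lane] + b[i + lane]
        i += VL
    return out

print(vector_add([1, 2, 3], [10, 20, 30]))  # [11, 22, 33]
```

Because the loop never hard-codes the vector width, the same binary can exploit wider vector units on future hardware – the key design idea behind SVE.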
Performance and power efficiency
The Grace CPU is the first Nvidia processor to combine the high performance of Arm technology with Nvidia’s expertise in AI and data centre technologies. The processor is designed to offer unprecedented levels of performance and power efficiency, making it suitable for the most demanding workloads.
The CPU features a large array of Arm Neoverse cores organised into coherent clusters, improving data locality and memory performance while keeping thermal power in check. Combined with a leading-edge process node and a large shared L3 cache, this provides extraordinary compute density for server workloads.

Regarding power efficiency, Nvidia claims Grace offers performance per watt unmatched by previous data-centre CPU architectures. Much of this comes from its memory subsystem: low-power LPDDR5X delivers high bandwidth at a fraction of the energy cost of conventional DDR memory, while aggressive power management trims consumption across the processor’s operating range.
This unique combination ensures that Grace will deliver higher performance than existing architectures without sacrificing energy efficiency or taking up more space for cooling within racks or other devices where it is installed.
Security features
Nvidia’s new Arm-based Grace CPU offers several advanced security features. It supports recent Arm architecture security extensions, which bring hardware-enforced capabilities – such as pointer authentication, sandboxing of high-assurance code segments, and per-application attestation – to maintain system integrity and protect data.

Additionally, it integrates subsystems that help ensure secure boot and validate trusted components during the power-on self-test (POST). The chip also supports virtualisation technology that offers greater flexibility, with multiple isolation levels between applications and user-space processes.
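The details of Grace’s secure-boot implementation are not public, but measured boot in general follows a simple pattern: each component is hashed before it runs and compared against a trusted reference value. A minimal sketch of that idea, with hypothetical component names and images:

```python
import hashlib

def measure(blob: bytes) -> str:
    """Hash a boot component, as a measured-boot stage would."""
    return hashlib.sha256(blob).hexdigest()

# Trusted reference digests, e.g. provisioned at manufacture.
# Component names and image contents here are illustrative only.
trusted = {
    "bootloader": measure(b"bootloader-image-v1"),
    "kernel": measure(b"kernel-image-v1"),
}

def verify_boot(components: dict) -> bool:
    """Allow boot to continue only if every component matches its digest."""
    return all(measure(blob) == trusted.get(name)
               for name, blob in components.items())

# Unmodified components pass; a tampered kernel fails verification.
print(verify_boot({"bootloader": b"bootloader-image-v1",
                   "kernel": b"kernel-image-v1"}))  # True
print(verify_boot({"kernel": b"tampered"}))         # False
```

Real secure-boot chains add cryptographic signatures and hardware roots of trust on top of this measurement step, but the verify-before-execute principle is the same.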
These security features provide a more robust platform for running workloads while helping keep data safe in the event of a malicious attack or intrusion attempt. They also help blunt buffer-overflow attacks by defending against unexpected memory accesses from malicious code and protecting against rootkits.

Regarding physical security, Grace can incorporate tamper-detection measures to flag unauthorised access attempts. Its cryptographic acceleration speeds up popular schemes such as RSA and ECDSA for encrypted transfer of sensitive information – credit card or medical records, for example – over the Internet or within corporate networks. Finally, users can pair the platform with secure key-management infrastructure such as a TPM (Trusted Platform Module) or HSMs (Hardware Security Modules). Together, these features create a highly robust platform for securing computationally intensive applications like AI and machine learning workflows.
Applications of the Arm-based Grace CPU
Nvidia recently announced its new Arm-based Grace CPU, a powerful ‘superchip’ designed for high performance computing. This new CPU has the potential to be used in a wide range of applications, from scientific computing to artificial intelligence.
In this article, we’ll look at the various applications of the Arm-based Grace CPU and how it could be used to benefit businesses and individuals.
High performance computing
Nvidia’s new Arm-based Grace CPU has been designed for high performance computing, with early adopters including the Swiss National Supercomputing Centre and the US Department of Energy’s Los Alamos National Laboratory, which plan to build Grace-powered supercomputers. Grace is a system-on-a-chip (SoC) that pairs Arm Neoverse cores with Nvidia’s Tensor Core GPUs over NVLink, and Nvidia claims up to a 10x leap in AI compute performance over today’s fastest servers while drawing far less power, thanks to its innovative power-management technologies.
The Grace processor is ideal for running large-scale AI workloads including deep learning, natural language processing, and high performance computing. It can run inference, training and data analytics on massive datasets with higher accuracy and in less time than previous-generation CPUs, without sacrificing energy efficiency. It can also be tuned for low-latency use cases like streaming media or autonomous vehicles. Furthermore, its scalability allows deployments from tens up to thousands of nodes, making it well suited to large containerised applications such as Kubernetes- or Hadoop-based workloads.
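How much those extra nodes actually help depends on the fraction of a workload that can be parallelised. Amdahl’s law – a general scaling principle, not a Grace specification – puts a ceiling on the achievable speedup, which is why interconnect and memory bandwidth matter as much as raw node count:

```python
def amdahl_speedup(parallel_fraction: float, nodes: int) -> float:
    """Upper bound on speedup when only part of a job parallelises."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / nodes)

# Even a job that is 95% parallel tops out below 20x on 1000 nodes,
# because the 5% serial portion dominates at scale.
print(round(amdahl_speedup(0.95, 1000), 1))  # 19.6
```

Keeping the serial fraction small – by moving data quickly between CPU and GPU, for instance – is what lets large clusters approach their theoretical scaling.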
In addition, Grace has been designed with security features that provide hardened memory protection capabilities and sophisticated cryptography that guard against potential malicious attacks from internal and external threats. Analysts suggest that this makes it a highly secure platform that meets stringent government requirements for confidential data storage.
Artificial intelligence and machine learning
Applications involving artificial intelligence (AI) and machine learning (ML) have revolutionised computing over the past few years, resulting in new opportunities in many industries, such as healthcare, finance, and retail. Nvidia’s new Arm-based Grace CPU seeks to leverage this trend by providing powerful performance with built-in support for AI workloads.

The Grace CPU’s Arm-based architecture is well suited to running modern AI and ML workloads. Its enormous memory bandwidth and wide vector registers give it an edge over x86-powered CPUs for data-hungry models, and its vector processing support accelerates the dense numerical kernels at the heart of deep learning without sacrificing flexibility, scalability or compatibility.
Grace works with popular frameworks such as TensorFlow, PyTorch and ONNX, and when paired with Nvidia’s GPU accelerators it can execute deep learning workflows at high speed with low energy consumption. The design is specifically focused on maximising data throughput between memory and compute: high-bandwidth memory and the NVLink interconnect keep data flowing while the cores process complex algorithms.
Grace’s heterogeneous compute capabilities allow it to run AI/ML-heavy tasks at high speed alongside conventional tasks on the same system, significantly reducing resource-utilisation costs when running multi-workload programs. Moreover, its compatibility with NVLink technology helps move high volumes of data quickly, considerably reducing the time taken for certain processes.
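As a rough, Grace-agnostic sketch of what “heterogeneous” scheduling means in practice, consider a system serving a latency-sensitive request while a heavy ML-style computation runs alongside it. The task names below are illustrative stand-ins, not Nvidia APIs:

```python
from concurrent.futures import ThreadPoolExecutor

def heavy_ai_task(n: int) -> int:
    """Stand-in for a compute-heavy AI/ML kernel."""
    return sum(i * i for i in range(n))

def light_service_task(name: str) -> str:
    """Stand-in for a conventional, latency-sensitive request."""
    return f"handled {name}"

# Schedule both kinds of work side by side, as a heterogeneous
# system would, so the light task isn't stuck behind the heavy one.
with ThreadPoolExecutor(max_workers=2) as pool:
    heavy = pool.submit(heavy_ai_task, 1_000_000)
    light = pool.submit(light_service_task, "request-42")
    print(light.result())  # completes without waiting for the heavy task
    print(heavy.result())
```

On real heterogeneous hardware the heavy kernel would be dispatched to accelerator cores rather than a Python thread, but the scheduling idea – mixed workloads sharing one system without serialising each other – is the same.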
Autonomous driving
Autonomous driving is one of the many potential applications of Nvidia’s new Arm-based Grace CPU. This type of technology requires high performance at low power for quick inter-process communication across many AI systems, as well as accurate and safe operation when controlling motors or other peripheral devices. Grace can provide exactly that: it combines scalability, power efficiency, safety features, and powerful AI performance suited to this specific application.
With its high-throughput I/O, Grace can collect data from the environment and process it immediately, enabling fast data processing and machine learning tasks on the fly. In addition, its tight integration with Nvidia’s deep learning accelerators and development tools makes training neural networks simpler and faster.

Furthermore, its energy-efficient Arm cores provide substantial compute capacity within a constrained power budget, helping meet the automotive industry’s stringent energy-efficiency requirements and enabling more intelligent decision-making by the car autonomously. With these features combined, Nvidia’s Arm-based Grace CPU is positioned to become a key component for autonomous vehicles across all levels of driving complexity.