Untether AI Unveils Its Second-Generation At-Memory Compute Architecture at HOT CHIPS 2022

Distributed by Business Wire
23rd August 2022

speedAI architecture delivers 2 PetaFlops of performance at 30 TeraFlops/W of energy efficiency

PALO ALTO, Calif.–(BUSINESS WIRE)–Untether AI™, the leader in at-memory computation for artificial intelligence (AI) workloads, today announced at the HOT CHIPS 2022 conference its next-generation architecture for accelerating AI inference workloads, speedAI, internally codenamed “Boqueria.” At 30 TeraFlops per watt (TFlops/W) and 2 PetaFlops of performance, the speedAI architecture sets a new standard for energy efficiency and compute density.

Challenges of AI Inference Acceleration

AI is increasingly being deployed in a variety of markets, including financial technology, smart city and retail, natural language processing, autonomous vehicles, and scientific applications. Both the number of neural network architectures and the demand for compute have exploded, increasing the energy consumed by AI workloads. These demanding applications also require increasing levels of accuracy to ensure safety and quality of results. Together, the requirements of flexibility, performance combined with energy efficiency, and accuracy necessitate a new approach to AI acceleration, which Untether AI delivers with its speedAI devices.

“The merits of at-memory compute have been proven with the first generation runAI device, and the second generation speedAI architecture enhances the energy efficiency, throughput, accuracy, and scalability of our offering,” said Arun Iyengar, CEO of Untether AI. “speedAI devices offer an ability that is unmatched by any other inference offering in the marketplace.”

Energy efficiency drives performance

Because at-memory compute is significantly more energy efficient than traditional von Neumann architectures, more TFlops can be performed for a given power envelope. With the introduction of the runAI devices in 2020, Untether AI set a new energy efficiency level at 8 TOPs/W for the INT8 datatype. The speedAI architecture dramatically improves upon that, delivering 30 TFlops/W. This energy efficiency is a product of the second-generation at-memory compute architecture, over 1,400 optimized RISC-V processors with custom instructions, energy-efficient dataflow, and the adoption of a new FP8 datatype, all of which help roughly quadruple efficiency compared to the previous-generation runAI device. The first member of the family, the speedAI240 device, provides 2 PetaFlops of FP8 performance and 1 PetaFlop of BF16 performance. This translates into industry-leading performance and efficiency on neural networks like BERT-base, which speedAI240 can run at over 750 queries per second per watt (qps/W), 15x greater than the current state of the art from leading GPUs.
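The headline figures above can be cross-checked with simple arithmetic. The sketch below uses only numbers quoted in this release; it assumes that the 2 PetaFlops and 30 TFlops/W figures describe the same operating point, which the release does not state, so the implied power is an estimate rather than a specification. Note also that the runAI figure is quoted in INT8 TOPs and the speedAI figure in FP8 TFlops, so the ratio compares different datatypes. The function names are illustrative.

```python
# Back-of-the-envelope arithmetic using only figures quoted in the release.
PEAK_FP8_FLOPS = 2e15            # 2 PetaFlops (speedAI240, FP8)
SPEEDAI_FLOPS_PER_WATT = 30e12   # 30 TFlops/W (speedAI)
RUNAI_OPS_PER_WATT = 8e12        # 8 TOPs/W (runAI, INT8)


def implied_power_watts(peak_flops, flops_per_watt):
    """Power implied if peak throughput and peak efficiency coincide (an assumption)."""
    return peak_flops / flops_per_watt


def efficiency_ratio(new_per_watt, old_per_watt):
    """Generation-over-generation efficiency gain, ignoring the datatype change."""
    return new_per_watt / old_per_watt


power = implied_power_watts(PEAK_FP8_FLOPS, SPEEDAI_FLOPS_PER_WATT)
ratio = efficiency_ratio(SPEEDAI_FLOPS_PER_WATT, RUNAI_OPS_PER_WATT)

print(f"Implied power at peak FP8: {power:.1f} W")   # 66.7 W
print(f"Efficiency gain over runAI: {ratio:.2f}x")   # 3.75x
```

The 3.75x result is consistent with the release's description of a roughly fourfold improvement over runAI.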

Second-generation memory bank: Designed for flexible, efficient AI acceleration

Each memory bank of the speedAI architecture has 512 processing elements with direct attachment to dedicated SRAM. These processing elements support INT4, FP8, INT8, and BF16 datatypes, along with zero-detect circuitry for energy conservation and support for 2:1 structured sparsity. The processing elements are arranged in 8 rows of 64, and each row has its own dedicated row controller and hardwired reduce functionality, allowing flexibility in programming and efficient computation of transformer network functions such as Softmax and LayerNorm. The rows are managed by two RISC-V processors with over 20 custom instructions designed for inference acceleration. The flexibility of the memory bank allows it to adapt to a variety of neural network architectures, including convolutional, transformer, and recommendation networks, as well as linear algebra models.
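To see why a per-row reduce helps with Softmax and LayerNorm, note that both functions need a value folded across the whole row (a maximum, a sum, a mean, a variance) before any element can be finalized. The sketch below is a plain software illustration of that dependency, not a description of the hardware: the mapping of one element per processing element and the helper names (`row_reduce`, `row_softmax`, `row_layernorm`) are assumptions made for this example.

```python
import math

ROWS_PER_BANK = 8   # from the release
PES_PER_ROW = 64    # from the release


def row_reduce(values, op):
    """Fold one row's values into a single result (software stand-in for a reduce)."""
    acc = values[0]
    for v in values[1:]:
        acc = op(acc, v)
    return acc


def add(a, b):
    return a + b


def row_softmax(row):
    """Softmax needs two reductions: a row maximum and a row sum."""
    row_max = row_reduce(row, max)
    exps = [math.exp(x - row_max) for x in row]
    total = row_reduce(exps, add)
    return [e / total for e in exps]


def row_layernorm(row, eps=1e-5):
    """LayerNorm (without learned scale/shift) needs a mean and a variance reduction."""
    n = len(row)
    mean = row_reduce(row, add) / n
    var = row_reduce([(x - mean) ** 2 for x in row], add) / n
    return [(x - mean) / math.sqrt(var + eps) for x in row]


# One hypothetical 64-element row, one element per processing element.
row = [0.1 * i for i in range(PES_PER_ROW)]
probs = row_softmax(row)
normed = row_layernorm(row)

print(ROWS_PER_BANK * PES_PER_ROW)  # 512 processing elements per bank
```

Every element-wise step here is independent per lane; only the `row_reduce` calls require communication across the row, which is the part the release says is hardwired.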

FP8: The new datatype for accurate inference acceleration

In the search for energy efficiency, Untether AI’s research determined that two different FP8 formats provided the best mix of precision, range, and efficiency. A version with a 4-bit mantissa (FP8p for “precision”) and a version with a 3-bit mantissa (FP8r for “range”) provided the best accuracy and throughput for inference across a variety of different networks. For both convolutional networks like ResNet-50 and transformer networks like BERT-base, Untether AI’s implementation of FP8 results in less than 1/10th of 1 percent of accuracy loss compared to using BF16 data types, with a fourfold increase in throughput and energy efficiency.
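The precision-versus-range trade-off between the two formats can be sketched numerically. The release states only the mantissa widths; the sketch below assumes one sign bit, so the exponent width follows from the 8-bit total, and it assumes an IEEE-style implicit leading bit. Exponent bias, subnormals, saturation, and special values are not modeled, and the function names are illustrative.

```python
import math


def fp8_layout(mantissa_bits, total_bits=8, sign_bits=1):
    """Bit split implied by a given mantissa width (sign bit is an assumption)."""
    exponent_bits = total_bits - sign_bits - mantissa_bits
    return {"sign": sign_bits, "exponent": exponent_bits, "mantissa": mantissa_bits}


def round_mantissa(x, mantissa_bits):
    """Round x to an implicit leading bit plus `mantissa_bits` explicit bits.

    Exponent range is ignored, so this shows precision loss only.
    """
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)               # x == m * 2**e, with 0.5 <= |m| < 1
    scale = 2 ** (mantissa_bits + 1)   # +1 for the implicit leading bit
    return math.ldexp(round(m * scale) / scale, e)


for name, bits in (("FP8p", 4), ("FP8r", 3)):
    layout = fp8_layout(bits)
    q = round_mantissa(0.3, bits)
    print(name, layout, "0.3 ->", q, "relative step:", 2.0 ** -bits)
```

Under these assumptions FP8p rounds 0.3 to 0.296875 and FP8r rounds it to 0.3125: the extra mantissa bit halves the rounding step, while the extra exponent bit in FP8r extends the representable range instead.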

Scalability for large language models

The speedAI240 device is designed to scale to large models. The memory architecture is multi-leveled, with 238MB of SRAM dedicated to the processing elements offering 1 petabyte/s of memory bandwidth, four 1MB scratchpads, and two 64-bit wide ports of LPDDR5, providing up to 32GB of external DRAM. Host and chip-to-chip connectivity is provided by high-speed PCI-Express Gen5 interfaces.
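As a rough illustration of how the memory levels relate to model size, the sketch below estimates the weight footprint of a model at each supported datatype and reports the smallest level that could hold it. This is a simplification under stated assumptions: it counts weights only (no activations or other buffers), treats MB and GB as decimal units, ignores the scratchpads, and uses hypothetical parameter counts that do not come from the release.

```python
MB = 10**6   # assumed decimal megabyte
GB = 10**9   # assumed decimal gigabyte

# Capacities quoted in the release, smallest and fastest first.
TIERS = [
    ("on-chip SRAM", 238 * MB),
    ("external LPDDR5", 32 * GB),
]

BYTES_PER_ELEMENT = {"INT4": 0.5, "FP8": 1, "INT8": 1, "BF16": 2}


def weight_bytes(num_params, dtype):
    """Bytes needed to store the weights alone at the given datatype."""
    return int(num_params * BYTES_PER_ELEMENT[dtype])


def smallest_fitting_tier(num_params, dtype):
    """Name of the smallest tier that holds the weights, or None if neither does."""
    need = weight_bytes(num_params, dtype)
    for name, capacity in TIERS:
        if need <= capacity:
            return name
    return None


# Hypothetical model sizes, chosen only to exercise each outcome.
for params in (100_000_000, 1_000_000_000, 50_000_000_000):
    print(params, "FP8 ->", smallest_fitting_tier(params, "FP8"))
```

A `None` result corresponds to a model whose weights exceed a single device's quoted capacity, which is the case chip-to-chip connectivity is meant to address.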

The imAIgine Software Development Kit Supports speedAI Family

The Untether AI imAIgine™ Software Development Kit (SDK) provides a path to running networks at high performance, with push-button quantization, optimization, physical allocation, and multi-chip partitioning. The imAIgine SDK also provides an extensive visualization toolkit, a cycle-accurate simulator, and an easily integrated runtime API. It is available now.

Availability

speedAI devices will be offered as standalone chips as well as on a variety of M.2 and PCI-Express form-factor cards. Sampling of speedAI240 devices and cards to early access customers is expected to begin in the first half of 2023.

About Untether AI

Untether AI provides ultra-efficient, high-performance AI chips to enable new frontiers in AI applications. By combining the power efficiency of at-memory computation with the robustness of digital processing, Untether AI has developed a groundbreaking new chip architecture for neural net inference that eliminates the data movement bottleneck that costs energy and performance in traditional architectures. Founded in Toronto in 2018, Untether AI is funded by CPPIB, General Motors, Intel Capital, Radical Ventures, and Tracker Capital. www.untether.ai.

All references to Untether AI trademarks are the property of Untether.AI. All other trademarks mentioned herein are the property of their respective owners.

Contacts

Media Contact for Untether AI:
Michelle Clancy, Cayenne Global, +1.503.702.4732

[email protected]

Company Contact:
Robert Beachler, Untether AI, +1.650.793.8219

[email protected]

Connect with Untether AI:
Twitter: @UntetherAI
LinkedIn: https://www.linkedin.com/company/untether-ai/
