Accelerating Proofs on Apple Devices

Title Image: Metal Bindings

Introduction

Mobile devices are our go-to platform for privacy-sensitive tasks. They hold all our photos, messages, emails, and even our location. With an array of sensors, they know more about us than we might like to admit. It’s no wonder they’re a match made in heaven with Zero-Knowledge Proofs (ZKPs), which allow us to use this data securely without exposing it.

Yet, ZKPs have never been friendly to mobile devices, or even laptops. Modern proving pipelines demand massive amounts of memory and compute power, typically requiring servers to generate succinct proofs. But our goal is different: everything on-device. After successfully deploying EZKL on mobile devices, we wanted to improve the experience and make it more efficient.

The obvious solution? Use the GPU. Modern devices have GPUs that are increasingly treated as dedicated compute engines rather than just graphics tools. Our focus was on accelerating Multi-Scalar Multiplication (MSM), a key operation in the KZG polynomial commitment scheme and the single biggest bottleneck in ZKP proving. MSM alone accounts for up to 70% of compute time, so every bit of improvement here matters.
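To make the operation concrete, here is a minimal sketch of what an MSM computes, together with the bucket (Pippenger-style) method that is a common way to speed it up. Everything in it is an illustrative assumption rather than EZKL's or Halo2's actual code: the "group" is a toy stand-in (integers modulo a prime under addition) instead of elliptic-curve points, and the names `msm_naive`, `msm_bucket`, and the window size `c` are hypothetical.

```rust
// Toy stand-in for an elliptic-curve group: integers modulo a prime under
// addition. Real provers work with curve points; this only shows the structure.
const P: u64 = 1_000_000_007;

/// Group operation of the toy group (inputs are assumed to be < P).
fn add(a: u64, b: u64) -> u64 {
    (a + b) % P
}

/// Double-and-add scalar multiplication, built only from the group `add`.
fn scalar_mul(mut k: u64, mut point: u64) -> u64 {
    let mut acc = 0; // identity element of the toy group
    while k > 0 {
        if k & 1 == 1 {
            acc = add(acc, point);
        }
        point = add(point, point);
        k >>= 1;
    }
    acc
}

/// Naive MSM: the sum of k_i * P_i, one scalar multiplication per point.
fn msm_naive(scalars: &[u64], points: &[u64]) -> u64 {
    scalars
        .iter()
        .zip(points)
        .fold(0, |acc, (&k, &p)| add(acc, scalar_mul(k, p)))
}

/// Bucket-method MSM with `c`-bit windows (hypothetical parameter name).
/// Points sharing the same window digit are added into one bucket, so each
/// window needs roughly one group addition per point.
fn msm_bucket(scalars: &[u64], points: &[u64], c: u32) -> u64 {
    assert!((1..=16).contains(&c), "keep the bucket table small");
    let windows = (64 + c - 1) / c;
    let mut result = 0u64;
    for w in (0..windows).rev() {
        // Shift the running result left by c bits: c doublings.
        for _ in 0..c {
            result = add(result, result);
        }
        let mut buckets = vec![0u64; (1usize << c) - 1];
        for (&k, &p) in scalars.iter().zip(points) {
            let digit = ((k >> (w * c)) & ((1u64 << c) - 1)) as usize;
            if digit != 0 {
                // Zero digits contribute nothing and are skipped.
                buckets[digit - 1] = add(buckets[digit - 1], p);
            }
        }
        // Compute sum_j j * bucket_j using running sums, from the top bucket down.
        let mut running = 0u64;
        let mut window_sum = 0u64;
        for b in buckets.iter().rev() {
            running = add(running, *b);
            window_sum = add(window_sum, running);
        }
        result = add(result, window_sum);
    }
    result
}
```

In this toy group, `msm_naive(&[3, 5], &[7, 11])` evaluates to 76 (3·7 + 5·11), and `msm_bucket` returns the same value for any valid window size. The per-window bucket accumulation is independent across windows and across points, which is why this kind of workload is a natural candidate for a GPU.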

To use the new Metal bindings, install the ezkl CLI on any M-series Apple silicon device with the following command:

curl https://raw.githubusercontent.com/zkonduit/ezkl/main/install_ezkl_cli.sh | bash

Or you can build it from source with the following command:

cargo build --release --features macos-metal

Or, if you are building for iOS:

cargo run --bin ios_gen_bindings --features "ios-bindings uuid camino uniffi_bindgen ios-metal" --no-default-features

Impact:

Our Metal MSM optimizations unlock some interesting use cases:

  • On-Device KYC and Fraud Detection: Sundial’s ZK-KYC Onflow, launching in Q1, can run private checks and image processing directly on the device, ensuring user data never leaves the phone.
  • Credit Scoring & Risk Assessment: Trustless scoring models can run locally and update platforms like Sentiment without revealing sensitive financial information.
  • Future Potential: With GPU acceleration in place, we can imagine a host of privacy-preserving applications leveraging this capability.

Why Apple Devices?

We specifically targeted Apple devices for several reasons:

  1. Unified Memory: Apple’s architecture allows the CPU and GPU to share memory, avoiding the heavy data transfer overhead seen on other platforms.
  2. Hardware Uniformity: Unlike other ecosystems, Apple’s devices are relatively consistent in their CPU-GPU design, making them easier to optimize.
  3. Previous Work: We had already focused on iOS bindings for EZKL, making Apple the logical next step.
  4. Shared Architecture: Both A-series (iPhones) and M-series (MacBooks) chips share the same underlying architecture. This means optimizing for iPhones also optimizes for Macs. And since much of EZKL’s development happens on M-series Macs, speeding them up directly helps us ship improvements sooner!

Implementation & Results

We built on the foundational work by Jeff (tg @foodchain1028) and Moven (tg @moven0831), who focused on MSM optimization using Metal for Arkworks. Our goal was to bring similar improvements to Halo2, the framework required by EZKL.

Key Steps:

  1. Splitting MSM: We divided the workload between the CPU and GPU, leveraging the strengths of each.
  2. Unified Memory: We optimized for Apple’s unified memory architecture to minimize data transfer costs.
  3. Parallelization: We used Metal shaders to accelerate the largest compute chunks.
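The splitting idea in the steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual EZKL or Halo2 implementation: the GPU side is replaced by a plain thread (no Metal calls are made), the group is a toy stand-in (integers modulo a prime), and `split_msm`, `msm_partial`, and `gpu_fraction` are hypothetical names.

```rust
use std::thread;

// Toy group modulus; real code would operate on elliptic-curve points.
const P: u64 = 1_000_000_007;

/// Partial MSM over a slice of (scalar, point) pairs in the toy group.
fn msm_partial(scalars: &[u64], points: &[u64]) -> u64 {
    scalars.iter().zip(points).fold(0u64, |acc, (&k, &p)| {
        let term = ((k as u128 * p as u128) % P as u128) as u64;
        (acc + term) % P
    })
}

/// Split one MSM into two independent partial sums and combine them.
/// `gpu_fraction` (hypothetical tuning knob) is the share of points handed
/// to the "GPU" worker; the rest is processed on the calling thread.
fn split_msm(scalars: &[u64], points: &[u64], gpu_fraction: f64) -> u64 {
    assert_eq!(scalars.len(), points.len());
    let n = scalars.len();
    let gpu_len = ((n as f64) * gpu_fraction.clamp(0.0, 1.0)).round() as usize;

    // Slices are borrowed, not copied. This loosely mirrors how unified
    // memory lets both processors read the same buffers without a transfer.
    let (gpu_s, cpu_s) = scalars.split_at(gpu_len);
    let (gpu_p, cpu_p) = points.split_at(gpu_len);

    thread::scope(|scope| {
        // Stand-in for dispatching a Metal compute kernel.
        let gpu = scope.spawn(|| msm_partial(gpu_s, gpu_p));
        // The CPU works on its share while the "GPU" job is in flight.
        let cpu = msm_partial(cpu_s, cpu_p);
        (gpu.join().expect("GPU stand-in panicked") + cpu) % P
    })
}
```

Because an MSM is a sum, any partition of the input gives the same result once the partial sums are added, so the split point can be tuned freely. For instance, `split_msm(&scalars, &points, 0.7)` would hand roughly 70% of the points to the GPU stand-in.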

Performance Gains:

  • 2× faster than the CPU-based implementation on both M-series Macs and iPhones for MSMs of size 2^20 (~1 million points).
  • 15× faster than the previous GPU baseline.
  • 9% improvement in overall proving time after integration into EZKL.

The 9% improvement in proving time is less than we initially hoped. We suspect this comes down to how MSM is used in real proofs, especially in cases involving many zeros or smaller circuits, and it needs further investigation.
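A back-of-the-envelope calculation shows the size of the gap. It assumes the 9% figure is a reduction in total proving time, that a fraction f of that time is spent in MSMs that benefit from the speedup, and that those MSMs all run s times faster. These are simplifying assumptions for illustration, not measurements.

```latex
\frac{T_{\text{new}}}{T_{\text{old}}} = (1 - f) + \frac{f}{s}

% Upper bound quoted earlier: f = 0.7, s = 2
\frac{T_{\text{new}}}{T_{\text{old}}} = 0.3 + \frac{0.7}{2} = 0.65
\quad\Rightarrow\quad \text{up to a } 35\% \text{ reduction}

% Working backwards from the observed 9% reduction with s = 2
0.91 = (1 - f) + \frac{f}{2}
\quad\Rightarrow\quad f = \frac{1 - 0.91}{1 - \tfrac{1}{2}} = 0.18
```

Under these assumptions, the MSMs that actually see the 2× speedup would account for roughly 18% of proving time rather than 70%. That is consistent with the suspicion that many MSM calls in real proofs are too small or too sparse to benefit.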

M1 Performance Graph
Figure 1: Performance comparison of CPU vs GPU (Metal-enabled) on an M1 Pro for MSM computation.

iPhone Performance Graph
Figure 2: CPU-only vs combined CPU+GPU performance on an iPhone 15 Pro for MSM computation.

Next Steps

There’s still plenty of room for improvement:

  • Refine GPU-CPU Splitting: Fine-tune heuristics for distributing workloads and GPU-specific parameters.
  • Incorporate New MSM Techniques: Explore recent research, such as ePrint 2022/1321.
  • Specialized Proving Systems: Investigate low-power provers optimized for mobile devices (ePrint 2024/1970).
  • Address Other Bottlenecks: Tackle FFT, another key operation in ZKP pipelines.

With these steps, we aim to push the boundaries of what’s possible on mobile devices, bringing ZKP computations closer to everyday applications.

We can’t wait to see what the community does with this!