EZKL: ZK for Data Science


art by the illustrious kitvolta

Though we started the project with the explicit goal of targeting machine learning and AI algorithms, as the library's usage has grown we've found it applied more and more to simpler data science workloads as well. While our transformer and random forest implementations get a lot of attention, many of our users rely on EZKL for analytics and straightforward statistical and financial computations.

From ML to Analytics

These aren’t the complex neural nets we initially designed for. Instead, we’re seeing widespread adoption of EZKL for verifiable versions of common data science workloads:

  • Reward calculation and distribution for BitTensor’s network consensus (Inference Labs) and for public goods funding programs (Optimism)
  • Volatility forecasting for lending protocols (Sentiment)
  • Momentum and mean-reversion calculations for AMM pools (QuantAMM)
  • Dynamic fee computations for Uniswap V3 (OpenGradient)
  • Portfolio rebalancing algorithms (QuantAMM)
  • Risk metric calculations (Sentiment)
  • Universal outcome resolution for “perpetuals for everything” markets (Strobe)
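To give a flavor of the workloads above, here is a minimal sketch of a mean-reversion signal of the kind an AMM strategy might compute. The window and entry threshold are illustrative defaults, not any partner's actual parameters:

```python
from statistics import mean, stdev

def zscore_signal(prices, window=20, entry=1.0):
    """Mean-reversion signal: z-score of the latest price against a
    rolling window. Returns +1 (buy), -1 (sell), or 0 (hold).
    `window` and `entry` are illustrative, not production parameters."""
    if len(prices) < window:
        return 0
    recent = prices[-window:]
    mu, sigma = mean(recent), stdev(recent)
    if sigma == 0:
        return 0  # flat market, no signal
    z = (prices[-1] - mu) / sigma
    if z <= -entry:
        return 1   # price far below its mean: expect reversion upward
    if z >= entry:
        return -1  # price far above its mean: expect reversion downward
    return 0
```

Bounded, branch-light computations like this are exactly the shape of program that converts cleanly into a circuit.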

Using EZKL is dead simple in these use cases. Quants write their strategies in Python, convert them to circuits with our compiler, and generate proofs through Lilith. Our custom orchestration system handles the rest, with job dispatch times under 1ms across hundreds of workers.
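In outline, that pipeline looks like the sketch below. Function names follow the `ezkl` Python bindings, but exact signatures (and which calls are async) vary by version, and steps like settings calibration and SRS fetching are elided, so treat this as a shape, not a recipe, and check the current docs; in production, Lilith dispatches the proving step to remote workers instead of running it locally:

```python
# Hedged sketch of the Python -> circuit -> proof flow. Assumes the `ezkl`
# package; argument order is illustrative and may differ across versions.
try:
    import ezkl  # pip install ezkl
except ImportError:
    ezkl = None  # keeps this sketch importable without the package

def prove_strategy(model="strategy.onnx", data="input.json"):
    """Compile an exported strategy to a circuit and generate a proof."""
    if ezkl is None:
        raise RuntimeError("ezkl is not installed")
    ezkl.gen_settings(model, "settings.json")                 # circuit settings
    ezkl.compile_circuit(model, "model.compiled", "settings.json")
    ezkl.setup("model.compiled", "vk.key", "pk.key")          # keypair
    ezkl.gen_witness(data, "model.compiled", "witness.json")  # quantized inputs
    ezkl.prove("witness.json", "model.compiled", "pk.key", "proof.json")
    return ezkl.verify("proof.json", "settings.json", "vk.key")
```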

Why It Works

The advantages are practical. Here are some case studies:

  1. Take Sentiment’s implementation: rather than running complex volatility calculations on-chain (super expensive) or trusting off-chain oracles (risky), they process GARCH volatility estimation models through EZKL.
  • The result: weekly loan-to-value updates with cryptographic guarantees, instead of manual updates via governance that can take months.
  2. QuantAMM took a similar approach with their Balancer V3 pools. Their quant strategies run off-chain, but every rebalancing of an AMM pool comes with a proof verifying the calculations followed their intended model. They can also maintain strategy privacy while proving computational integrity.
  • The result: dynamic weighted pools that execute steadfast quantitative strategies, all verifiable on-chain.
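The GARCH-style estimation in the first case study boils down to a short recursion. A minimal GARCH(1,1) one-step variance forecast looks like this, where the omega/alpha/beta values are illustrative defaults rather than Sentiment's fitted parameters:

```python
def garch11_forecast(returns, omega=1e-6, alpha=0.1, beta=0.85):
    """One-step-ahead GARCH(1,1) variance forecast:

        sigma2_t = omega + alpha * r_{t-1}**2 + beta * sigma2_{t-1}

    omega/alpha/beta are illustrative; in practice they are fitted to
    data (with alpha + beta < 1 for stationarity)."""
    # Seed the recursion with the sample variance of the return series.
    sigma2 = sum(r * r for r in returns) / len(returns)
    for r in returns:
        sigma2 = omega + alpha * r * r + beta * sigma2
    return sigma2  # forecast variance; volatility is sigma2 ** 0.5
```

Because the recursion is deterministic and fixed-length for a given window, it maps naturally onto a circuit, and a verifier only needs to check the proof rather than re-run the loop.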

Production Numbers

Some concrete metrics from our production environment:

  • Average proof generation time: 1-2s for typical financial calculations
  • Peak throughput: 200k+ proofs per day
  • Supported Python libraries: numpy, pandas, sklearn, pytorch, tensorflow
  • Typical gas costs: 300-400k gas for verification

All the building blocks are here today.

<3

EZKL was built to scale to AI/ML workloads (and those applications remain popular), but a ton of today's usage is verifiable data science. The infrastructure and tooling are running in production, serving 200k+ proofs daily at peak across lending protocols, AMMs, and consensus protocols. We're grateful to our DeFi partners (and others) who've focused on building real applications that serve real users. If you're building something that needs verifiable computation, drop by our Discord - we'd love to see what you create.