Doom NPCs with Zero-Knowledge Proofs


This is a guest post written by Inference Labs. You can see their version of the post here.

From Web3 and Web2 platforms to traditional brick-and-mortar businesses, every domain we navigate is shaped by rigorously engineered incentive systems that structure trust, value, and participation. Now player 2 has entered the chat — AI Agents. As they join, how do we ensure open and fair participation for all? From “Truth Terminal” to emerging AI Finance (AiFi) systems, the core solution lies in implementing robust verification primitives. Just as SaaS could not thrive without the security and trust of HTTPS, AI Agents depend on foundational proofs to foster fairness, credibility, and balanced engagement.

By verifying an AI Agent’s inference process using zero-knowledge proofs (zk-proofs), we can guarantee deterministic, trustless, and privacy-preserving execution in real time. This approach is not confined to gaming: it extends to AI Agent Memecoins, AiFi, DeFi, secure user authentication, governance, and beyond.

What Does Verifying an AI Mean?

Many AI Agents operate as black boxes where no one but the developer knows exactly what’s behind the curtain. What’s the alternative? If the agent is fully open source, it is easily reproducible and open to attacks. If your opponent knows your next move every time, they’re not your adversary — they’re your puppeteer. Verification of an AI Agent’s actions involves representing its decision-making steps — such as the forward pass of a neural network — as a structured mathematical circuit subject to cryptographic verification. Tools like EZKL let us generate zk-proofs certifying that the AI’s outputs are correct without revealing the agent’s internal workings or intellectual property. This ensures the results are provably correct, tamper-proof, and trustless.

In our focus scenario, we target Doom RL, a reinforcement learning environment derived from the classic shooter. Doom’s discrete state and action spaces and deterministic policies align well with circuit constraints, making it a model proving ground for verifiable AI inference.

The Experiment: Circuitizing a Doom RL Agent in Real-Time

Using a popular Reinforcement Learning (RL) method called Proximal Policy Optimization (PPO) on a Doom environment, we transformed the PPO agent’s decision process into a verifiable function.

Action Space Encoding

Instead of approximating continuous controls or dealing with raw image pixels, we used a discrete action space (e.g., move forward, turn left, fire weapon) represented as numerical variables. These were integrated directly into the arithmetic circuits, ensuring that every action chosen by the AI is independently verifiable.
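As a minimal sketch of this idea (the action names and integer codes below are illustrative, not the project’s actual mapping), a discrete action space reduces to small integers that feed directly into circuit wires:

```python
# Illustrative Doom-style discrete action space encoded as integers.
# Integer codes map directly to arithmetic-circuit inputs.
ACTIONS = {
    0: "move_forward",
    1: "turn_left",
    2: "turn_right",
    3: "fire_weapon",
}

def encode_action(name: str) -> int:
    """Map an action name to the integer code used inside the circuit."""
    lookup = {v: k for k, v in ACTIONS.items()}
    return lookup[name]

assert encode_action("fire_weapon") == 3
```

Because each action is a single bounded integer rather than a continuous control signal, the circuit only needs to constrain a handful of values per step.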

Reward Function Representation

Doom’s reward structure — based on factors like enemy elimination, health pickups, or time survived — was encoded into the circuit. As the agent steps through each environment state, the circuit calculates expected returns. This makes it possible to verify not only decisions but also the correctness and fairness of the score or reward itself.
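A circuit-friendly reward tally might look like the following sketch. The event weights are illustrative placeholders, not Doom’s actual reward values; the point is that the score is a fixed sum of integer weights a verifier can re-derive:

```python
# Minimal sketch of a circuit-friendly reward tally (integer arithmetic only).
# Event weights are illustrative, not Doom's actual reward structure.
REWARDS = {"enemy_eliminated": 100, "health_pickup": 25, "tick_survived": 1}

def total_reward(events):
    """Accumulate the episode score the way a circuit would: a deterministic
    sum of integer event weights that any verifier can recompute."""
    return sum(REWARDS[e] for e in events)

episode = ["tick_survived", "enemy_eliminated", "health_pickup", "tick_survived"]
assert total_reward(episode) == 127
```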

Real-Time Proof Generation

By employing zkSNARKs and optimizing circuit design, we approached near real-time verification of the AI’s decisions. Although still resource-intensive, this breakthrough suggests a future where live-streamed AI matches are accompanied by a parallel stream of proofs confirming that each move is honest, correct, and free of tampering.


The Process of Circuitizing a Doom RL Agent


1. Selection of Reinforcement Learning Algorithm

We chose PPO due to its widespread use, stability, and deterministic inference mode. PPO’s stable policy gradient updates produce models with predictable forward passes, which is crucial for encoding into circuits.

2. Preprocessing the Game Environment

To reduce circuit complexity, we replaced raw pixel inputs with a low-dimensional state representation. This simplification allowed us to construct circuits that remain computationally manageable without losing the strategic depth of the environment.
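A sketch of what such a low-dimensional state might look like (the field names here are hypothetical; the real feature set depends on the environment wrapper used):

```python
# Hypothetical low-dimensional state in place of raw pixels.
# Field names are illustrative, not the project's actual feature set.
def extract_state(game_vars):
    """Collapse a full game snapshot into the handful of integers
    the circuit actually consumes."""
    return [
        int(game_vars["health"]),
        int(game_vars["ammo"]),
        int(game_vars["pos_x"]),
        int(game_vars["pos_y"]),
        int(game_vars["enemy_visible"]),
    ]

state = extract_state({"health": 75, "ammo": 20, "pos_x": 132,
                       "pos_y": 64, "enemy_visible": 1})
assert len(state) == 5  # five integers instead of a full RGB frame
```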

3. Quantization for Circuit Constraints

Neural network parameters — weights and activations — were quantized into fixed-point numbers. Fixed-point arithmetic is more easily represented in circuits than floating-point, allowing for efficient, deterministic computations.
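A minimal sketch of fixed-point quantization, assuming an 8-bit fractional scale for illustration (actual tools like EZKL choose their own scales during calibration):

```python
# Fixed-point quantization sketch: floats are scaled by 2**FRAC_BITS and
# rounded to integers, so all circuit arithmetic stays deterministic.
FRAC_BITS = 8
SCALE = 1 << FRAC_BITS  # 256

def quantize(x: float) -> int:
    return round(x * SCALE)

def dequantize(q: int) -> float:
    return q / SCALE

def fixed_mul(a: int, b: int) -> int:
    """Multiply two fixed-point values and rescale back to FRAC_BITS."""
    return (a * b) >> FRAC_BITS

w, x = quantize(0.5), quantize(2.0)   # 128, 512
assert dequantize(fixed_mul(w, x)) == 1.0
```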

4. Constructing the Circuit

We mapped the PPO policy network’s forward pass to a set of arithmetic constraints using EZKL.
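EZKL automates this translation from a model export, but the underlying idea can be shown with a hand-rolled miniature: a one-layer forward pass written as explicit arithmetic constraints that a verifier re-checks term by term. The weights and inputs below are arbitrary illustrative integers (already quantized):

```python
# Toy version of what EZKL automates: a one-layer forward pass expressed as
# arithmetic constraints a verifier can check independently.
W = [[2, -1], [1, 3]]   # 2x2 weight matrix (illustrative)
b = [5, -2]             # bias

def forward(x):
    return [W[i][0] * x[0] + W[i][1] * x[1] + b[i] for i in range(2)]

def check_constraints(x, y):
    """What a constraint system enforces: every output wire must equal
    the declared linear combination of its input wires."""
    return all(y[i] == W[i][0] * x[0] + W[i][1] * x[1] + b[i] for i in range(2))

x = [4, 7]
y = forward(x)                                     # [6, 23]
assert check_constraints(x, y)
assert not check_constraints(x, [y[0] + 1, y[1]])  # a tampered output fails
```

In the real system the verifier never sees `W` or `b`; the zk-proof attests that the committed (hidden) weights satisfy these constraints.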

5. Action Selection Logic

The model’s discrete action selection (argmax over action probabilities) was integrated directly into the circuit. Each step’s chosen action can be cryptographically verified without revealing the underlying network weights.
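The argmax itself is circuit-friendly because verifying it only requires comparisons, as this sketch shows (the logit values are illustrative):

```python
# Circuit-friendly argmax sketch: the committed action index is valid iff
# no other action's score exceeds the chosen one.
def argmax_action(logits):
    best = 0
    for i in range(1, len(logits)):
        if logits[i] > logits[best]:
            best = i
    return best

def verify_choice(logits, chosen):
    """The constraint the circuit enforces: the chosen logit dominates all others."""
    return all(logits[chosen] >= l for l in logits)

logits = [12, 87, 45, 3]  # quantized action scores (illustrative)
a = argmax_action(logits)
assert a == 1 and verify_choice(logits, a)
```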

6. Reward and Score Verification

By embedding reward calculations into the circuit, external observers can confirm that the agent is rightfully earning its score — no cheating, tampering, or unaccounted bonuses.

7. Verification and Testing

Using EZKL, we generated proofs that the agent’s moves and associated outcomes are correct. Internal testing confirmed that these proofs can be produced quickly enough not to disrupt the flow of real-time gameplay.

Challenges in Circuitizing AI

1. Complex and Large Inputs

Pixel-based inputs are a well-known bottleneck. Converting raw game frames into simplified variable states dramatically reduces circuit size. Future work may explore more advanced compression or feature extraction to preserve richer information without ballooning circuit complexity.

2. Delayed Rewards

Doom and many RL environments feature delayed or sparse rewards. Encoding these into circuits requires careful accumulation logic, increasing complexity. Further research into rolling score verification or cumulative state tracking is ongoing.
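One approach to the accumulation logic (a sketch of rolling score verification, not the project’s actual scheme) is to commit a running total at each step, so a verifier checks only one addition per step instead of replaying the whole episode:

```python
# Rolling score verification sketch for sparse/delayed rewards: each step
# commits a running total, so verifying step t needs only one addition.
def step_commitments(step_rewards):
    totals, running = [], 0
    for r in step_rewards:
        running += r
        totals.append(running)
    return totals

def verify_step(totals, step_rewards, t):
    prev = totals[t - 1] if t > 0 else 0
    return totals[t] == prev + step_rewards[t]

rewards = [0, 0, 100, 0, 25]   # sparse: most steps pay nothing
totals = step_commitments(rewards)
assert totals == [0, 0, 100, 100, 125]
assert all(verify_step(totals, rewards, t) for t in range(len(rewards)))
```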

3. Comparisons with Alternatives

Other RL methods — like PEARL or continuous-action policies — introduce probabilistic components or complex continuous spaces. While theoretically circuitizable, they demand more intricate arithmetic logic and may not yet be practical.

Why This Matters Beyond Gaming

1. Decentralized Finance (DeFi) and Trading Bots

Circuitized AI can serve as the backbone of verifiable trading strategies. Traders and stakeholders can trust that bot decisions follow predefined strategies without insider manipulation or fraud — a game-changer for privacy-preserving finance and anti-fraud measures.

2. Decentralized Autonomous Organizations (DAOs) Governance

AI-driven proposals and votes can be verified cryptographically. Members know that decisions were derived fairly, without undisclosed biases or tampering, bolstering trustworthy AI and accountable digital governance.

3. Explainable and Interpretable AI

Circuit representations encourage explainable, interpretable AI by forcing developers to break complex models into understandable building blocks. The circuit acts as a transparent blueprint for the AI’s logic.

4. Secure User Authentication & Data Privacy

Circuitized inference can verify a user’s identity or a model’s authenticity without revealing underlying private data. This ensures data privacy, model privacy, and intellectual property protection.

5. Robustness Against Adversarial Attacks

By verifying that actions align with known, correct computations, we reduce vulnerability to adversarial attacks and tampering. This strengthens trust and safety in digital interactions.

6. Quantum-Computing-Ready Security

As quantum computing advances, circuit-based cryptographic proofs remain a promising path to future-proof, tamper-resistant verifiable inference pipelines.

7. Healthcare & Diagnostics

In sensitive domains like medical diagnosis (e.g., monitoring blood pressure or critical vitals), circuitized inference ensures bias detection, fair evaluation, and traceability. Treatment decisions verified cryptographically could ensure that liability and accountability are built into AI-driven healthcare tools.

8. Real-Time Streaming and Entertainment

Livestreaming AI competitions (e.g., Doom tournaments or other classic games like Quake, Frogger, Tetris) with integrated zk-proofs enables real-time proof systems. Viewers can trust that the displayed gameplay is genuine, cheat-free, and tamper-proof. Audiences can wager on matches confidently, knowing results are backed by verifiable inference.

Why Verifying an AI Agent Is Pivotal

Doom RL’s high-intensity, split-second decision-making environment provides an ideal testbed for real-time verification methods applied to AI systems. Demonstrating success in this setting creates a foundation for broader, more impactful deployment scenarios — whether it’s verifying the trustworthiness of an AI-driven “Memecoin” campaign, ensuring decentralized finance (DeFi) trading bots follow verifiable, pre-agreed strategies without fraudulent deviations, or reinforcing systems against adversarial attacks in cybersecurity. Furthermore, interoperability between systems demands a streamlined verification process with structured output that can be easily integrated and understood by APIs, other AI agents, and human stakeholders.

The Road Ahead

1. Tech Demo Release

Deploy a verified Doom RL agent in a public demo.

2. Optimization Challenges

Create challenges around A* pathfinding algorithms to be solved and optimized competitively by SN2 miners.

3. Agent vs Agent Challenges

Launch community-driven tournaments where multiple circuitized agents compete, their every action verifiably correct and accountable.

4. Integration with DeFi & Enterprises

Adapt circuit-based verification into trading, operations, and enterprise AI deployments.

5. Ecosystem Infrastructure

Expand GPU acceleration for zk-circuits, integrate inference engineering best practices, and streamline Sequential Decision Process representations.

Conclusion

By verifying a Doom RL agent’s decision-making using zero-knowledge proofs, we create a system where trust does not rely on blind faith. Instead, it comes from mathematical certainty. It marks a step toward AI that we can truly rely on, no matter where it’s applied.

Follow Inference Labs on Telegram for the latest: t.me/inference_labs