TL;DR

An infrastructure team created a custom automated reinforcement-learning pipeline that relies on a Lean REPL service to mediate between models and interactive Lean proofs. Initial implementations — a local AsyncLeanREPLProcess library and a GKE + FastAPI + WebSocket service — met functional needs but revealed reliability and scaling limits, prompting a redesign (v2) mentioned but not detailed in the source.

What happened

The team described the engineering of a REPL service that brokers all interactions between automated models and Lean theorem proofs. Lean proofs progress by applying tactics that transform proof states; the REPL exposes four core operations to support tree exploration programmatically: run Lean code, run a tactic, export a state, and import a state.

The project began with a local library (AsyncLeanREPLProcess) that wrapped a Lean REPL process and presented an async API; a pool abstraction let a single machine leverage multiple CPUs. To scale across machines, the team deployed a web service on GKE using preemptible instances, exposing the REPL over a single WebSocket endpoint via FastAPI.

That v1 design supported fully asynchronous requests using request IDs, but suffered from connection pinning by the load balancer, unreliable WebSocket connections, and reconnection problems; these issues prevented reliable autoscaling and robust handling of preemptions. The post indicates a move toward a v2 design (gRPC) but details are not present in the source.
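To make the four core operations concrete, here is a minimal sketch of what an AsyncLeanREPLProcess-style async wrapper could look like. All names and signatures are assumptions, not the team's actual API, and a tiny in-memory fake stands in for the real Lean subprocess so the interface can be exercised.

```python
import asyncio
import json

class AsyncLeanREPL:
    """Hypothetical async wrapper around a Lean REPL (names assumed).

    Exposes the four core operations from the post: run Lean code,
    run a tactic, export a proof state, and import a proof state.
    A real implementation would talk to a Lean REPL subprocess over
    stdin/stdout; an in-memory dict stands in for it here.
    """

    def __init__(self):
        self._states = {}      # state_id -> opaque proof-state blob
        self._next_id = 0
        self._lock = asyncio.Lock()

    async def _fresh_id(self) -> int:
        async with self._lock:
            self._next_id += 1
            return self._next_id

    async def run_code(self, code: str) -> int:
        """Elaborate Lean code, returning the id of the resulting state."""
        sid = await self._fresh_id()
        self._states[sid] = {"code": code, "goals": ["True"]}
        return sid

    async def run_tactic(self, state_id: int, tactic: str) -> int:
        """Apply a tactic to an existing state, producing a new state."""
        parent = self._states[state_id]
        sid = await self._fresh_id()
        self._states[sid] = {**parent, "last_tactic": tactic}
        return sid

    async def export_state(self, state_id: int) -> str:
        """Serialize a state so another process can pick it up later."""
        return json.dumps({"state_id": state_id, **self._states[state_id]})

    async def import_state(self, blob: str) -> int:
        """Load a previously exported state under a fresh local id."""
        sid = await self._fresh_id()
        self._states[sid] = json.loads(blob)
        return sid

async def demo():
    repl = AsyncLeanREPL()
    s0 = await repl.run_code("theorem t : True := by sorry")
    s1 = await repl.run_tactic(s0, "trivial")
    blob = await repl.export_state(s1)   # hand off to another worker...
    s2 = await repl.import_state(blob)   # ...which resumes from the blob
    return s1, s2

print(asyncio.run(demo()))
```

The export/import pair is what makes the service semantically stateless from the caller's perspective: any worker holding the serialized blob can resume the proof search.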

Why it matters

  • Decoupling REPL execution from GPU-bound model training lets the RL system scale using many cheap CPUs instead of expensive GPU machines.
  • Running on preemptible instances reduces cost but requires the service to tolerate instance preemption and transient network issues.
  • A fully asynchronous, out-of-order protocol is necessary to avoid head-of-line blocking when queries vary widely in execution time.
  • Reliable reconnection and non-pinned routing are essential for autoscaling and maintaining high utilization in a distributed environment.
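The out-of-order protocol in the third bullet can be sketched with the request_id-to-future pattern the post describes. Everything here is illustrative: a pair of asyncio queues stands in for the WebSocket, and the class and server names are assumptions, not the team's code.

```python
import asyncio
import itertools

class AsyncChannel:
    """Sketch of the request_id -> future pattern: many in-flight
    queries share one connection, and replies may arrive in any
    order, avoiding head-of-line blocking behind slow queries."""

    def __init__(self, send_queue, recv_queue):
        self._send = send_queue
        self._recv = recv_queue
        self._pending = {}           # request_id -> Future
        self._ids = itertools.count()

    async def request(self, payload):
        rid = next(self._ids)
        fut = asyncio.get_running_loop().create_future()
        self._pending[rid] = fut
        await self._send.put({"request_id": rid, "payload": payload})
        return await fut             # resolves whenever the reply lands

    async def pump(self):
        """Dispatch replies to their futures, in whatever order they arrive."""
        while True:
            msg = await self._recv.get()
            self._pending.pop(msg["request_id"]).set_result(msg["result"])

async def fake_server(send_queue, recv_queue):
    """Toy backend: each payload doubles as its processing time in seconds,
    so slow queries finish after fast ones submitted later."""
    while True:
        msg = await send_queue.get()
        async def reply(m=msg):
            await asyncio.sleep(m["payload"])
            await recv_queue.put({"request_id": m["request_id"],
                                  "result": f"done after {m['payload']}s"})
        asyncio.create_task(reply())

async def demo():
    to_server, from_server = asyncio.Queue(), asyncio.Queue()
    chan = AsyncChannel(to_server, from_server)
    pump = asyncio.create_task(chan.pump())
    server = asyncio.create_task(fake_server(to_server, from_server))
    # The slow request (0.05 s) does not block the fast one (0.0 s).
    slow, fast = await asyncio.gather(chan.request(0.05), chan.request(0.0))
    pump.cancel(); server.cancel()
    return slow, fast

print(asyncio.run(demo()))
```

With a serial, in-order protocol the fast query would wait behind the slow one; tagging each message with a request_id lets the client resolve whichever future matches the reply that just arrived.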

Key facts

  • The REPL service mediates all interactions between models and Lean proofs and is described as semantically stateless.
  • Core REPL operations: run Lean code, run a tactic, export state, and import state.
  • A local implementation (AsyncLeanREPLProcess) provided an async API and managed exported state files on disk; AsyncLeanREPLPool allowed multiprocessing on a single host.
  • v1, deployed on Google Kubernetes Engine, used preemptible instances and a FastAPI wrapper exposing a single WebSocket endpoint that accepted many concurrent queries over one connection.
  • To support out-of-order responses, the client used a request_id pattern mapping requests to futures.
  • WebSocket-based v1 suffered from connection pinning by the load balancer, which hurt load distribution and reconnection behavior.
  • Some WebSocket disconnections were correlated with node preemptions; others occurred without clear root cause.
  • State references had to include hostnames in v1 so clients could avoid referencing wrong state files when requests hit different backends.
  • The service must autoscale from near zero to clear queues within ten minutes and sustain at least 90% utilization.
  • The source references a planned v2 transition to gRPC but provides no implementation details in the excerpt.
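The hostname-qualified state references mentioned above can be illustrated with a small sketch. Because each v1 backend kept exported state files on its own disk, a reference had to carry the owning hostname so a request routed to a different backend would not read the wrong file. The `StateRef` type and its `host:id` encoding are illustrative assumptions, not the team's actual scheme.

```python
from typing import NamedTuple

class StateRef(NamedTuple):
    """Hypothetical state reference qualified by the backend that owns it."""
    hostname: str   # backend whose disk holds the exported state file
    state_id: int   # id of the state on that backend

    def encode(self) -> str:
        return f"{self.hostname}:{self.state_id}"

    @classmethod
    def decode(cls, ref: str) -> "StateRef":
        host, _, sid = ref.rpartition(":")
        return cls(host, int(sid))

    def is_local(self, my_hostname: str) -> bool:
        """True only if this backend actually holds the state file."""
        return self.hostname == my_hostname

ref = StateRef.decode("repl-node-7:42")
print(ref.encode(), ref.is_local("repl-node-3"))
```

A backend receiving a non-local reference can then reject or forward the request instead of silently operating on an unrelated state file with the same id.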

What to watch next

  • How the v2 gRPC redesign addresses connection pinning, reconnection guarantees, and out-of-order delivery — not confirmed in the source.
  • Specific mechanisms the team will use to make the service robust to preemption and to meet the ten-minute scale-up and 90% utilization targets — not confirmed in the source.

Quick glossary

  • Lean: A proof assistant and programming language used to write and check formal mathematical proofs interactively.
  • REPL (read–eval–print loop): an interactive programming environment; here, a programmatic interface to execute Lean operations and inspect proof states.
  • Preemptible instance: A low-cost cloud VM that can be reclaimed by the provider with little notice; useful for cost-sensitive, fault-tolerant workloads.
  • WebSocket: A protocol providing full-duplex communication channels over a single TCP connection, commonly used for real-time client-server messaging.
  • gRPC: A high-performance RPC framework that uses HTTP/2 for multiplexed, bidirectional streaming and structured messaging.

Reader FAQ

What does the REPL service do?
It mediates programmatic interactions with Lean proofs, exposing operations to run code, execute tactics, and export/import proof states.

Why not run everything on GPU machines?
The project separates CPU-bound REPL work from GPU-bound model training because GPUs are expensive and machine CPU/GPU ratios limit available CPU capacity.

Did the WebSocket-based service meet expectations?
Partially: it supported asynchronous, out-of-order requests but suffered from connection pinning, reconnection problems, and intermittent unreliability.

Is the gRPC-based v2 service in production?
Not confirmed in the source.

Can the service tolerate instance preemption?
The design requires handling preemption gracefully, but specific operational outcomes and mechanisms are not confirmed in the source.

Sources

  • Running Lean at Scale, 09.11.2025
