
AI API Gateway: Unified Provider Access Infrastructure

> Private repository. Available for code review on request.

▍ Problem Space

Organizations using multiple Large Language Model providers (OpenAI, Anthropic, Google, etc.) face systemic infrastructure challenges:

  • Protocol Fragmentation: Each provider has a proprietary request/response format, streaming semantics (SSE), error handling, and authentication model.
  • Provider Capacity Planning: Each provider enforces its own limits, and service-level objectives must still be met. This requires routing that respects those limits, continuous capacity monitoring, and graceful degradation before saturation.
  • Unpredictable Latency: "Thinking" phases for frontier models can last up to 2-3 minutes, causing idle timeout disconnects at the load balancer level.
  • Lack of Unified Observability: Without a central control point, usage volume, latency distribution, and error rates stay fragmented across providers.

Businesses need a single gateway that provides a unified OpenAI-compatible API, transparent routing between providers, resilience to network anomalies, and distributed state that reliably converges across nodes.

▍ Architecture

The system is a high-load reverse proxy and API gateway written entirely in Rust. It's structured as a Cargo workspace with 15+ crates, enforcing a strict separation between domain, infrastructure, and API layers.

┌─────────────────────────────────────────────────────────┐
│                     CLIENTS                             │
│         (OpenAI SDK, curl, any HTTP client)             │
└───────────────────────┬─────────────────────────────────┘
                        │ OpenAI-compatible API
                        ▼
┌─────────────────────────────────────────────────────────┐
│                   GATEWAY LAYER                         │
│  ┌──────────┐  ┌──────────────┐  ┌───────────────────┐  │
│  │ Protocol │  │   Session    │  │  Load Balancer    │  │
│  │ Adapter  │  │   Affinity   │  │  (least-loaded)   │  │
│  │ (transl.)│  │   Manager    │  │                   │  │
│  └────┬─────┘  └──────┬───────┘  └────────┬──────────┘  │
│       │               │                   │             │
│  ┌────▼───────────────▼───────────────────▼──────────┐  │
│  │              STATE LAYER                          │  │
│  │  ArcSwap (lock-free config)  +  CRDT/LWW sync     │  │
│  │  PostgreSQL (Event Sourcing + streaming replica)  │  │
│  └───────────────────────────────────────────────────┘  │
└───────────────────────┬─────────────────────────────────┘
                        │ Managed connection pool
                        ▼
┌─────────────────────────────────────────────────────────┐
│               UPSTREAM PROVIDERS                        │
│     OpenAI    │    Anthropic    │    Google    │  ...   │
└─────────────────────────────────────────────────────────┘

Key Components:

  • Protocol Adapter: Bidirectional format translation (OpenAI ↔ proprietary provider APIs). Clients interact through a single OpenAI-compatible interface regardless of which provider handles the request.
  • Session Affinity Manager: Persistent binding of "client session → upstream provider", surviving service restarts. Improves cache locality and ensures predictable behavior for stateful dialogs.
  • Load Balancer: Least-loaded routing with anti-thundering-herd protection during initial session assignment. Balances traffic across approved provider integrations and preserves service-level objectives.
  • State Layer: `ArcSwap` for lock-free hot reloading of configuration (zero contention on the hot path). CRDT/LWW with tombstone records for state synchronization across nodes. PostgreSQL with Event Sourcing and streaming replication acts as the single source of truth.
  • Managed Connection Pool: RAII-controlled connection pool with aggressive HTTP/2 keepalive to prevent idle timeout disconnects during extended generation phases.
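To make the Protocol Adapter concrete, here is a minimal sketch of one translation direction, assuming simplified, hypothetical message types rather than the project's actual models. It moves OpenAI-style `system` turns into the top-level `system` field used by Anthropic's Messages API, and passes an explicit `max_tokens`, which that API requires even though OpenAI requests may omit it.

```rust
/// Hypothetical, simplified request models (real code would use serde types).
#[derive(Debug, Clone, PartialEq)]
enum Role {
    System,
    User,
    Assistant,
}

#[derive(Debug, Clone)]
struct OpenAiMessage {
    role: Role,
    content: String,
}

#[derive(Debug, Clone, PartialEq)]
struct AnthropicMessage {
    role: &'static str,
    content: String,
}

#[derive(Debug, PartialEq)]
struct AnthropicRequest {
    model: String,
    system: Option<String>,
    messages: Vec<AnthropicMessage>,
    max_tokens: u32,
}

/// Translate an OpenAI-style chat request into an Anthropic-style Messages request.
/// System turns are lifted into the top-level `system` field (joined if several),
/// because the Messages API accepts only user/assistant roles in `messages`.
fn to_anthropic(model: &str, msgs: &[OpenAiMessage], max_tokens: u32) -> AnthropicRequest {
    let mut system_parts = Vec::new();
    let mut messages = Vec::new();
    for m in msgs {
        match m.role {
            Role::System => system_parts.push(m.content.clone()),
            Role::User => messages.push(AnthropicMessage { role: "user", content: m.content.clone() }),
            Role::Assistant => {
                messages.push(AnthropicMessage { role: "assistant", content: m.content.clone() })
            }
        }
    }
    AnthropicRequest {
        model: model.to_string(),
        system: if system_parts.is_empty() { None } else { Some(system_parts.join("\n\n")) },
        messages,
        max_tokens,
    }
}

fn main() {
    let req = to_anthropic(
        "example-model", // placeholder model name
        &[
            OpenAiMessage { role: Role::System, content: "Be brief.".into() },
            OpenAiMessage { role: Role::User, content: "Hi".into() },
            OpenAiMessage { role: Role::Assistant, content: "Hello!".into() },
        ],
        1024,
    );
    println!("{req:?}");
}
```

A production adapter would also map streaming events, tool calls, and error shapes in both directions; this sketch covers only the request body.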

Infrastructure:

  • Frontend: Administrative dashboard in Rust (Leptos + WASM) — real-time monitoring, account management, cost analytics.
  • DevOps: Nix Flakes (reproducible builds) + systemd socket activation (zero-downtime deploy).

▍ Metrics (Production Data)

The system operates under real production load:

  • Client Interface: OpenAI-compatible
  • Provider Integrations: 3+ approved
  • Deployment Continuity: zero downtime
  • State Consistency: multi-node CRDT
  • Availability Target: ~94%
  • Failure Handling: graceful SSE close
  • Cost Visibility: centralized analytics
  • Admin Surface: real-time dashboard
  • Production Ownership: observability + controls

▍ Key Engineering Decisions

Problem: Provider capacity, routing, and session state must stay consistent across nodes without a central coordinator.
Solution: LWW (Last-Write-Wins) CRDT with tombstone records. Nodes replicate state independently; conflicts are resolved by timestamp. Tombstones prevent the "resurrection" of deleted records during merges.
Alternative Rejected: Raft/Paxos. Excessive complexity for an eventually-consistent workload; a CRDT does not require leader election.
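The merge rule can be sketched as a small LWW map. This is an illustration under stated assumptions, not the project's code: timestamps are plain `u64` values assumed comparable across nodes, a hypothetical node id breaks timestamp ties so every replica picks the same winner, and a deletion is stored as an entry with no value (a tombstone) instead of being removed.

```rust
use std::collections::HashMap;

/// One replicated entry. `value == None` marks a tombstone (a deletion).
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    value: Option<String>,
    ts: u64,   // write timestamp (assumed comparable across nodes)
    node: u32, // hypothetical node id, used only to break timestamp ties
}

/// Minimal last-write-wins map with tombstones.
#[derive(Debug, Default, Clone)]
struct LwwMap {
    entries: HashMap<String, Entry>,
}

impl LwwMap {
    /// Keep `incoming` only if it is strictly newer than what is already held.
    fn apply(&mut self, key: &str, incoming: Entry) {
        let newer = match self.entries.get(key) {
            Some(cur) => (incoming.ts, incoming.node) > (cur.ts, cur.node),
            None => true,
        };
        if newer {
            self.entries.insert(key.to_string(), incoming);
        }
    }

    fn put(&mut self, key: &str, value: &str, ts: u64, node: u32) {
        self.apply(key, Entry { value: Some(value.to_string()), ts, node });
    }

    /// Deletion writes a tombstone instead of removing the key.
    fn delete(&mut self, key: &str, ts: u64, node: u32) {
        self.apply(key, Entry { value: None, ts, node });
    }

    fn get(&self, key: &str) -> Option<&str> {
        self.entries.get(key).and_then(|e| e.value.as_deref())
    }

    /// State-based merge: apply every entry from the other replica.
    fn merge(&mut self, other: &LwwMap) {
        for (key, entry) in &other.entries {
            self.apply(key, entry.clone());
        }
    }
}

fn main() {
    let mut a = LwwMap::default();
    a.put("session:42", "openai", 1, 1);
    let b = a.clone(); // replica B has seen the original write
    a.delete("session:42", 2, 1); // A deletes it later
    a.merge(&b); // B's stale value must not resurrect the key
    println!("{:?}", a.get("session:42")); // None
}
```

Because a tombstone is simply a newer write, merging in any order converges, and a stale replica cannot re-insert a deleted key. This sketch never garbage-collects tombstones; a real deployment needs a policy for that.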

Problem: Configuration (provider list, quotas, routing rules) changes at runtime. A conventional RwLock introduces contention when thousands of concurrent requests read it.
Solution: ArcSwap, an atomic replacement of Arc<Config> without locks. Readers get a snapshot in O(1), and the writer publishes a new version atomically. Zero contention on the hot path.
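The snapshot-and-publish pattern can be shown with the standard library alone. The sketch below uses `RwLock<Arc<Config>>` purely as a stand-in so it runs without external crates; the lock is held only long enough to clone or replace the `Arc`. With the `arc-swap` crate, readers would call `ArcSwap::load()` or `load_full()` and the writer `ArcSwap::store()`, with no lock at all. `Config` and `ConfigHandle` are hypothetical names.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical runtime configuration; immutable once published.
#[derive(Debug)]
struct Config {
    version: u64,
    providers: Vec<String>,
}

/// Std-only stand-in for `arc_swap::ArcSwap<Config>`. The lock guards only the
/// pointer: it is held to clone or replace the `Arc`, never while a request runs.
struct ConfigHandle {
    current: RwLock<Arc<Config>>,
}

impl ConfigHandle {
    fn new(cfg: Config) -> Self {
        Self { current: RwLock::new(Arc::new(cfg)) }
    }

    /// Reader side: an O(1) snapshot (one reference-count increment).
    /// `ArcSwap::load_full()` provides the same thing without a lock.
    fn snapshot(&self) -> Arc<Config> {
        let guard = self.current.read().unwrap();
        Arc::clone(&*guard)
    }

    /// Writer side: publish a new immutable version.
    /// `ArcSwap::store(Arc::new(cfg))` does this as a single atomic swap.
    fn publish(&self, cfg: Config) {
        *self.current.write().unwrap() = Arc::new(cfg);
    }
}

fn main() {
    let handle = ConfigHandle::new(Config { version: 1, providers: vec!["openai".into()] });
    let in_flight = handle.snapshot(); // a running request keeps this snapshot
    handle.publish(Config { version: 2, providers: vec!["openai".into(), "anthropic".into()] });
    // The in-flight request still sees v1; new requests see v2.
    let fresh = handle.snapshot();
    println!(
        "in-flight v{}, new v{} with {} providers",
        in_flight.version,
        fresh.version,
        fresh.providers.len()
    );
}
```

The key property is that a reader never observes a half-updated configuration: it either holds the old `Arc` or the new one, and the old version is freed once the last in-flight request drops it.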

Problem: When an upstream connection drops during SSE streaming, the client receives an incomplete stream, which breaks SDK parsing.
Solution: The gateway intercepts network errors and emits a synthetic final chunk with `finish_reason: "error"`, followed by the `data: [DONE]` terminator. This converts a transport failure into a graceful stream termination, so client code handles it as a normal completion rather than an exception.
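The termination frames can be sketched as follows. The chunk layout follows OpenAI's public `chat.completion.chunk` streaming format; the function name and the empty `delta` are assumptions. JSON is built with `format!` only to keep the example dependency-free; real code should use a serializer so that `id` and `model` are escaped.

```rust
/// Build the SSE frames sent when the upstream connection breaks mid-stream:
/// a final chunk with `finish_reason: "error"`, then the `[DONE]` sentinel.
/// NOTE: `id` and `model` are not JSON-escaped here; use a serializer in real code.
fn synthetic_termination(id: &str, model: &str, created: u64) -> String {
    let chunk = format!(
        r#"{{"id":"{id}","object":"chat.completion.chunk","created":{created},"model":"{model}","choices":[{{"index":0,"delta":{{}},"finish_reason":"error"}}]}}"#
    );
    // Each SSE event is a `data:` line terminated by a blank line.
    format!("data: {chunk}\n\ndata: [DONE]\n\n")
}

fn main() {
    // Hypothetical values; a gateway would reuse the id/model of the broken stream.
    let frames = synthetic_termination("chatcmpl-123", "example-model", 1_700_000_000);
    print!("{frames}");
}
```

Note that `"error"` is not among the `finish_reason` values OpenAI documents (`stop`, `length`, `tool_calls`, `content_filter`, `function_call`), so clients that branch on the value should handle unknown reasons in a default arm.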

Problem: Frontier models "think" for 60-180 seconds. Upstream load balancers drop connections that stay idle past their timeout (30-60s), even though the request is still processing.
Solution: Aggressive HTTP/2 PING keepalive at the multiplexer level keeps the connection active for intermediate load balancers without disrupting model execution.
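A worked sizing example, assuming the shortest idle timeout on the path is the 30s lower bound quoted above and that the PING interval is set to half of it (the divisor of 2 is an assumption). In a Rust HTTP client the resulting interval would be passed to an option such as reqwest's `ClientBuilder::http2_keep_alive_interval`; the helper functions here are hypothetical.

```rust
use std::time::Duration;

/// Hypothetical helper: pick a PING interval that stays well below the shortest
/// idle timeout between the gateway and the provider.
fn ping_interval(shortest_idle_timeout: Duration, divisor: u32) -> Duration {
    assert!(divisor >= 2, "interval must stay well below the idle timeout");
    shortest_idle_timeout / divisor
}

/// How many PINGs are sent while the upstream produces no data ("thinking").
fn pings_during(silence: Duration, interval: Duration) -> u128 {
    silence.as_millis() / interval.as_millis()
}

fn main() {
    let idle_timeout = Duration::from_secs(30); // lower bound quoted in the text
    let thinking = Duration::from_secs(180); // upper bound quoted in the text
    let interval = ping_interval(idle_timeout, 2);
    println!(
        "PING every {:?}: {} PINGs over a {:?} thinking phase",
        interval,
        pings_during(thinking, interval),
        thinking
    );
}
```

With a 15s interval, a 180s thinking phase produces 12 PINGs, and the connection never goes more than 15s without a frame, well inside a 30s idle timeout.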

Problem: Stateful provider dialogs benefit from cache locality. Random upstream switching increases cost variance and can make long-running conversations less predictable.
Solution: A persistent "client session → upstream provider" binding stored in PostgreSQL, which survives restarts. When a new session is assigned, a least-loaded algorithm with anti-thundering-herd protection picks the provider.
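One way to combine affinity with least-loaded assignment is sketched below, with an in-memory `HashMap` standing in for the PostgreSQL table. The source does not say how its anti-thundering-herd protection works; this sketch uses a common technique, reserving capacity in the same step as the choice, so a burst of new sessions is spread across providers instead of all landing on whichever provider looked idle first. `Router` and its fields are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical router. `affinity` stands in for the persisted
/// "client session -> upstream provider" table.
struct Router {
    providers: Vec<String>,
    load: Vec<u32>,                   // active sessions per provider (same index)
    affinity: HashMap<String, usize>, // session id -> provider index
}

impl Router {
    fn new(providers: &[&str]) -> Self {
        Router {
            providers: providers.iter().map(|p| p.to_string()).collect(),
            load: vec![0; providers.len()],
            affinity: HashMap::new(),
        }
    }

    /// Known sessions keep their provider. New sessions go to the least-loaded
    /// provider, and the counter is bumped in the same step (a reservation), so a
    /// burst of new sessions spreads out instead of piling onto one provider.
    fn route(&mut self, session: &str) -> &str {
        let existing = self.affinity.get(session).copied();
        let idx = match existing {
            Some(i) => i,
            None => {
                // Ties go to the lowest index (`min_by_key` returns the first minimum).
                let i = (0..self.load.len())
                    .min_by_key(|&i| self.load[i])
                    .expect("at least one provider");
                self.load[i] += 1;
                self.affinity.insert(session.to_string(), i);
                i
            }
        };
        &self.providers[idx]
    }
}

fn main() {
    let mut router = Router::new(&["openai", "anthropic", "google"]);
    // A burst of six new sessions arriving together.
    for s in ["s1", "s2", "s3", "s4", "s5", "s6"] {
        println!("{s} -> {}", router.route(s));
    }
    // A returning session keeps its provider.
    println!("s2 again -> {}", router.route("s2"));
}
```

In a concurrent gateway the choose-and-increment step must itself be atomic (for example, under a mutex); otherwise simultaneous requests can still read the same stale counts. Releasing capacity when a session expires is omitted here.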

▍ Tech Stack

  • Backend: Rust, Axum, Tokio, SQLx, PostgreSQL, ArcSwap, DashMap
  • Frontend: Rust, Leptos, WebAssembly (WASM)
  • DevOps: Nix (Flakes), systemd (socket activation), Podman

▍ Demonstrated Competencies

  • Systems Architecture: Designing a distributed stateful service resilient to network anomalies, partial failures, and prolonged upstream latency.
  • Distributed Systems: Practical application of CRDTs, Event Sourcing, and PostgreSQL streaming replication in production.
  • Performance Engineering: Lock-free hot paths, zero-copy streaming, and leak-free connection pool management under constant load.
  • Production Operations: Zero-downtime deployment, deep instrumentation (metrics, tracing), and graceful degradation during upstream provider outages.
  • Rust Ecosystem Mastery: A workspace with 15+ crates, type-safe domain models, and exhaustive pattern matching across failure boundaries.
