AI Infrastructure (Platform)

GAPI — The AI Infrastructure We Ship Ourselves On

The backbone behind every AI product we ship. Multi-provider model routing, tenant-scoped RAG, 99.9% uptime across 5 production tenants. Every new client AI product ships on rails we already own.

Client: GALOR
Sector: AI Infrastructure (Platform)
Engagement: Ongoing, in production since 2024
Year: 2024
  1. GAPI is the infrastructure layer under every AI product we ship. It's the reason client AI work moves from quarter-long commitments to week-long builds, and it's the reason TFL AI's 1.4M-document RAG, the OLAF Heartbeat agent, and three other production tenants all run on one spine.

  2. Why it exists

    The agency thesis was simple: if every AI client project starts with the same scaffolding — model routing, embeddings, vector store, auth, tenancy, observability — then shipping the scaffolding once is strictly better than shipping it ten times. Without a platform, AI agencies cap out. With one, every next engagement compounds.

  3. What GAPI is

    • Multi-provider model router. Claude Opus, Claude Sonnet, GPT-5, and self-hosted Ollama behind a unified API. Fallback chains per tenant, cost routing per query type, automatic retry on provider outages.
    • Managed retrieval. pgvector on Supabase with per-tenant namespace isolation. Chunking, embedding, and re-ranking are configuration, not code.
    • Orchestration layer. C#/.NET worker fleet handling long-running jobs, scheduled re-indexing, and multi-step agent workflows.
    • Python ML services for domain-specific models — classification, OCR, speech, custom embeddings.
    • Observability by default. Every query is traced, every token counted, every latency logged. Per-tenant dashboards are live on day one.
    • Security baseline. AES-256 at rest, EU-resident compute, SOC-2-aligned controls, zero cross-tenant data leakage enforced at the database layer.
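
    The per-tenant fallback chain and retry behaviour described in the router bullet can be sketched roughly as follows. This is a minimal illustration, not GAPI's actual code: `TenantRoute`, `ProviderError`, `route`, and the stub providers are all hypothetical names, and real provider clients would replace the stub functions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


class ProviderError(Exception):
    """Hypothetical error raised when an upstream model provider is unavailable."""


# Hypothetical provider signature: prompt in, completion text out.
Provider = Callable[[str], str]


@dataclass
class TenantRoute:
    chain: List[str]      # ordered provider names: preferred first, then fallbacks
    max_retries: int = 1  # extra attempts per provider before moving down the chain


def route(prompt: str, tenant: TenantRoute, providers: Dict[str, Provider]) -> Tuple[str, str]:
    """Try each provider in the tenant's chain; return (provider_name, answer)."""
    errors = []
    for name in tenant.chain:
        for attempt in range(tenant.max_retries + 1):
            try:
                return name, providers[name](prompt)
            except ProviderError as exc:
                errors.append(f"{name} attempt {attempt + 1}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))


# Stub providers standing in for real model clients (assumptions for the demo).
def primary_down(prompt: str) -> str:
    raise ProviderError("simulated outage")


def fallback_ok(prompt: str) -> str:
    return f"echo: {prompt}"


providers = {"primary": primary_down, "fallback": fallback_ok}
tenant = TenantRoute(chain=["primary", "fallback"])
used, answer = route("hello", tenant, providers)
print(used, answer)
```

    In this sketch the chain is plain per-tenant data, so changing a tenant's preferred provider is a configuration change rather than a code change. Cost routing per query type could be layered on by choosing the chain from the query type before calling `route`.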
  4. What it runs today

    Five production tenants. 1,800+ end users. 99.9% uptime across the trailing quarter. 40-60 second end-to-end latency on RAG-backed queries. The OLAF Heartbeat agent reads our own OpenAPI spec and reasons about system state. In other words, GAPI eats its own dog food.
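
    To put the uptime figure in concrete terms, the downtime a 99.9% target allows over a quarter can be worked out directly. The sketch below assumes a 90-day quarter; the function name is illustrative only.

```python
def downtime_budget_minutes(uptime_pct: float, window_days: int) -> float:
    """Minutes of downtime permitted in the window at the given uptime percentage."""
    window_minutes = window_days * 24 * 60
    return round(window_minutes * (100 - uptime_pct) / 100, 1)


# Assumption: a quarter is treated as 90 days.
budget = downtime_budget_minutes(99.9, 90)
print(budget)
```

    Under that assumption the budget is 129.6 minutes, a little over two hours of total downtime across the quarter.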

  5. The payoff

    The next client AI project no longer starts at zero. Model access, retrieval, auth, tenancy, and observability are already there. We ship the differentiated product surface — the thing the client actually pays for — in 2-4 weeks instead of 10-14. Gross margin on AI projects moved from roughly 35% to above 60% inside two quarters, because the infrastructure cost is amortized across the tenant base.

  6. Why this is a business case study, not a product pitch

    GAPI isn't sold. It's the reason our delivery timelines and margins look the way they do. Every client engagement that starts with "we need AI for X" ships on GAPI without the client ever needing to know it exists. That's the point — the platform is invisible; the outcome isn't.

  7. Where this replicates

    Any agency or internal platform team shipping more than three AI products is paying the same tax. The right answer is never "buy a platform" and never "hand-roll each one"; it is to own the primitives your business depends on. If you're wiring the same three vendors into every new project, you're the customer for this pattern.

By the numbers

What shipped, in figures. 4 metrics.

Production tenants: 5 (TFL AI, OLAF, 3 client tenants), up from 0
Uptime across tenants: 99.9% SLA, previously per-project and ad hoc
Client AI time-to-ship: 2-4 weeks, down from 10-14 weeks
Gross margin on AI builds: >60%, up from ~35%
