Balaji Srinivasan

Reliable AI Starts with Idempotency, Not Bigger Models


Most AI failures in production don’t happen because the model was wrong. They happen because the system around the model made the same decision twice and acted on it twice.

In a distributed AI stack, that’s not just embarrassing: it’s expensive and operationally messy, and it can destroy user trust.

🔗Why This Matters in the AI Era

In traditional web systems, a duplicate request might be annoying. In AI-driven systems, a duplicate request can be catastrophic.

The reason? AI orchestration is inherently multi-step, stateful, and asynchronous.

Without an idempotent design, retries and concurrency don’t just cause inefficiency — they cause irreversible side effects.


🔗What Idempotency Really Means for Architects

The textbook definition is simple:

An operation is idempotent if performing it multiple times has the same effect as performing it once.
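
To make the definition concrete, here is a minimal Python sketch contrasting a non-idempotent operation with an idempotent one. The names `charge_naive`, `charge_idempotent`, and both in-memory ledgers are hypothetical and exist only for illustration.

```python
# Hypothetical in-memory ledgers, used only to illustrate the definition.
naive_ledger: list[int] = []
keyed_ledger: dict[str, int] = {}

def charge_naive(amount: int) -> None:
    """Not idempotent: every call records another charge."""
    naive_ledger.append(amount)

def charge_idempotent(key: str, amount: int) -> None:
    """Idempotent: repeating the call with the same key changes nothing."""
    keyed_ledger.setdefault(key, amount)

# Simulate a retry that delivers the same request twice.
for _ in range(2):
    charge_naive(100)
    charge_idempotent("order-42", 100)

print(sum(naive_ledger))           # 200: the customer was charged twice
print(sum(keyed_ledger.values()))  # 100: the customer was charged once
```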

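For AI systems, the architectural meaning is richer: the same business intent, however many times it is retried or re-delivered, must produce one result and one set of side effects.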

Think of it like a circuit breaker for duplication — the gate that prevents your system from stepping on its own toes.


🔗Where Idempotency Belongs in AI Architecture

In AI systems, idempotency shouldn’t live in just one layer. It should be enforced at multiple tiers, so that a duplicate that slips past one layer is caught by the next.


🔗Implementation Strategies for AI Workflows

import hashlib

def generate_idempotency_key(customer_id: str, message_content: str) -> str:
    """Deterministically derive an idempotency key for a logical action."""
    content_hash = hashlib.sha256(message_content.encode("utf-8")).hexdigest()
    return f"{customer_id}:{content_hash}"
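
One way such a key might be used is as a lookup into a result store, so that a repeated action replays the stored result instead of running again. The sketch below assumes a single process and an in-memory dict. `make_key`, `run_once`, and `send_reply` are hypothetical names, and a production system would need a shared, durable store with concurrency control.

```python
import hashlib
from typing import Callable

# Hypothetical in-memory result store; a real system would use a durable, shared one.
_results: dict[str, str] = {}

def make_key(customer_id: str, message_content: str) -> str:
    """Same derivation as above: customer id plus a SHA-256 content hash."""
    digest = hashlib.sha256(message_content.encode("utf-8")).hexdigest()
    return f"{customer_id}:{digest}"

def run_once(key: str, action: Callable[[], str]) -> str:
    """Run `action` only if `key` has no stored result; otherwise replay it."""
    if key not in _results:
        _results[key] = action()
    return _results[key]

sent: list[str] = []

def send_reply() -> str:
    sent.append("reply")  # stands in for an irreversible side effect
    return f"sent #{len(sent)}"

key = make_key("cust-1", "Where is my refund?")
first = run_once(key, send_reply)
second = run_once(key, send_reply)  # duplicate delivery of the same message
print(first, second, len(sent))
```

The second call never reaches `send_reply`, so the caller sees the same result and the side effect happens once.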

🔗Common Pitfalls


🔗Real-World Example: Healthcare/Insurance Claim Processing (Deep Dive)

Let’s make this concrete with a claim-processing flow that many teams ship:

Actors: Intake Agent, Eligibility Agent, Adjudication Agent, and Payment Agent.

Happy-path flow

  1. Intake Agent
    • Input: scanned claim PDF.
    • Output: claim_intake_id, normalized member_id, provider_id, dos (date of service), line items.
  2. Eligibility Agent
    • Calls eligibility service (external or internal API).
    • Output: coverage snapshot + eligibility_check_id.
  3. Adjudication Agent
    • Applies plan rules (deductible, co-insurance, bundling edits).
    • Output: adjudication decision + adjudication_id.
  4. Payment Agent
    • Creates a payment instruction, posts it to the payment gateway, then updates GL/AR.
    • Output: payment_id, eob_id.

🔗Where Idempotency Saves You (and How to Do It)

You need stable, repeatable keys for each side-effecting action. A practical approach is to derive each key from business identifiers plus a content hash, so that it is deterministic across retries.

Key strategy (examples)

Guiding principle: One idempotency key per business effect, not per API call. If the same business intent repeats, you must return the same result.


🔗Minimal Storage (Postgres)

CREATE TABLE idempotency_keys (
  key TEXT PRIMARY KEY,
  status TEXT NOT NULL CHECK (status IN ('PENDING','COMPLETED','FAILED')),
  response JSONB,  -- NULL while status is 'PENDING'
  created_at TIMESTAMPTZ DEFAULT now()
);

-- Optional: partial index for the hot payment path (the primary key already enforces uniqueness)
CREATE UNIQUE INDEX uniq_payment_per_claim
  ON idempotency_keys (key) WHERE key LIKE 'idemp:pay:%';
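
The following sketch shows one way a service might use such a table: claim the key with an insert that does nothing on conflict, perform the side effect only if the insert won, then store the response. It uses Python's built-in `sqlite3` with an in-memory database as a stand-in for Postgres, stores the response as JSON text that stays empty until completion, and uses hypothetical function names (`try_claim`, `complete`, `lookup`).

```python
import json
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")  # SQLite stands in for Postgres in this sketch
conn.execute(
    """CREATE TABLE idempotency_keys (
         key TEXT PRIMARY KEY,
         status TEXT NOT NULL CHECK (status IN ('PENDING','COMPLETED','FAILED')),
         response TEXT
       )"""
)

def try_claim(key: str) -> bool:
    """Insert a PENDING row; return True only for the caller that won the insert."""
    cur = conn.execute(
        "INSERT INTO idempotency_keys (key, status) VALUES (?, 'PENDING') "
        "ON CONFLICT(key) DO NOTHING",
        (key,),
    )
    conn.commit()
    return cur.rowcount == 1

def complete(key: str, response: dict) -> None:
    """Record the final response so later duplicates can replay it."""
    conn.execute(
        "UPDATE idempotency_keys SET status = 'COMPLETED', response = ? WHERE key = ?",
        (json.dumps(response), key),
    )
    conn.commit()

def lookup(key: str) -> Optional[tuple]:
    """Return (status, response) for a key, or None if the key is unknown."""
    row = conn.execute(
        "SELECT status, response FROM idempotency_keys WHERE key = ?", (key,)
    ).fetchone()
    if row is None:
        return None
    status, response = row
    return (status, json.loads(response) if response else None)

won_first = try_claim("idemp:pay:claim-1")   # first caller performs the payment
won_second = try_claim("idemp:pay:claim-1")  # duplicate must not pay again
complete("idemp:pay:claim-1", {"payment_id": "pay-1"})
print(won_first, won_second, lookup("idemp:pay:claim-1"))
```

A caller that loses the claim would typically look up the row and either return the stored response or wait while the status is still `PENDING`.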

🔗Orchestrator Pattern (Temporal example)

// Pseudocode for clarity: act(...) and the key* helpers are assumed to be defined elsewhere.
export async function processClaim(claimFileRef: string, adjRulesetVersion: string) {
  const intake = await act(intakeNormalize, {
    idempotencyKey: keyIntake(claimFileRef)
  });

  const elig = await act(checkEligibility, {
    idempotencyKey: keyElig(intake.memberId, intake.planId, intake.dos)
  });

  const adj = await act(adjudicateClaim, {
    idempotencyKey: keyAdj(intake.claimIntakeId, adjRulesetVersion)
  });

  const pay = await act(postPaymentAndEOB, {
    idempotencyKey: keyPay(intake.claimIntakeId, adj.totalAllowed, adj.memberResponsibility)
  });

  await act(writeAuditLog, { idempotencyKey: keyAudit(adj.adjudicationId) });

  return { claimIntakeId: intake.claimIntakeId, paymentId: pay.paymentId };
}

Note: Each act(...) wraps a service call that checks idempotency server-side before mutating state.


🔗HTTP/Headers Pattern

Why headers? They make keys cross-cutting and visible in traces. Your observability can correlate the same claim through retries.
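
As a rough illustration, a client could attach the key to an outgoing request as shown below. The header name `Idempotency-Key` is a commonly used convention but is an assumption here, as are the URL, the payload fields, and the function name `build_payment_request`. The request is only constructed, not sent.

```python
import json
import urllib.request

def build_payment_request(url: str, idempotency_key: str, body: dict) -> urllib.request.Request:
    """Build a POST request that carries the idempotency key as a header."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Idempotency-Key": idempotency_key,  # assumed header name
        },
    )

req = build_payment_request(
    "https://payments.example.com/v1/payments",  # hypothetical endpoint
    "idemp:pay:claim-1",
    {"claim_intake_id": "claim-1", "amount_cents": 12500},
)

# urllib stores header names capitalized ("Idempotency-key");
# HTTP header names are case-insensitive on the wire.
print(req.get_header("Idempotency-key"))
```

Every retry of this request would carry the same header value, which is what lets traces and logs group the attempts together.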


Idempotency is not just “good engineering hygiene.” It’s a cost and trust multiplier.

The fastest way to lose trust in AI isn’t hallucination — it’s doing the right thing twice.



Tags: #engineering #distributed-systems #ai #agentic #system-design #architecture