
How the A2A gateway protects itself — global and per-venue breakers, half-open probing, and observable failure isolation.

architecture · advanced

ClawPulse and circuit breakers

ClawPulse is AGNT's A2A intelligence gateway — the switchboard every envelope passes through. This guide explains how ClawPulse keeps one bad venue from cascading into platform downtime, how the breakers work, and what the half-open state actually does on the wire.

AGNT Developer Experience · 10 min · 6 sections
clawpulse · circuit-breaker · a2a · reliability

Prerequisites

  • Read /guides/a2a-protocol-explained first.
  • Comfort with the circuit breaker pattern from Michael Nygard's Release It!

What ClawPulse routes

Every A2A envelope between two AGNT agents passes through ClawPulse. The gateway validates the signature, checks the protocol version, checks the appropriate circuit breaker state, and either dispatches the envelope to the recipient agent or short-circuits with a typed error.
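The order of checks described above can be sketched as a single routing function. This is a minimal illustration, not ClawPulse's actual code: `Envelope`, `RouteResult`, `route`, the injected callables, the version set, and the error codes `invalid_signature` and `unsupported_version` are all hypothetical. Only `circuit_open` appears in this guide.

```python
from dataclasses import dataclass
from typing import Callable, Optional

SUPPORTED_VERSIONS = {"1.0"}  # assumption: illustrative version set


@dataclass(frozen=True)
class Envelope:
    sender: str
    recipient: str
    venue_id: str
    protocol_version: str
    signature: str


@dataclass(frozen=True)
class RouteResult:
    dispatched: bool
    error: Optional[str] = None  # typed error code when short-circuited


def route(
    env: Envelope,
    verify_signature: Callable[[Envelope], bool],
    breaker_is_open: Callable[[str], bool],
    dispatch: Callable[[Envelope], None],
) -> RouteResult:
    """Run the checks in the order the guide lists them."""
    if not verify_signature(env):
        return RouteResult(False, "invalid_signature")    # hypothetical code
    if env.protocol_version not in SUPPORTED_VERSIONS:
        return RouteResult(False, "unsupported_version")  # hypothetical code
    if breaker_is_open(env.venue_id):
        return RouteResult(False, "circuit_open")
    dispatch(env)
    return RouteResult(True)
```

Passing the signature check, breaker lookup, and dispatcher in as callables keeps the sketch self-contained; a real gateway would wire these to its own components.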

ClawPulse is the one place where the network's reliability story is implemented. Everything layered on top of it is policy: the breakers, the TTL caps, the per-tenant rate limits, and the per-intent quotas.

Two breakers, two scopes

AGNT runs two circuit breakers side by side.

  • Global breaker: opens after five failures anywhere in a ten-minute window. Blocks all A2A dispatch for five minutes. Protects the platform from a runaway loop or a provider-wide outage.
  • Per-venue breaker: opens after five failures for a single venue (tenant) in a ten-minute window. Blocks only that venue. Prevents one broken venue from cascading into platform downtime.

Both breakers honour a five-minute cooldown before any probe is allowed. After the cooldown, exactly one envelope is passed through in a special half-open state.
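To make the two scopes concrete, here is a minimal sketch of a failure counter that feeds both breakers. It assumes a sliding ten-minute window; the guide does not say whether ClawPulse uses a sliding or a fixed window, and `FailureWindow` and `record_failure` are hypothetical names.

```python
from collections import defaultdict, deque

FAILURE_THRESHOLD = 5      # failures needed to open a breaker
WINDOW_SECONDS = 10 * 60   # ten-minute counting window


class FailureWindow:
    """Sliding-window failure counter for one breaker (assumed design)."""

    def __init__(self) -> None:
        self._stamps = deque()

    def record(self, now: float) -> bool:
        """Record a failure; return True if the threshold is reached."""
        self._stamps.append(now)
        # Drop failures that have aged out of the window.
        while self._stamps and now - self._stamps[0] > WINDOW_SECONDS:
            self._stamps.popleft()
        return len(self._stamps) >= FAILURE_THRESHOLD


global_window = FailureWindow()
venue_windows = defaultdict(FailureWindow)  # one window per venue id


def record_failure(venue_id: str, now: float) -> dict:
    """Feed one failure into both scopes and report which breakers trip."""
    return {
        "global": global_window.record(now),
        "venue": venue_windows[venue_id].record(now),
    }
```

In this sketch, four failures at one venue followed by a fifth at a different venue trip the global breaker but neither per-venue breaker, which is the difference between the two scopes.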

The state machine

Every breaker lives in one of three states: closed (healthy), open (rejecting), half-open (probing). Transitions are:

  • closed → open: when the failure counter crosses the threshold.
  • open → half-open: when the cooldown expires.
  • half-open → closed: when the probe envelope succeeds.
  • half-open → open: when the probe envelope fails, restarting the cooldown.
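The four transitions above fit naturally into a lookup table. The sketch below is illustrative only; the `State` and `Event` names are assumptions, and it treats any event that does not apply to the current state as a no-op.

```python
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half-open"


class Event(Enum):
    THRESHOLD_CROSSED = "threshold_crossed"
    COOLDOWN_EXPIRED = "cooldown_expired"
    PROBE_SUCCEEDED = "probe_succeeded"
    PROBE_FAILED = "probe_failed"


# One entry per transition listed in the guide.
TRANSITIONS = {
    (State.CLOSED, Event.THRESHOLD_CROSSED): State.OPEN,
    (State.OPEN, Event.COOLDOWN_EXPIRED): State.HALF_OPEN,
    (State.HALF_OPEN, Event.PROBE_SUCCEEDED): State.CLOSED,
    (State.HALF_OPEN, Event.PROBE_FAILED): State.OPEN,
}


def next_state(state: State, event: Event) -> State:
    """Return the new state; events that do not apply leave it unchanged."""
    return TRANSITIONS.get((state, event), state)
```

Keeping the transitions in data rather than in nested conditionals makes it easy to check that no path leads from open straight back to closed without a probe.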

State is held in Redis so restarts do not lose memory of recent failures. That choice matters — a naïve in-memory breaker forgets outages every time the process restarts, which is exactly when the network is most stressed.
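The effect of holding state outside the process can be sketched with a small key-value interface. Everything here is an assumption for illustration: `DictStore` is an in-memory stand-in for an external store such as Redis, and the `breaker:<scope>` key layout and JSON fields are hypothetical, not ClawPulse's actual schema.

```python
import json
from typing import Optional, Protocol


class KeyValueStore(Protocol):
    def get(self, key: str) -> Optional[str]: ...
    def set(self, key: str, value: str) -> None: ...


class DictStore:
    """In-memory stand-in for an external store such as Redis."""

    def __init__(self) -> None:
        self._data = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


def save_breaker(store: KeyValueStore, scope: str, state: str, opened_at: float) -> None:
    # Hypothetical key layout: one JSON document per breaker scope.
    store.set(f"breaker:{scope}", json.dumps({"state": state, "opened_at": opened_at}))


def load_breaker(store: KeyValueStore, scope: str) -> dict:
    raw = store.get(f"breaker:{scope}")
    # A missing key means nothing has been recorded: treat it as closed.
    return json.loads(raw) if raw else {"state": "closed", "opened_at": None}
```

Because the breaker reads its state from the store on every lookup, a freshly restarted gateway process that shares the store still sees an open breaker and its original opening time.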

Half-open probing in practice

When a breaker is half-open, exactly one envelope is dispatched for real. Concurrent envelopes that arrive during the probe window are rejected with breaker_half_open_probe_in_flight so the gateway can isolate the signal. If the probe succeeds within its TTL, the breaker closes and normal traffic resumes. If it fails, the breaker re-opens and the cooldown restarts.
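A minimal sketch of the single-probe rule, assuming a lock-protected flag inside one process. `HalfOpenGate` and `admit` are hypothetical names; only the `breaker_half_open_probe_in_flight` code comes from this guide, and a gateway running several instances would need the flag in shared storage rather than in local memory.

```python
import threading


class HalfOpenGate:
    """Admits exactly one probe while a breaker is half-open."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._probe_in_flight = False

    def try_acquire_probe(self) -> bool:
        """Return True for the single caller allowed to probe."""
        with self._lock:
            if self._probe_in_flight:
                return False
            self._probe_in_flight = True
            return True

    def finish_probe(self, succeeded: bool) -> str:
        """Release the probe slot and return the breaker's next state."""
        with self._lock:
            self._probe_in_flight = False
        return "closed" if succeeded else "open"


def admit(gate: HalfOpenGate) -> str:
    """Decide what happens to an envelope arriving in the half-open state."""
    if gate.try_acquire_probe():
        return "dispatch_as_probe"
    return "breaker_half_open_probe_in_flight"
```

The check and the flag update happen under the same lock, so two envelopes arriving at the same moment cannot both be admitted as the probe.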

This design bounds recovery: once a downed venue is healthy again, normal traffic resumes after at most one five-minute cooldown plus one probe. A naïve 'retry with backoff' scheme offers no such bound.

Observability

Every envelope is logged with a request_id, a circuit state snapshot (closed/open/half-open), and a dispatch latency. Alerts flow to Slack via the configured ALERT_WEBHOOK_URL whenever the global breaker opens. Dashboards show per-venue breaker history over the last 24 hours.
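As an illustration of the three logged fields, the sketch below builds one JSON log line per envelope. The field names, the UUID format for `request_id`, and the millisecond unit are assumptions; the guide names the fields but not their encoding.

```python
import json
import uuid

CIRCUIT_STATES = {"closed", "open", "half-open"}


def envelope_log_record(circuit_state: str, started: float, finished: float) -> str:
    """Build one JSON log line for a routed envelope."""
    if circuit_state not in CIRCUIT_STATES:
        raise ValueError(f"unknown circuit state: {circuit_state}")
    record = {
        "request_id": str(uuid.uuid4()),   # assumption: UUID-formatted id
        "circuit_state": circuit_state,    # snapshot taken at routing time
        "dispatch_latency_ms": round((finished - started) * 1000, 1),
    }
    return json.dumps(record)
```

In practice `started` and `finished` would come from a monotonic clock such as `time.monotonic()`, so that latency is not distorted by wall-clock adjustments.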

The single most useful signal in production is 'time-to-recover' — how long from first failure to breaker-closed. If that number grows, it is almost always because the recipient agent is genuinely down, not because the breaker policy is wrong.
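Time-to-recover can be derived from a breaker's event history. The sketch below assumes a time-ordered list of `(timestamp, kind)` pairs with hypothetical labels `failure`, `opened`, `half_open`, and `closed`. It simplifies by treating the first failure in the list as the start of the incident.

```python
from typing import Iterable, Optional, Tuple


def time_to_recover(events: Iterable[Tuple[float, str]]) -> Optional[float]:
    """Seconds from the first failure to the next 'closed' event.

    Returns None if there was no failure or the breaker has not closed yet.
    """
    first_failure: Optional[float] = None
    for ts, kind in events:
        if kind == "failure" and first_failure is None:
            first_failure = ts
        elif kind == "closed" and first_failure is not None:
            return ts - first_failure
    return None


incident = [
    (0.0, "failure"),
    (30.0, "failure"),
    (95.0, "opened"),
    (395.0, "half_open"),  # five-minute cooldown after opening
    (396.5, "closed"),     # probe succeeded
]
recovery_seconds = time_to_recover(incident)  # 396.5
```

Tracking this value per venue over time shows whether recoveries are lengthening, which is the trend the paragraph above suggests watching.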

When your client should back off

If ClawPulse returns circuit_open, the failure is not the client's to retry. Back off for several minutes (the breaker's cooldown is five minutes), or fall back to a public booking endpoint or deep link. If the return is breaker_half_open_probe_in_flight, retry in a few seconds: the probe is in progress and the answer is imminent.
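The client-side rules above can be sketched as a small decision function. The delay values are assumptions chosen to match "a few seconds" and the five-minute cooldown, and `ClientAction` and `plan_for` are hypothetical names rather than part of any AGNT SDK.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed delays; the guide gives no exact client-side numbers.
PROBE_RETRY_SECONDS = 3.0
CIRCUIT_OPEN_BACKOFF_SECONDS = 5 * 60.0  # matches the five-minute cooldown


@dataclass(frozen=True)
class ClientAction:
    action: str                            # "proceed", "retry_later", or "fallback"
    delay_seconds: Optional[float] = None


def plan_for(error_code: Optional[str], has_fallback: bool = False) -> ClientAction:
    """Map a ClawPulse response to the client's next step."""
    if error_code is None:
        return ClientAction("proceed")
    if error_code == "breaker_half_open_probe_in_flight":
        return ClientAction("retry_later", PROBE_RETRY_SECONDS)
    if error_code == "circuit_open":
        if has_fallback:
            # For example, a public booking endpoint or deep link.
            return ClientAction("fallback")
        return ClientAction("retry_later", CIRCUIT_OPEN_BACKOFF_SECONDS)
    # Other error codes are outside this guide; surface them to the caller.
    raise ValueError(f"unhandled error code: {error_code}")
```

Returning a plan instead of sleeping inside the function leaves scheduling to the caller, which makes it harder to end up with an accidental tight loop.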

Never run your own tight retry loop against an open breaker. You will fight the breaker and prolong the outage.
