How the A2A gateway protects itself — global and per-venue breakers, half-open probing, and observable failure isolation.
ClawPulse and circuit breakers
ClawPulse is AGNT's A2A intelligence gateway — the switchboard every envelope passes through. This guide explains how ClawPulse keeps one bad venue from cascading into platform downtime, how the breakers work, and what the half-open state actually does on the wire.
Prerequisites
- Read /guides/a2a-protocol-explained first.
- Familiarity with the circuit breaker pattern from Nygard's Release It!
Every A2A envelope between two AGNT agents passes through ClawPulse. The gateway validates the signature, checks the protocol version, checks the appropriate circuit breaker state, and either dispatches the envelope to the recipient agent or short-circuits with a typed error.
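The order of checks described above can be sketched roughly as follows. This is an illustration only, not ClawPulse's actual code: the Envelope fields, the SUPPORTED_VERSIONS set, and the invalid_signature and unsupported_protocol_version error names are assumptions made for the example. Only circuit_open comes from this guide, and the half-open case is left out here.

```python
from dataclasses import dataclass
from typing import Callable, Optional

SUPPORTED_VERSIONS = {"1.0"}  # assumption: the real version set is not given in this guide


@dataclass
class Envelope:
    recipient: str
    venue_id: str
    protocol_version: str
    signature_ok: bool  # stand-in for real signature verification


@dataclass
class Outcome:
    dispatched: bool
    error: Optional[str] = None


def handle(envelope: Envelope,
           breaker_state: Callable[[str], str],
           dispatch: Callable[[Envelope], None]) -> Outcome:
    """Run the checks in order and stop at the first one that fails."""
    if not envelope.signature_ok:
        return Outcome(False, "invalid_signature")             # hypothetical error name
    if envelope.protocol_version not in SUPPORTED_VERSIONS:
        return Outcome(False, "unsupported_protocol_version")  # hypothetical error name
    if breaker_state(envelope.venue_id) == "open":
        return Outcome(False, "circuit_open")                  # error name from this guide
    dispatch(envelope)
    return Outcome(True)
```

Passing breaker_state and dispatch in as callables keeps the sketch independent of how the breaker is stored or how delivery actually happens.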
ClawPulse is the one place where the network's reliability story is implemented. The rest is policy that the gateway enforces: the breakers, the TTL caps, the per-tenant rate limits, and the per-intent quotas.
AGNT runs two circuit breakers side by side.
- Global breaker: opens after five failures anywhere in a ten-minute window. Blocks all A2A dispatch for five minutes. Protects the platform from a runaway loop or a provider-wide outage.
- Per-venue breaker: opens after five failures at the single-tenant level in a ten-minute window. Blocks only that venue. Prevents one broken venue from cascading into platform downtime.
Both breakers honour a five-minute cooldown before any probe is allowed. After the cooldown, exactly one envelope is passed through in a special half-open state.
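A minimal sketch of how the two failure counters might sit side by side. It assumes a sliding ten-minute window and treats the fifth failure as the one that trips the breaker; the guide does not say whether the real window is sliding or fixed. The class and method names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Tuple

THRESHOLD = 5       # failures, from the guide
WINDOW_S = 600.0    # ten minutes, from the guide


class FailureWindow:
    """Counts failures inside a sliding time window (assumed behaviour)."""

    def __init__(self, threshold: int = THRESHOLD, window_s: float = WINDOW_S) -> None:
        self.threshold = threshold
        self.window_s = window_s
        self._failures: Deque[float] = deque()

    def record_failure(self, now: float) -> bool:
        """Record one failure; return True if the breaker should open."""
        self._failures.append(now)
        # Drop failures that have aged out of the window.
        while now - self._failures[0] > self.window_s:
            self._failures.popleft()
        return len(self._failures) >= self.threshold


class FailureTracker:
    """One global window plus one window per venue."""

    def __init__(self) -> None:
        self.global_window = FailureWindow()
        self.venue_windows = defaultdict(FailureWindow)

    def record_failure(self, venue_id: str, now: float) -> Tuple[bool, bool]:
        """Return (global_should_open, venue_should_open)."""
        return (self.global_window.record_failure(now),
                self.venue_windows[venue_id].record_failure(now))
```

With three failures at one venue and two at another inside the window, the global counter reaches five while neither per-venue counter does, which is the provider-wide case the global breaker exists for.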
Every breaker lives in one of three states: closed (healthy), open (rejecting), half-open (probing). Transitions are:
- closed → open: when the failure counter crosses the threshold.
- open → half-open: when the cooldown expires.
- half-open → closed: when the probe envelope succeeds.
- half-open → open: when the probe envelope fails, restarting the cooldown.
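The four transitions listed above fit in a small lookup table. The sketch below is illustrative; the State and Event names are chosen for the example and are not taken from ClawPulse.

```python
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half-open"


class Event(Enum):
    THRESHOLD_CROSSED = "threshold_crossed"
    COOLDOWN_EXPIRED = "cooldown_expired"
    PROBE_SUCCEEDED = "probe_succeeded"
    PROBE_FAILED = "probe_failed"


# Exactly the four transitions listed above.
TRANSITIONS = {
    (State.CLOSED, Event.THRESHOLD_CROSSED): State.OPEN,
    (State.OPEN, Event.COOLDOWN_EXPIRED): State.HALF_OPEN,
    (State.HALF_OPEN, Event.PROBE_SUCCEEDED): State.CLOSED,
    (State.HALF_OPEN, Event.PROBE_FAILED): State.OPEN,
}


def next_state(state: State, event: Event) -> State:
    """Return the new state; events that do not apply leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)
```

Ignoring events that do not apply to the current state is a choice made for this sketch; a stricter implementation could raise an error instead.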
State is held in Redis so restarts do not lose memory of recent failures. That choice matters — a naïve in-memory breaker forgets outages every time the process restarts, which is exactly when the network is most stressed.
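To illustrate why external state survives a restart, here is a small sketch that serialises breaker state as JSON into a key-value store. The KeyValueStore interface, the breaker:<venue_id> key layout, and the stored fields are all assumptions; InMemoryStore stands in for Redis so the example runs without a server.

```python
import json
from typing import Any, Dict, Optional, Protocol


class KeyValueStore(Protocol):
    """The minimal get/set surface this sketch needs from a store."""

    def get(self, key: str) -> Optional[str]: ...
    def set(self, key: str, value: str) -> None: ...


class InMemoryStore:
    """Stand-in for Redis, used only to keep the example self-contained."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


def save_state(store: KeyValueStore, venue_id: str, state: str,
               opened_at: Optional[float]) -> None:
    # Hypothetical key layout: one key per venue.
    store.set(f"breaker:{venue_id}", json.dumps({"state": state, "opened_at": opened_at}))


def load_state(store: KeyValueStore, venue_id: str) -> Dict[str, Any]:
    raw = store.get(f"breaker:{venue_id}")
    if raw is None:
        # No record means no known failures, so treat the breaker as closed.
        return {"state": "closed", "opened_at": None}
    return json.loads(raw)
```

Because the store outlives the gateway process, a freshly started process that calls load_state sees the breaker exactly as the previous process left it.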
When a breaker is half-open, exactly one envelope is dispatched for real. Concurrent envelopes that arrive during the probe window are rejected with breaker_half_open_probe_in_flight so the gateway can isolate the signal. If the probe succeeds within its TTL, the breaker closes and normal traffic resumes. If it fails, the breaker re-opens and the cooldown restarts.
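The single-probe rule can be sketched with a lock-protected flag. This is a single-process illustration with hypothetical names; a gateway running as several processes would need an atomic operation in the shared store instead (for example, Redis SET with the NX option).

```python
import threading


class HalfOpenGate:
    """Admits exactly one probe at a time while a breaker is half-open."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._probe_in_flight = False

    def try_acquire_probe(self) -> bool:
        """Return True for the one caller that becomes the probe."""
        with self._lock:
            if self._probe_in_flight:
                return False
            self._probe_in_flight = True
            return True

    def finish_probe(self) -> None:
        """Call once the probe has succeeded, failed, or timed out."""
        with self._lock:
            self._probe_in_flight = False


def admit(gate: HalfOpenGate) -> str:
    """Decide what happens to an envelope that arrives during half-open."""
    if gate.try_acquire_probe():
        return "dispatch_probe"  # hypothetical marker for the real dispatch
    return "breaker_half_open_probe_in_flight"  # error name from this guide
```

Only the first caller gets to dispatch; every other caller receives the rejection code until finish_probe is called.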
This design bounds recovery: once a downed venue is healthy again, traffic resumes after at most one five-minute cooldown plus one probe, rather than after the unbounded delay a naïve 'retry with backoff' scheme can accumulate.
Every envelope is logged with a request_id, a circuit state snapshot (closed/open/half-open), and a dispatch latency. Alerts flow to Slack via the configured ALERT_WEBHOOK_URL whenever the global breaker opens. Dashboards show per-venue breaker history over the last 24 hours.
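As a rough picture of what one log line and one alert might look like, the sketch below builds a JSON log record and a webhook request. The JSON field names and the alert wording are assumptions; the request is only constructed here, not sent, and the URL would come from ALERT_WEBHOOK_URL.

```python
import json
import urllib.request
import uuid
from typing import Optional

CIRCUIT_STATES = ("closed", "open", "half-open")


def log_line(circuit_state: str, dispatch_latency_ms: float,
             request_id: Optional[str] = None) -> str:
    """Build one JSON log line carrying the three fields named above."""
    if circuit_state not in CIRCUIT_STATES:
        raise ValueError(f"unknown circuit state: {circuit_state}")
    return json.dumps({
        "request_id": request_id or str(uuid.uuid4()),
        "circuit_state": circuit_state,
        "dispatch_latency_ms": dispatch_latency_ms,  # hypothetical field name and unit
    }, sort_keys=True)


def build_global_open_alert(webhook_url: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST for a Slack-style incoming webhook."""
    body = json.dumps({"text": "Global A2A breaker opened"}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the alert would be a call to urllib.request.urlopen on the returned request, typically with a timeout so that a slow webhook cannot stall the gateway.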
The single most useful signal in production is 'time-to-recover' — how long from first failure to breaker-closed. If that number grows, it is almost always because the recipient agent is genuinely down, not because the breaker policy is wrong.
If ClawPulse returns circuit_open, retrying immediately will not help. Back off for several minutes, or fall back to a public booking endpoint or deep link. If it returns breaker_half_open_probe_in_flight, retry after a few seconds: the probe is in progress and an answer is imminent.
Never layer your own tight retry loop on top of a circuit-broken gateway. You will fight the breaker and prolong the outage.
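On the client side, the guidance above reduces to a small decision function. The sketch below is one possible shape; the action labels and both wait times are assumptions, since the guide only says "minutes" and "a few seconds".

```python
from typing import Tuple

CIRCUIT_OPEN_WAIT_S = 300.0  # assumption: matches the five-minute cooldown
PROBE_RETRY_WAIT_S = 3.0     # assumption: "a few seconds"


def next_action(error_code: str) -> Tuple[str, float]:
    """Map a ClawPulse error code to (action, seconds to wait first)."""
    if error_code == "circuit_open":
        # Do not hammer the gateway: use a fallback now or wait out the cooldown.
        return ("fallback", CIRCUIT_OPEN_WAIT_S)
    if error_code == "breaker_half_open_probe_in_flight":
        # The probe will resolve shortly, so one short delayed retry is reasonable.
        return ("retry_once", PROBE_RETRY_WAIT_S)
    # Any other error is outside the breaker's scope; surface it to the caller.
    return ("fail", 0.0)
```

The function returns a single decision rather than looping, so the caller stays in control and cannot end up retrying against an open breaker.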