Benchmarks
20 live metrics across 5 categories. All numbers come from production telemetry or source code, not synthetic tests.
Latency
How fast AGNT responds — from message receipt to delivered reply.
First Reply Latency
Live<0s
median first reply
Median time from guest message to agent-generated reply across all active venue agents. Includes LLM inference, tool execution, and message delivery.
Tool Execution Timeout
Live · 5s
max per tool call
Hard timeout for every tool invocation (venue search, booking, calorie scan, etc.). If a tool exceeds 5 seconds, it fails gracefully and the agent responds without it.
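The graceful-timeout behavior can be sketched with asyncio. This is a minimal illustration, not the actual tool runner; `run_tool` and the demo coroutine are assumed names:

```python
import asyncio

TOOL_TIMEOUT_S = 5  # hard per-tool cap from the metric above

async def run_tool(tool_coro, fallback=None, timeout=TOOL_TIMEOUT_S):
    """Run one tool call; on timeout, return a fallback so the
    agent can still reply without that tool's result."""
    try:
        return await asyncio.wait_for(tool_coro, timeout=timeout)
    except asyncio.TimeoutError:
        return fallback  # graceful degradation; no exception escapes

async def demo():
    # A tool that hangs longer than its (shortened, for demo) timeout:
    hung = asyncio.sleep(1, result=["venue A"])
    return await run_tool(hung, fallback=[], timeout=0.05)

print(asyncio.run(demo()))  # prints []
```

The key property is that a slow tool degrades the reply rather than delaying it past the cap.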
A2A Gateway Timeout
Live0s
max round-trip
Maximum wait time for a ClawPulse A2A message round-trip. Includes envelope signing, network transit, venue agent processing, and response delivery.
LLM Inference Timeout
Live · 45s
max Claude API call
Maximum allowed time for a single Claude API call. In practice, Haiku completes in 1-3s and Sonnet in 3-8s. The 45s cap handles edge cases with complex multi-tool conversations.
Reliability
How AGNT stays up — circuit breakers, retry logic, and failure isolation.
Circuit Breaker Threshold
Live · 5
failures to trip
A2A gateway circuit breaker opens after 5 failures within a 10-minute window. Applied both globally and per-venue. Half-open probe allows one request to test recovery after 5-minute cooldown.
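The trip / half-open / recover cycle described above can be sketched as a small state machine. Illustrative only; class and method names are assumptions, and the real breaker runs both globally and per-venue:

```python
import time

class CircuitBreaker:
    """Opens after `threshold` failures inside `window` seconds;
    after `cooldown` seconds a single half-open probe is allowed.
    Values from the text: threshold=5, window=600, cooldown=300."""
    def __init__(self, threshold=5, window=600.0, cooldown=300.0,
                 clock=time.monotonic):
        self.threshold, self.window, self.cooldown = threshold, window, cooldown
        self.clock = clock
        self.failures = []     # timestamps of recent failures
        self.opened_at = None  # None while the breaker is closed

    def allow(self):
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown:
            return True        # half-open: permit one probe request
        return False

    def record_failure(self):
        now = self.clock()
        self.failures = [t for t in self.failures if now - t < self.window]
        self.failures.append(now)
        if len(self.failures) >= self.threshold:
            self.opened_at = now   # trip the breaker

    def record_success(self):
        self.failures.clear()
        self.opened_at = None      # probe succeeded: close again
```

A successful probe closes the breaker immediately; a failed probe re-trips it and restarts the cooldown.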
Message Delivery Retries
Live · 3
attempts with backoff
Failed message sends retry with exponential backoff: 1 minute, 5 minutes, 25 minutes. Permanent failures (invalid recipient, 404, 401) go directly to DLQ. Transient failures (timeout, 503) get all 3 attempts.
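The backoff schedule and DLQ routing reduce to a small decision function. A sketch under stated assumptions; the error-code names are placeholders, not the real ones:

```python
PERMANENT_ERRORS = {"invalid_recipient", "http_404", "http_401"}  # assumed codes
BACKOFF_MINUTES = [1, 5, 25]  # 5x exponential, 3 attempts total

def next_retry_minutes(error_code, attempts_so_far):
    """Minutes until the next retry attempt, or None to route to the DLQ."""
    if error_code in PERMANENT_ERRORS:
        return None                          # permanent: straight to DLQ
    if attempts_so_far >= len(BACKOFF_MINUTES):
        return None                          # transient but retries exhausted
    return BACKOFF_MINUTES[attempts_so_far]

print(next_retry_minutes("http_503", 0))  # prints 1
print(next_retry_minutes("http_503", 2))  # prints 25
print(next_retry_minutes("http_404", 0))  # prints None
```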
Channel Circuit Breaker
Live · 10
consecutive failures
Per-platform circuit breaker for WhatsApp/Telegram/Instagram message delivery. Opens after 10 consecutive failures, resets after 2 minutes. Prevents cascading failure when a platform is down.
Dead Letter Queue
Live · 5K
message capacity
Permanently failed messages are stored in a dead letter queue capped at 5,000 entries with 7-day retention. Alerts fire when the queue exceeds 4,000 entries.
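An in-memory sketch of the capacity and alert behavior; the production queue is persistent and also enforces the 7-day retention, which this toy version omits:

```python
from collections import deque

DLQ_CAP = 5_000
DLQ_ALERT_THRESHOLD = 4_000

class DeadLetterQueue:
    def __init__(self):
        self.entries = deque(maxlen=DLQ_CAP)  # oldest entries drop at the cap

    def add(self, message):
        """Store a permanently failed message; return True when the
        alert threshold has been crossed."""
        self.entries.append(message)
        return len(self.entries) > DLQ_ALERT_THRESHOLD
```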
Token Cost
What AGNT spends on LLM inference — per model, per tier.
Claude Haiku Cost
Live · $0.80
per 1M input tokens
Claude Haiku 4.5 handles free and starter tier conversations. Output cost: $4.00/1M tokens. Max output: 1,024 tokens per response. Used for simple queries, FAQ responses, and venue search.
Claude Sonnet Cost
Live · $3.00
per 1M input tokens
Claude Sonnet 4.6 handles pro tier conversations. Output cost: $15.00/1M tokens. Max output: 2,048 tokens per response. Used for complex multi-tool queries, booking coordination, and personalized recommendations.
A2A Booking Fee
Live · $0.25
per confirmed booking
Metered billing for external agents using the AGNT Open Network. Venue searches are free. Each booking.confirm intent costs $0.25. Batched nightly to Stripe usage records.
Global Daily Budget
Live · $500
platform LLM spend cap
Hard daily cap on total LLM inference spending across all users. Prevents runaway costs. Tracked in Redis with 24-hour expiry. Fails open on Redis outage to avoid blocking users.
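The cap-plus-fail-open logic looks roughly like this. The fake store stands in for a Redis client (redis-py's `incrbyfloat` and `expire` would replace it); function and key names are assumptions:

```python
DAILY_CAP_USD = 500.0

def charge_and_check(store, spend_usd, day_key):
    """Add spend to today's counter; True if still under the cap.
    Fails open: a store outage allows the request rather than
    blocking every user on an infrastructure hiccup."""
    try:
        total = store.incrbyfloat(day_key, spend_usd)
        store.expire(day_key, 86_400)   # 24-hour expiry, per the text
        return total <= DAILY_CAP_USD
    except ConnectionError:
        return True                     # fail open on Redis outage

class FakeStore:  # in-memory stand-in for redis.Redis in this sketch
    def __init__(self):
        self.kv = {}
    def incrbyfloat(self, key, amount):
        self.kv[key] = self.kv.get(key, 0.0) + amount
        return self.kv[key]
    def expire(self, key, seconds):
        pass  # TTL omitted in the sketch
```

Failing open trades a bounded cost overrun during an outage for zero user-facing downtime, which matches the stated design intent.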
Memory and Context
How AGNT remembers — conversation history, semantic recall, and soul construction.
Semantic Recall Depth
Live · 10
facts retrieved per turn
Each non-trivial user message is embedded and searched against stored memory via pgvector cosine distance. Top 10 relevant facts are injected into the system prompt alongside structural keys.
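The retrieval step likely reduces to a pgvector nearest-neighbor query. A sketch with hypothetical table and column names (`memory_facts`, `embedding`, `fact` are assumptions, not the real schema):

```python
# pgvector's <=> operator is cosine distance; ORDER BY ... LIMIT 10
# returns the ten closest stored facts for system-prompt injection.
RECALL_QUERY = """
    SELECT fact,
           embedding <=> %(query_embedding)s AS distance
    FROM memory_facts
    WHERE user_id = %(user_id)s
    ORDER BY embedding <=> %(query_embedding)s
    LIMIT 10
"""
```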
Conversation Window
Live · 12
messages per LLM turn
Up to 20 messages stored per conversation, 12 sent to the LLM per turn. Conversation TTL is 24 hours. History is encrypted at rest with Fernet and synced to PostgreSQL as backup.
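The two windows (20 stored, 12 forwarded) come down to a trim-and-slice step. A sketch only; encryption, TTL, and the PostgreSQL sync are omitted, and the names are assumptions:

```python
MAX_STORED = 20   # messages retained per conversation
LLM_WINDOW = 12   # messages forwarded to the LLM each turn

def append_and_window(history, message):
    """Append a message, trim storage to the newest 20, and return
    the 12-message slice that would be sent to the model this turn."""
    history.append(message)
    del history[:-MAX_STORED]       # keep only the newest 20
    return history[-LLM_WINDOW:]    # newest 12 go to the LLM

history = []
for i in range(30):
    window = append_and_window(history, f"msg-{i}")
print(len(history), len(window))  # prints 20 12
```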
Soul Prompt Cache TTL
Live · 1h
cache lifetime
Constructed soul prompts (structural memory + context) are cached in Redis for 1 hour. Cache is invalidated immediately on memory writes so the agent always has fresh context.
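The TTL-plus-write-invalidation pattern, sketched in memory (production uses Redis; class and method names are assumptions):

```python
import time

SOUL_TTL_S = 3_600  # 1-hour cache lifetime

class SoulPromptCache:
    """Entries expire after one hour; memory writes invalidate
    immediately so the next turn rebuilds the prompt fresh."""
    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.store = {}  # user_id -> (prompt, cached_at)

    def get(self, user_id):
        hit = self.store.get(user_id)
        if hit and self.clock() - hit[1] < SOUL_TTL_S:
            return hit[0]
        return None          # miss or expired: rebuild the prompt

    def put(self, user_id, prompt):
        self.store[user_id] = (prompt, self.clock())

    def invalidate(self, user_id):
        self.store.pop(user_id, None)  # called on every memory write
```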
Embedding Concurrency
Live · 20
concurrent embed calls
Maximum 20 concurrent embedding API calls with 30-second timeout per call. Prevents overwhelming the embedding service during high-traffic memory write bursts.
Throughput and Capacity
How much AGNT handles — concurrent calls, token budgets, and rate limits.
LLM Concurrency
Live · 30
simultaneous LLM calls
Global semaphore limiting concurrent Claude API calls to 30. Additional requests queue with a 15-second acquisition timeout. Prevents API rate limit exhaustion under load.
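A sketch of the semaphore-with-acquisition-timeout pattern; `do_inference` is a stand-in for the actual Claude API call, and the function names are assumptions:

```python
import asyncio

LLM_CONCURRENCY = 30
ACQUIRE_TIMEOUT_S = 15

async def do_inference(request):
    await asyncio.sleep(0)            # stand-in for the Claude API call
    return f"reply:{request}"

async def call_llm(request, sem, acquire_timeout=ACQUIRE_TIMEOUT_S):
    """Wait for a concurrency slot; give up if none frees up in time."""
    try:
        await asyncio.wait_for(sem.acquire(), timeout=acquire_timeout)
    except asyncio.TimeoutError:
        raise RuntimeError("no LLM slot free within timeout")
    try:
        return await do_inference(request)
    finally:
        sem.release()                  # always free the slot

async def demo():
    sem = asyncio.Semaphore(LLM_CONCURRENCY)  # one global instance in production
    return await call_llm("hello", sem)

print(asyncio.run(demo()))  # prints reply:hello
```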
Free Tier Token Budget
Live · 50K
tokens per day
Daily token limit for free-tier users. Tracked in Redis with 24-hour expiry. At approximately 500 tokens per message, this supports about 100 messages per day.
Pro Tier Token Budget
Live · 1M
tokens per day
Daily token limit for pro-tier users and venue pro subscriptions. 20x the free tier. Supports sustained high-volume usage including multi-tool conversations and complex queries.
API Rate Limit
Live0/min
requests per minute
Default rate limit for general API endpoints. LLM-specific endpoints: 10/min. Booking endpoints: 5/min. Redis-backed with fail-open on Redis outage.
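A fixed-window counter with fail-open, sketched in memory; the production limiter is Redis-backed (INCR + EXPIRE), and the class name is an assumption:

```python
import time

class FixedWindowLimiter:
    """Per-key, per-minute request counter, mirroring what a Redis
    INCR + EXPIRE limiter does. Storage errors fail open."""
    def __init__(self, per_minute, clock=time.time):
        self.per_minute = per_minute
        self.clock = clock
        self.counts = {}

    def allow(self, key):
        try:
            window = int(self.clock() // 60)      # current minute bucket
            bucket = (key, window)
            self.counts[bucket] = self.counts.get(bucket, 0) + 1
            return self.counts[bucket] <= self.per_minute
        except Exception:
            return True                           # fail open, per the text

booking = FixedWindowLimiter(per_minute=5, clock=lambda: 0.0)  # booking: 5/min
print([booking.allow("user:1") for _ in range(6)])
# prints [True, True, True, True, True, False]
```

The sixth request in the same minute is rejected; the counter resets when the minute bucket rolls over.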
Methodology
All benchmarks are sourced from AGNT production telemetry and verified against source code. Configuration values link to their exact file and line number in the codebase. Runtime metrics are collected from Prometheus-style counters exposed at /metrics.
We do not publish synthetic benchmarks, fabricate numbers, or cherry-pick favorable conditions. Every metric on this page is either a hardcoded configuration value (verifiable in source) or a production measurement (updated as we ship).
Metrics marked Live are actively measured. Metrics marked Measuring will be published once we have statistically significant sample sizes.
Proof ships with the product.
These numbers come from the same system you build on. Start building and measure your own.