Benchmarks
20 live metrics across 5 categories. All numbers come from production telemetry or source code, not synthetic tests.
Latency
How fast AGNT responds — from message receipt to delivered reply.
First Reply Latency
Live<0s
median first reply
Median time from guest message to agent-generated reply across all active venue agents. Includes LLM inference, tool execution, and message delivery.
Tool Execution Timeout
Live · 5s
max per tool call
Hard timeout for every tool invocation (venue search, booking, calorie scan, etc.). If a tool exceeds 5 seconds, it fails gracefully and the agent responds without it.
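The graceful-timeout behavior can be sketched with asyncio. This is a minimal illustration, not the actual tool runner; `run_tool` and the demo coroutine are assumed names:

```python
import asyncio

TOOL_TIMEOUT_S = 5  # hard per-tool cap from the metric above

async def run_tool(tool_coro, fallback=None, timeout=TOOL_TIMEOUT_S):
    """Run one tool call; on timeout, return a fallback so the
    agent can still reply without that tool's result."""
    try:
        return await asyncio.wait_for(tool_coro, timeout=timeout)
    except asyncio.TimeoutError:
        return fallback  # graceful degradation; no exception escapes

async def demo():
    # A tool that hangs longer than its (shortened, for demo) timeout:
    hung = asyncio.sleep(1, result=["venue A"])
    return await run_tool(hung, fallback=[], timeout=0.05)

print(asyncio.run(demo()))  # prints []
```

The key property is that a slow tool degrades the reply rather than delaying it past the cap.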
A2A Gateway Timeout
Live0s
max round-trip
Maximum wait time for a ClawPulse A2A message round-trip. Includes envelope signing, network transit, venue agent processing, and response delivery.
LLM Inference Timeout
Live · 45s
max Claude API call
Maximum allowed time for a single Claude API call. In practice, Haiku completes in 1-3s and Sonnet in 3-8s. The 45s cap handles edge cases with complex multi-tool conversations.
Reliability
How AGNT stays up — circuit breakers, retry logic, and failure isolation.
Circuit Breaker Threshold
Live · 5
failures to trip
A2A gateway circuit breaker opens after 5 failures within a 10-minute window. Applied both globally and per-venue. Half-open probe allows one request to test recovery after 5-minute cooldown.
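The trip / half-open / recover cycle described above can be sketched as a small state machine. Illustrative only; class and method names are assumptions, and the real breaker runs both globally and per-venue:

```python
import time

class CircuitBreaker:
    """Opens after `threshold` failures inside `window` seconds;
    after `cooldown` seconds a single half-open probe is allowed.
    Values from the text: threshold=5, window=600, cooldown=300."""
    def __init__(self, threshold=5, window=600.0, cooldown=300.0,
                 clock=time.monotonic):
        self.threshold, self.window, self.cooldown = threshold, window, cooldown
        self.clock = clock
        self.failures = []     # timestamps of recent failures
        self.opened_at = None  # None while the breaker is closed

    def allow(self):
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown:
            return True        # half-open: permit one probe request
        return False

    def record_failure(self):
        now = self.clock()
        self.failures = [t for t in self.failures if now - t < self.window]
        self.failures.append(now)
        if len(self.failures) >= self.threshold:
            self.opened_at = now   # trip the breaker

    def record_success(self):
        self.failures.clear()
        self.opened_at = None      # probe succeeded: close again
```

A successful probe closes the breaker immediately; a failed probe re-trips it and restarts the cooldown.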
Message Delivery Retries
Live · 3
attempts with backoff
Failed message sends retry with exponential backoff: 1 minute, 5 minutes, 25 minutes. Permanent failures (invalid recipient, 404, 401) go directly to DLQ. Transient failures (timeout, 503) get all 3 attempts.
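The backoff schedule and DLQ routing reduce to a small decision function. A sketch under stated assumptions; the error-code names are placeholders, not the real ones:

```python
PERMANENT_ERRORS = {"invalid_recipient", "http_404", "http_401"}  # assumed codes
BACKOFF_MINUTES = [1, 5, 25]  # 5x exponential, 3 attempts total

def next_retry_minutes(error_code, attempts_so_far):
    """Minutes until the next retry attempt, or None to route to the DLQ."""
    if error_code in PERMANENT_ERRORS:
        return None                          # permanent: straight to DLQ
    if attempts_so_far >= len(BACKOFF_MINUTES):
        return None                          # transient but retries exhausted
    return BACKOFF_MINUTES[attempts_so_far]

print(next_retry_minutes("http_503", 0))  # prints 1
print(next_retry_minutes("http_503", 2))  # prints 25
print(next_retry_minutes("http_404", 0))  # prints None
```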
Channel Circuit Breaker
Live · 10
consecutive failures
Per-platform circuit breaker for WhatsApp/Telegram/Instagram message delivery. Opens after 10 consecutive failures, resets after 2 minutes. Prevents cascading failure when a platform is down.
Dead Letter Queue
Live · 5K
message capacity
Permanently failed messages are stored in a dead letter queue capped at 5,000 entries with 7-day retention. Alerts fire when the queue exceeds 4,000 entries.
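An in-memory sketch of the capacity and alert behavior; the production queue is persistent and also enforces the 7-day retention, which this toy version omits:

```python
from collections import deque

DLQ_CAP = 5_000
DLQ_ALERT_THRESHOLD = 4_000

class DeadLetterQueue:
    def __init__(self):
        self.entries = deque(maxlen=DLQ_CAP)  # oldest entries drop at the cap

    def add(self, message):
        """Store a permanently failed message; return True when the
        alert threshold has been crossed."""
        self.entries.append(message)
        return len(self.entries) > DLQ_ALERT_THRESHOLD
```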
Token Cost
What AGNT spends on LLM inference — per model, per tier.
Claude Haiku Cost
Live · $0.80
per 1M input tokens
Claude Haiku 4.5 handles free and starter tier conversations. Output cost: $4.00/1M tokens. Max output: 1,024 tokens per response. Used for simple queries, FAQ responses, and venue search.
Claude Sonnet Cost
Live · $3.00
per 1M input tokens
Claude Sonnet 4.6 handles pro tier conversations. Output cost: $15.00/1M tokens. Max output: 2,048 tokens per response. Used for complex multi-tool queries, booking coordination, and personalized recommendations.
A2A Booking Fee
Live · $0.25
per confirmed booking
Metered billing for external agents using the AGNT Open Network. Venue searches are free. Each booking.confirm intent costs $0.25. Batched nightly to Stripe usage records.
Global Daily Budget
Live · $500
platform LLM spend cap
Hard daily cap on total LLM inference spending across all users. Prevents runaway costs. Tracked in Redis with 24-hour expiry. Fails open on Redis outage to avoid blocking users.
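The cap-plus-fail-open logic looks roughly like this. The fake store stands in for a Redis client (redis-py's `incrbyfloat` and `expire` would replace it); function and key names are assumptions:

```python
DAILY_CAP_USD = 500.0

def charge_and_check(store, spend_usd, day_key):
    """Add spend to today's counter; True if still under the cap.
    Fails open: a store outage allows the request rather than
    blocking every user on an infrastructure hiccup."""
    try:
        total = store.incrbyfloat(day_key, spend_usd)
        store.expire(day_key, 86_400)   # 24-hour expiry, per the text
        return total <= DAILY_CAP_USD
    except ConnectionError:
        return True                     # fail open on Redis outage

class FakeStore:  # in-memory stand-in for redis.Redis in this sketch
    def __init__(self):
        self.kv = {}
    def incrbyfloat(self, key, amount):
        self.kv[key] = self.kv.get(key, 0.0) + amount
        return self.kv[key]
    def expire(self, key, seconds):
        pass  # TTL omitted in the sketch
```

Failing open trades a bounded cost overrun during an outage for zero user-facing downtime, which matches the stated design intent.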
Memory and Context
How AGNT remembers — conversation history, semantic recall, and soul construction.
Semantic Recall Depth
Live · 10
facts retrieved per turn
Each non-trivial user message is embedded and searched against stored memory via pgvector cosine distance. Top 10 relevant facts are injected into the system prompt alongside structural keys.
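The retrieval step likely reduces to a pgvector nearest-neighbor query. A sketch with hypothetical table and column names (`memory_facts`, `embedding`, `fact` are assumptions, not the real schema):

```python
# pgvector's <=> operator is cosine distance; ORDER BY ... LIMIT 10
# returns the ten closest stored facts for system-prompt injection.
RECALL_QUERY = """
    SELECT fact,
           embedding <=> %(query_embedding)s AS distance
    FROM memory_facts
    WHERE user_id = %(user_id)s
    ORDER BY embedding <=> %(query_embedding)s
    LIMIT 10
"""
```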
Conversation Window
Live · 12
messages per LLM turn
Up to 20 messages stored per conversation, 12 sent to the LLM per turn. Conversation TTL is 24 hours. History is encrypted at rest with Fernet and synced to PostgreSQL as backup.
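The two windows (20 stored, 12 forwarded) come down to a trim-and-slice step. A sketch only; encryption, TTL, and the PostgreSQL sync are omitted, and the names are assumptions:

```python
MAX_STORED = 20   # messages retained per conversation
LLM_WINDOW = 12   # messages forwarded to the LLM each turn

def append_and_window(history, message):
    """Append a message, trim storage to the newest 20, and return
    the 12-message slice that would be sent to the model this turn."""
    history.append(message)
    del history[:-MAX_STORED]       # keep only the newest 20
    return history[-LLM_WINDOW:]    # newest 12 go to the LLM

history = []
for i in range(30):
    window = append_and_window(history, f"msg-{i}")
print(len(history), len(window))  # prints 20 12
```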
Soul Prompt Cache TTL
Live · 1h
cache lifetime
Constructed soul prompts (structural memory + context) are cached in Redis for 1 hour. Cache is invalidated immediately on memory writes so the agent always has fresh context.
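The TTL-plus-write-invalidation pattern, sketched in memory (production uses Redis; class and method names are assumptions):

```python
import time

SOUL_TTL_S = 3_600  # 1-hour cache lifetime

class SoulPromptCache:
    """Entries expire after one hour; memory writes invalidate
    immediately so the next turn rebuilds the prompt fresh."""
    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.store = {}  # user_id -> (prompt, cached_at)

    def get(self, user_id):
        hit = self.store.get(user_id)
        if hit and self.clock() - hit[1] < SOUL_TTL_S:
            return hit[0]
        return None          # miss or expired: rebuild the prompt

    def put(self, user_id, prompt):
        self.store[user_id] = (prompt, self.clock())

    def invalidate(self, user_id):
        self.store.pop(user_id, None)  # called on every memory write
```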
Embedding Concurrency
Live · 20
concurrent embed calls
Maximum 20 concurrent embedding API calls with 30-second timeout per call. Prevents overwhelming the embedding service during high-traffic memory write bursts.
Throughput and Capacity
How much AGNT handles — concurrent calls, token budgets, and rate limits.
LLM Concurrency
Live · 30
simultaneous LLM calls
Global semaphore limiting concurrent Claude API calls to 30. Additional requests queue with a 15-second acquisition timeout. Prevents API rate limit exhaustion under load.
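A sketch of the semaphore-with-acquisition-timeout pattern; `do_inference` is a stand-in for the actual Claude API call, and the function names are assumptions:

```python
import asyncio

LLM_CONCURRENCY = 30
ACQUIRE_TIMEOUT_S = 15

async def do_inference(request):
    await asyncio.sleep(0)            # stand-in for the Claude API call
    return f"reply:{request}"

async def call_llm(request, sem, acquire_timeout=ACQUIRE_TIMEOUT_S):
    """Wait for a concurrency slot; give up if none frees up in time."""
    try:
        await asyncio.wait_for(sem.acquire(), timeout=acquire_timeout)
    except asyncio.TimeoutError:
        raise RuntimeError("no LLM slot free within timeout")
    try:
        return await do_inference(request)
    finally:
        sem.release()                  # always free the slot

async def demo():
    sem = asyncio.Semaphore(LLM_CONCURRENCY)  # one global instance in production
    return await call_llm("hello", sem)

print(asyncio.run(demo()))  # prints reply:hello
```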
Free Tier Token Budget
Live · 50K
tokens per day
Daily token limit for free-tier users. Tracked in Redis with 24-hour expiry. At approximately 500 tokens per message, this supports about 100 messages per day.
Pro Tier Token Budget
Live · 1M
tokens per day
Daily token limit for pro-tier users and venue pro subscriptions. 20x the free tier. Supports sustained high-volume usage including multi-tool conversations and complex queries.
API Rate Limit
Live0/min
requests per minute
Default rate limit for general API endpoints. LLM-specific endpoints: 10/min. Booking endpoints: 5/min. Redis-backed with fail-open on Redis outage.
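A fixed-window counter with fail-open, sketched in memory; the production limiter is Redis-backed (INCR + EXPIRE), and the class name is an assumption:

```python
import time

class FixedWindowLimiter:
    """Per-key, per-minute request counter, mirroring what a Redis
    INCR + EXPIRE limiter does. Storage errors fail open."""
    def __init__(self, per_minute, clock=time.time):
        self.per_minute = per_minute
        self.clock = clock
        self.counts = {}

    def allow(self, key):
        try:
            window = int(self.clock() // 60)      # current minute bucket
            bucket = (key, window)
            self.counts[bucket] = self.counts.get(bucket, 0) + 1
            return self.counts[bucket] <= self.per_minute
        except Exception:
            return True                           # fail open, per the text

booking = FixedWindowLimiter(per_minute=5, clock=lambda: 0.0)  # booking: 5/min
print([booking.allow("user:1") for _ in range(6)])
# prints [True, True, True, True, True, False]
```

The sixth request in the same minute is rejected; the counter resets when the minute bucket rolls over.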
Methodology
All benchmarks are sourced from AGNT production telemetry and verified against source code. Configuration values link to their exact file and line number in the codebase. Runtime metrics are collected from Prometheus-style counters exposed at /metrics.
We do not publish synthetic benchmarks, fabricate numbers, or cherry-pick favorable conditions. Every metric on this page is either a hardcoded configuration value (verifiable in source) or a production measurement (updated as we ship).
Metrics marked Live are actively measured. Metrics marked Measuring will be published once we have statistically significant sample sizes.
Proof ships with the product.
These numbers come from the same system you build on. Start building and measure your own.