Teardown · 9 min

How semantic recall prevents stale context in agent conversations

Every non-trivial user message is embedded and searched against stored memory via pgvector cosine distance. The ten most relevant facts are injected into the system prompt alongside 17 structural keys; trivial messages skip recall entirely to save cost.

Most chatbot memory systems are broken. They either load too much (blowing the context window on irrelevant history) or too little (forgetting the user's dietary restrictions mentioned three conversations ago).

AGNT's approach: split memory into structural keys and semantic facts. Structural keys are always loaded — diet, favorite areas, interests, last booking, typical party size, preferred booking time, fitness goal, daily calorie target. 17 keys total, enumerated in soul_loader.py.
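A minimal sketch of the always-loaded structural slice. The article names eight of the 17 keys; the identifier spellings and the `load_structural` helper are assumptions, not the actual soul_loader.py code.

```python
# The eight structural keys named in the article; soul_loader.py
# reportedly enumerates 17 in total. Key spellings are assumptions.
NAMED_STRUCTURAL_KEYS = [
    "diet", "favorite_areas", "interests", "last_booking",
    "typical_party_size", "preferred_booking_time",
    "fitness_goal", "daily_calorie_target",
]

def load_structural(memory: dict) -> dict:
    """Always-loaded slice of user memory: structural keys only,
    and only those that actually have a stored value."""
    return {k: memory[k] for k in NAMED_STRUCTURAL_KEYS if k in memory}
```

Because this slice is small and bounded (17 keys), it can go into every prompt without recall.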

Semantic facts are retrieved on-demand. When a non-trivial user message arrives, it is embedded via OpenAI's embedding API and searched against the user_memory table using pgvector cosine distance. Top 10 relevant facts are returned and injected into the system prompt.
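The recall path might look like the sketch below. The `user_memory` table, cosine distance, and the top-10 limit come from the article; the column names, the embedding model, and the `recall_facts` signature are assumptions.

```python
# pgvector's `<=>` operator is cosine distance; ORDER BY ascending
# distance puts the most similar facts first.
RECALL_SQL = """
    SELECT fact
    FROM user_memory
    WHERE user_id = %(user_id)s
    ORDER BY embedding <=> %(query_vec)s::vector
    LIMIT 10
"""

def recall_facts(conn, openai_client, user_id: str, message: str) -> list:
    """Embed the message and return the 10 closest stored facts."""
    resp = openai_client.embeddings.create(
        model="text-embedding-3-small",  # assumed; article doesn't name the model
        input=message,
    )
    vec = resp.data[0].embedding
    with conn.cursor() as cur:
        # str([...]) yields "[0.1, 0.2, ...]", which pgvector parses.
        cur.execute(RECALL_SQL, {"user_id": user_id, "query_vec": str(vec)})
        return [row[0] for row in cur.fetchall()]
```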

Trivial messages — greetings, acknowledgments, emoji-only replies — skip semantic recall entirely. A regex pattern match in soul_loader.py catches these before the embedding call happens, saving approximately 30% on embedding spend versus recalling on every message.
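One way to write that gate, assuming the categories the article lists (greetings, acknowledgments, emoji-only replies). The actual pattern in soul_loader.py is not shown, so this word list and emoji range are illustrative.

```python
import re

# Greetings and acknowledgments; trailing whitespace/punctuation allowed.
TRIVIAL_RE = re.compile(
    r"^(?:hi|hey|hello|yo|ok(?:ay)?|thanks?|thank you|yes|no|sure|cool|got it)[\s!.]*$",
    re.IGNORECASE,
)
# Emoji-only replies (common emoji and symbol blocks; not exhaustive).
EMOJI_ONLY_RE = re.compile(r"^[\U0001F300-\U0001FAFF\u2600-\u27BF\s]+$")

def is_trivial(message: str) -> bool:
    """True if the message should skip semantic recall (no embed call)."""
    text = message.strip()
    return bool(TRIVIAL_RE.match(text) or EMOJI_ONLY_RE.match(text))
```

Running the regex first means the embedding API is only hit for messages that can actually benefit from recall.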

The result: the agent always has the right context. It remembers that the user is gluten-free when they ask about pasta, even if that fact was recorded months ago, yet it does not waste tokens recalling it when the user just says 'hi'.

Soul prompt construction is cached in Redis for 1 hour and invalidated immediately on memory writes. Cache hit rate is approximately 70% under normal usage patterns.
