AGNT's semantic venue search runs on pgvector with cosine distance over 1536-dimensional OpenAI embeddings. Until this week, we used IVFFlat indexing with 100 lists, a common starting point for datasets under 1 million rows. Our venue corpus is well under that threshold at ~4,000 embedded chunks across 149 venues.
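For reference, the distance pgvector's `<=>` operator computes is plain cosine distance, i.e. one minus the cosine similarity of the two vectors. A minimal sketch (the function name is illustrative, not from our codebase):

```python
import math

def cosine_distance(a, b):
    """Cosine distance as pgvector's <=> operator computes it:
    1 - cos(angle between a and b)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)
```

Identical directions give distance 0, orthogonal vectors give 1, so smaller is more similar.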

The problem: recall on our internal benchmark (200 hand-labeled venue queries with expected top-5 results) was 74%. Users searching for 'quiet brunch spot in Canggu with good coffee' would get relevant results, but the top-5 missed obvious matches that appeared in top-20. IVFFlat's approximate nature was costing us real relevance.
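The benchmark metric is standard recall@k: for each labeled query, the fraction of hand-labeled expected results that show up in the index's top-k, averaged over all 200 queries. A sketch of the computation, assuming each benchmark entry pairs expected IDs with what the index actually returned (names are illustrative):

```python
def recall_at_k(expected, retrieved, k=5):
    """Fraction of hand-labeled expected results that appear in the
    top-k results returned by the index."""
    top_k = set(retrieved[:k])
    hits = sum(1 for venue_id in expected if venue_id in top_k)
    return hits / len(expected)

def benchmark_recall(queries, k=5):
    """Average recall@k over a list of labeled benchmark queries,
    each a dict with 'expected' and 'retrieved' ID lists."""
    scores = [recall_at_k(q["expected"], q["retrieved"], k) for q in queries]
    return sum(scores) / len(scores)
```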

The fix: we migrated from IVFFlat to HNSW (Hierarchical Navigable Small World) indexing. HNSW builds a multi-layer graph during index construction that enables more accurate approximate nearest-neighbor search at query time. We set ef_construction to 200 (the size of the candidate list during graph construction; pgvector's default is 64) and kept m at 16 (the default number of bi-directional links per node). The index build took 47 seconds on our 4,000-chunk corpus.
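In pgvector the migration itself is a drop-and-recreate. A sketch of the DDL involved, with the table and column names (`venue_chunks`, `embedding`) and index name assumed for illustration:

```python
# Drop the old IVFFlat index, then build HNSW over the same column.
# Table, column, and index names here are placeholders, not our schema.
DROP_IVFFLAT = "DROP INDEX IF EXISTS venue_chunks_embedding_idx;"

CREATE_HNSW = """
CREATE INDEX venue_chunks_embedding_idx
ON venue_chunks
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 200);
"""
```

`vector_cosine_ops` is what ties the index to the `<=>` cosine-distance operator; an index built with a different operator class would not be used for cosine queries.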

Results: recall on the same benchmark improved from 74% to 86%. The gains were concentrated in ambiguous, multi-attribute queries — exactly the type of natural language queries our users send. Single-attribute queries ('sushi in Seminyak') were already at 95%+ recall and showed minimal change. Query latency increased from 8ms to 14ms median, which is invisible to users given the overall response pipeline runs 800-2000ms.

The HNSW index is now live in production. We set ef_search to 100 at query time, which yields the 86% recall figure above. Increasing ef_search to 200 pushes recall to 89% but doubles query latency to 28ms; we are holding at 100 for now and will revisit when the corpus grows past 10,000 chunks.
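At query time, ef_search is a session-level setting (`hnsw.ef_search`) applied before the similarity query runs. A sketch of the pair of statements, using psycopg-style placeholders and the same assumed table and column names as above:

```python
# hnsw.ef_search controls the size of the dynamic candidate list at
# search time; higher values trade latency for recall.
SET_EF_SEARCH = "SET hnsw.ef_search = 100;"

# Top-5 nearest chunks by cosine distance for a query embedding.
# Table and column names are placeholders, not our schema.
TOP_K_QUERY = """
SELECT id, chunk_text
FROM venue_chunks
ORDER BY embedding <=> %(query_embedding)s::vector
LIMIT 5;
"""
```

Because it is a GUC, ef_search can also be set per transaction with `SET LOCAL`, which keeps an experiment (say, ef_search = 200 for a benchmark run) from leaking into other connections in the pool.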