AGNT

Business scan engine.

The scan engine that powers B2B onboarding. Given a URL or a business name, it runs five lanes in parallel (website, classify, reputation, performance, social), merges the results into a single ScanReport, and hands that report to the scoring module, which emits scores, opportunities, and insights. Source: agnt-backend/app/features/scan/.

What it is

When a venue owner starts onboarding, they type their website URL or business name into the scan wizard at /for-businesses/scan. Behind that form, the scan engine launches five parallel data-gathering lanes, waits for the fast ones to complete, streams the slower ones in as they finish, and produces a structured ScanReport the onboarding UI uses to pre-fill the venue profile.

Directory layout

  • orchestrator.py: top-level entry point. Runs the lanes in parallel, merges the results, and yields streaming updates.
  • models.py: Pydantic data contracts for each lane (WebsiteScan, CompanyProfile, Reputation, PerformanceSnapshot, SocialPresence, ScanReport, etc.).
  • scoring.py: reads the merged report and emits category scores, opportunities, and insights.
  • templates.py: per-industry opportunity templates (restaurant, spa, hotel, surf school, …).
  • summary.py: narrative summary generation for the UI.
  • lanes/
    • website.py: Firecrawl or httpx + BS4 scrape of the business website
    • classify.py: industry / sub-category classification via the LLM gateway
    • reputation.py: TripAdvisor / Google reviews aggregation via Tavily
    • performance.py: Lighthouse performance snapshot
    • social.py: Instagram, TikTok, Facebook presence lookup
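To make the lane-to-report relationship concrete, here is a minimal sketch of how per-lane results could be slotted into one report. It uses stdlib dataclasses as a stand-in for the Pydantic models in models.py; the classes below only borrow the names WebsiteScan, CompanyProfile, and ScanReport, and their fields, the LANE_FIELDS mapping, and merge_lane are hypothetical rather than the real contracts.

```python
from dataclasses import dataclass, replace
from typing import Any, Optional


@dataclass(frozen=True)
class WebsiteScan:
    # Simplified stand-in: the real Pydantic model's fields are not shown here.
    url: str = ""
    title: Optional[str] = None


@dataclass(frozen=True)
class CompanyProfile:
    # Simplified stand-in for the classify lane's output.
    category: Optional[str] = None
    low_confidence: bool = False


@dataclass(frozen=True)
class ScanReport:
    # One slot per lane; None means that lane has not reported (yet).
    resolved_url: str = ""
    website: Optional[WebsiteScan] = None
    company: Optional[CompanyProfile] = None
    reputation: Optional[Any] = None
    performance: Optional[Any] = None
    social: Optional[Any] = None


# Hypothetical mapping from lane name to the report field it fills.
LANE_FIELDS = {
    "website": "website",
    "classify": "company",
    "reputation": "reputation",
    "performance": "performance",
    "social": "social",
}


def merge_lane(report: ScanReport, lane: str, result: Any) -> ScanReport:
    """Return a copy of `report` with one lane's result filled in."""
    field_name = LANE_FIELDS[lane]  # KeyError for an unknown lane name
    return replace(report, **{field_name: result})
```

Returning a new report from each merge, rather than mutating a shared one, means every partial report handed to the UI stays an independent snapshot.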

The five lanes

| Lane | Source | Purpose | Typical latency |
|---|---|---|---|
| Website | Firecrawl or httpx + BeautifulSoup | Scrape title, description, contact info, menu links. | ~2–5 s |
| Classify | LLM gateway | Infer category, sub-category, cuisine type, target audience from the scraped page. | ~1–2 s |
| Reputation | Tavily search | Aggregate TripAdvisor, Google, and local review signals into a single score. | ~3–6 s |
| Performance | Lighthouse API | Measure Core Web Vitals and mobile performance for the landing page. | ~8–15 s |
| Social | Direct scrapes + open APIs | Discover Instagram, TikTok, Facebook handles and approximate follower counts. | ~3–8 s |
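Because the lanes run concurrently, a scan's wall-clock time is bounded by its slowest lane rather than the sum of all five, so the Performance lane (~8–15 s) sets the pace. Below is a minimal asyncio sketch of that fan-out with a per-lane time budget. It assumes each lane is a zero-argument coroutine function and treats the lanes as independent (it ignores that Classify consumes the scraped page); LANE_TIMEOUTS and run_lanes are hypothetical, not the orchestrator's actual API.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Optional

# Hypothetical per-lane budgets in seconds, set a little above the upper end
# of each lane's typical latency in the table.
LANE_TIMEOUTS: Dict[str, float] = {
    "website": 8.0,
    "classify": 4.0,
    "reputation": 10.0,
    "performance": 20.0,
    "social": 12.0,
}

Lane = Callable[[], Awaitable[Any]]


async def run_lanes(
    lanes: Dict[str, Lane], timeouts: Optional[Dict[str, float]] = None
) -> Dict[str, Any]:
    """Run every lane concurrently; a lane that overruns its budget maps to None."""
    budgets = timeouts if timeouts is not None else LANE_TIMEOUTS

    async def bounded(name: str, lane: Lane) -> Any:
        try:
            # wait_for cancels the lane's coroutine once the budget is spent.
            return await asyncio.wait_for(lane(), timeout=budgets[name])
        except asyncio.TimeoutError:
            return None  # the caller can treat None as "not measured"

    names = list(lanes)
    # gather returns results in argument order, so they line up with names.
    results = await asyncio.gather(*(bounded(n, lanes[n]) for n in names))
    return dict(zip(names, results))
```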

Streaming results to the UI

The orchestrator is an async generator. It yields partial ScanReport objects as each lane finishes so the onboarding UI can progressively reveal sections instead of showing a spinner for 15 seconds. The UI subscribes via server-sent events and updates the form state as each chunk arrives.
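A minimal sketch of that streaming pattern, assuming a plain dict stands in for ScanReport and each lane is a zero-argument coroutine function; stream_scan and to_sse are hypothetical names, not functions from orchestrator.py. asyncio.as_completed hands lanes back in finishing order, each yielded chunk is a cumulative snapshot, and to_sse frames a chunk in the standard server-sent events wire format (a `data:` line followed by a blank line).

```python
import asyncio
import json
from typing import Any, AsyncIterator, Awaitable, Callable, Dict, Tuple

Lane = Callable[[], Awaitable[Any]]


async def stream_scan(lanes: Dict[str, Lane]) -> AsyncIterator[Dict[str, Any]]:
    """Yield a cumulative partial report each time one more lane finishes."""

    async def tagged(name: str, lane: Lane) -> Tuple[str, Any]:
        # Carry the lane name along, since as_completed loses submission order.
        return name, await lane()

    tasks = [asyncio.create_task(tagged(n, lane)) for n, lane in lanes.items()]
    partial: Dict[str, Any] = {}
    try:
        for next_done in asyncio.as_completed(tasks):
            name, result = await next_done
            partial[name] = result
            yield dict(partial)  # copy, so earlier chunks are not mutated later
    finally:
        for task in tasks:
            task.cancel()  # no-op for finished lanes; stops stragglers on early exit


def to_sse(chunk: Dict[str, Any]) -> str:
    """Frame one chunk as a server-sent event: a data line, then a blank line."""
    return f"data: {json.dumps(chunk)}\n\n"
```

A lane that raises would end this generator early, which is why this pattern depends on the per-lane error isolation described under Failure modes.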

app/features/scan/orchestrator.py (excerpt)

```python
"""Orchestrator — runs all 5 lanes in parallel, merges into ScanReport."""

from app.features.scan.models import (
    CompanyProfile,
    ContactPoints,
    PerformanceSnapshot,
    Reputation,
    ScanReport,
    SocialPresence,
    SocialProfile,
    TrustSecurity,
    WebsiteScan,
)
# The website lane is imported eagerly at module load.
from app.features.scan.lanes.website import scan_website
from app.features.scan.scoring import (
    compute_scores,
    filter_opportunities_by_template,
    generate_insights,
    generate_opportunities,
)
from app.features.scan.templates import get_template


# Lazy importers: the classify, reputation, performance, and social lane
# modules are not imported until the orchestrator first calls the
# corresponding helper, which then returns the lane's entry-point function.
def _import_classify():
    from app.features.scan.lanes.classify import classify_business
    return classify_business


def _import_reputation():
    from app.features.scan.lanes.reputation import scan_reputation
    return scan_reputation


def _import_performance():
    from app.features.scan.lanes.performance import scan_performance
    return scan_performance


def _import_social():
    from app.features.scan.lanes.social import scan_social
    return scan_social
```

Input resolution

The orchestrator accepts either a URL or a business name. When given a name, it uses Tavily to find the official website first, then proceeds to scan the resolved URL. The resolved URL is echoed in the final report so the UI can confirm "We scanned waterbombali.com" rather than the raw input.
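A minimal sketch of the URL-or-name decision, under stated assumptions: the URL heuristic (no spaces, a dotted hostname) is illustrative, and the Tavily lookup is replaced by an injected `search_website` callable. looks_like_url, resolve_input, and search_website are hypothetical names, not the orchestrator's code.

```python
from typing import Callable, Optional
from urllib.parse import urlparse


def looks_like_url(raw: str) -> bool:
    """Heuristic: no spaces, and a hostname containing at least one inner dot."""
    text = raw.strip()
    if not text or " " in text:
        return False
    host = urlparse(text if "://" in text else f"https://{text}").hostname or ""
    return "." in host.strip(".")


def resolve_input(
    raw: str, search_website: Callable[[str], Optional[str]]
) -> Optional[str]:
    """Return the URL to scan, or None if a business name could not be resolved."""
    text = raw.strip()
    if not text:
        return None
    if looks_like_url(text):
        # Bare domains get an https:// scheme so every lane sees a full URL.
        return text if "://" in text else f"https://{text}"
    # Otherwise treat the input as a business name and look its website up.
    return search_website(text)
```

The returned value is what the report would echo back, so "waterbombali.com" typed by the user becomes "https://waterbombali.com" in this sketch.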

Scoring and opportunities

Once the lane results have been merged, scoring.compute_scores emits category scores from 0 to 100 for web presence, reputation, performance, and social reach. scoring.generate_opportunities then reads those scores against a per-industry template and produces a ranked list of actionable opportunities ("Add opening hours to your website", "Claim your Google Business Profile", etc.). Insights are short narrative observations generated by scoring.generate_insights.
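A minimal sketch of the two steps, under loudly hypothetical assumptions: each category score is a weighted mean of 0-to-1 signals scaled to 0–100, and each template entry carries a score threshold and an industry weight used for ranking. None of category_score, Opportunity, rank_opportunities, or the signal names come from scoring.py or templates.py.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


def category_score(signals: Dict[str, float], weights: Dict[str, float]) -> Optional[int]:
    """Weighted mean of 0..1 signals scaled to 0-100; None when nothing was observed."""
    observed = [k for k in signals if k in weights]
    if not observed:
        return None  # e.g. the lane failed, so the category stays unscored
    total = sum(weights[k] for k in observed)
    if total <= 0:
        return None
    # Clamp each signal into [0, 1] before weighting.
    raw = sum(weights[k] * min(max(signals[k], 0.0), 1.0) for k in observed)
    return round(100 * raw / total)


@dataclass(frozen=True)
class Opportunity:
    title: str
    category: str   # the score this opportunity would improve
    threshold: int  # suggest it only when the score is below this
    weight: float   # industry-specific importance


def rank_opportunities(
    scores: Dict[str, Optional[int]], template: List[Opportunity]
) -> List[str]:
    """Keep entries whose category scored below threshold; rank by weighted gap."""
    hits = []
    for opp in template:
        score = scores.get(opp.category)
        if score is None or score >= opp.threshold:
            continue  # unscored, or already good enough
        hits.append((opp.weight * (opp.threshold - score), opp.title))
    hits.sort(key=lambda h: h[0], reverse=True)
    return [title for _, title in hits]
```

Skipping unscored categories keeps the sketch from recommending fixes for data it never saw, which matches the "unscored" behaviour described under Failure modes.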

Failure modes

  • Website unreachable: the website lane returns an empty WebsiteScan and the classifier falls back to the raw business name.
  • Tavily rate limited: the reputation and social lanes degrade to "unknown" and the final report marks the affected categories as unscored.
  • Lighthouse timeout: the performance lane returns a placeholder score and the UI surfaces the timeout so the user can retry.
  • LLM classification fails: the classifier returns a best-effort category derived from the URL slug and flags the field as low-confidence so the UI asks the user to confirm.
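The last fallback can be pictured as a keyword match on the hostname. This is a minimal sketch assuming a hand-written keyword table; SLUG_KEYWORDS and fallback_category are hypothetical, and the classifier's real fallback rules are not documented on this page.

```python
from typing import Dict, Union
from urllib.parse import urlparse

# Hypothetical keyword table, checked in insertion order.
SLUG_KEYWORDS = {
    "surf": "surf school",
    "spa": "spa",
    "hotel": "hotel",
    "resort": "hotel",
    "restaurant": "restaurant",
    "cafe": "restaurant",
    "bistro": "restaurant",
}


def fallback_category(url: str) -> Dict[str, Union[str, bool]]:
    """Best-effort category from the hostname; the result is always low-confidence."""
    host = urlparse(url if "://" in url else f"https://{url}").hostname or ""
    for keyword, category in SLUG_KEYWORDS.items():
        if keyword in host:
            return {"category": category, "low_confidence": True}
    return {"category": "unknown", "low_confidence": True}
```

Substring matching misfires easily ("spa" also matches "space"), which is exactly why the result is flagged low-confidence and handed back to the user to confirm.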

Every lane is wrapped in its own try/except, so one lane's failure never brings down the whole scan. The orchestrator merges whatever it gets.
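A minimal sketch of that isolation, assuming each lane is a zero-argument coroutine function and that failed lanes are listed under a hypothetical "unscored" key; isolated and scan_all are illustrative names, not the orchestrator's code.

```python
import asyncio
import logging
from typing import Any, Awaitable, Callable, Dict, Tuple

log = logging.getLogger("scan")

Lane = Callable[[], Awaitable[Any]]


async def isolated(name: str, lane: Lane) -> Tuple[str, Any, bool]:
    """Run one lane; any Exception becomes (name, None, False) instead of propagating."""
    try:
        return name, await lane(), True
    except Exception:  # deliberately broad: one bad lane must not sink the scan
        log.exception("scan lane %s failed", name)
        return name, None, False


async def scan_all(lanes: Dict[str, Lane]) -> Dict[str, Any]:
    """Merge whatever succeeded and list the lanes that did not."""
    outcomes = await asyncio.gather(*(isolated(n, fn) for n, fn in lanes.items()))
    report: Dict[str, Any] = {name: result for name, result, ok in outcomes if ok}
    report["unscored"] = sorted(name for name, _, ok in outcomes if not ok)
    return report
```

Catching Exception rather than BaseException lets asyncio.CancelledError still propagate, so a cancelled scan actually stops. `asyncio.gather(..., return_exceptions=True)` is an alternative that collects exceptions as results instead of wrapping each lane.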

Related