Route calorie scanning to Gemini Vision via AGNT
AGNT's calorie scanner can use Gemini's multimodal API as the vision backbone — here's how to configure the fallback chain.
AGNT's calorie scan feature sends food photos to an LLM for nutrition analysis. By default it uses Claude. This guide adds Gemini Vision as a fallback or primary provider in AGNT's LLM gateway, giving you multi-model resilience and cost flexibility for vision workloads.
Prerequisites
- AGNT backend deployed locally.
- Gemini API key from Google AI Studio.
- A test food photo.
What you're building
AGNT's calorie scan pipeline works like this: user sends a food photo → the backend passes it to an LLM with a structured prompt → the LLM returns a JSON nutrition breakdown (calories, protein, carbs, fat, fiber) → AGNT stores the result in the user's food diary. The LLM call is the critical step, and right now it's hardcoded to Claude.
After this guide, the pipeline will support multiple vision providers through AGNT's `llm_gateway`. You'll configure Gemini Vision as either the primary provider, a failover target when Claude is down, or a cost-optimized route for high-volume scanning. The user never knows which model processed their photo — the gateway abstracts the provider completely.
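The provider abstraction at the heart of this can be sketched in a few lines of Python. This is an illustration only: `VisionProvider` and `StubProvider` are hypothetical names, and the stub's numbers are invented; the real adapters call Anthropic's and Google's APIs behind the interface described in Step 2.

```python
from typing import Protocol

class VisionProvider(Protocol):
    """Per-provider contract the gateway expects (see Step 2)."""
    def complete_with_vision(self, image_bytes: bytes, prompt: str) -> dict: ...

class StubProvider:
    """Stand-in provider returning a fixed nutrition breakdown for illustration."""
    def complete_with_vision(self, image_bytes: bytes, prompt: str) -> dict:
        return {"calories": 540, "protein_g": 22, "carbs_g": 60,
                "fat_g": 21, "fiber_g": 5, "items": ["nasi goreng"]}

scan = StubProvider().complete_with_vision(b"<jpeg bytes>", "Estimate nutrition as JSON.")
```

Because every provider honors the same contract, swapping Claude for Gemini becomes a configuration change rather than a code change.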
Step 1 — Get a Gemini API key
Go to [Google AI Studio](https://aistudio.google.com/) and sign in with your Google account. Navigate to the API keys section and create a new key. The key works with Gemini's multimodal models (such as `gemini-2.5-flash`, used below), which accept image inputs natively.
Copy the key and store it securely. You'll add it to AGNT's environment in the next step. The free tier's rate limits are plenty for development; for production workloads, check Google's current pricing and quota pages. Per image, Gemini Vision is significantly cheaper than most alternatives.
Step 2 — Configure AGNT's LLM gateway
AGNT's `llm_gateway` (in `agnt-backend/app/core/llm_gateway.py`) supports multiple providers behind a unified interface. Each provider implements `complete_with_vision(image_bytes, prompt) → structured JSON`. Add your Gemini key to the backend's `.env`:
```
GEMINI_API_KEY=your_key_here
GEMINI_MODEL=gemini-2.5-flash
```

Then register Gemini as a provider in the gateway config. The gateway reads from `app/config.py` — add a new entry to the `LLM_PROVIDERS` dict:
```python
LLM_PROVIDERS = {
    "claude": {
        "adapter": "anthropic",
        "model": settings.CLAUDE_MODEL,
        "api_key": settings.ANTHROPIC_API_KEY,
        "priority": 1,
    },
    "gemini": {
        "adapter": "google",
        "model": settings.GEMINI_MODEL,
        "api_key": settings.GEMINI_API_KEY,
        "priority": 2,
    },
}
```

The `priority` field controls the default order. Priority 1 is tried first. If it fails (timeout, rate limit, 5xx), the gateway falls through to priority 2.
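That fallthrough can be sketched as a simple loop. This is a hypothetical stand-in for the real logic in `llm_gateway.py`; `ProviderError` and the stub callables are illustrative only.

```python
class ProviderError(Exception):
    """Stand-in for timeout / rate-limit / 5xx failures."""

def call_with_failover(providers: dict, image_bytes: bytes, prompt: str):
    # Sort by the `priority` field: priority 1 is tried first.
    for name, cfg in sorted(providers.items(), key=lambda kv: kv[1]["priority"]):
        try:
            return name, cfg["call"](image_bytes, prompt)
        except ProviderError:
            continue  # fall through to the next provider
    raise ProviderError("all providers failed")

def claude_down(image_bytes, prompt):
    raise ProviderError("simulated 503")

def gemini_ok(image_bytes, prompt):
    return {"calories": 300}

providers = {
    "claude": {"priority": 1, "call": claude_down},
    "gemini": {"priority": 2, "call": gemini_ok},
}
used, result = call_with_failover(providers, b"<jpeg bytes>", "Estimate nutrition.")
# With Claude simulated as down, the gateway falls through and `used` is "gemini".
```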
Step 3 — Set routing rules
The gateway supports three routing modes, controlled by the `LLM_ROUTING_MODE` env var:
- **`primary`** — Always use the highest-priority provider. Gemini never runs unless Claude is unreachable. Set `LLM_ROUTING_MODE=primary` (this is the default).
- **`failover`** — Try the primary provider. If it returns an error or fails to respond within 10 seconds, automatically retry with the next provider. Set `LLM_ROUTING_MODE=failover`. This is the recommended mode for production — you get Claude quality by default with Gemini as a safety net.
- **`cost_optimized`** — Route based on estimated cost per request. Vision requests with images under 1MB go to Gemini (cheaper). Larger or more complex requests go to Claude. Set `LLM_ROUTING_MODE=cost_optimized`.
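The three modes could be dispatched roughly like this. It's a sketch under the assumptions stated above — the 1 MB threshold and hard-coded provider names follow the description in this guide, not AGNT's actual source.

```python
def choose_provider(mode: str, providers: dict, image_bytes: bytes) -> str:
    by_priority = sorted(providers, key=lambda name: providers[name]["priority"])
    if mode in ("primary", "failover"):
        # Both start at the top of the priority order; failover's retry on
        # error/timeout happens downstream of this choice.
        return by_priority[0]
    if mode == "cost_optimized":
        # Images under 1 MB go to the cheaper provider.
        return "gemini" if len(image_bytes) < 1_000_000 else "claude"
    raise ValueError(f"unknown LLM_ROUTING_MODE: {mode}")

providers = {"claude": {"priority": 1}, "gemini": {"priority": 2}}
routed = choose_provider("primary", providers, b"x" * 200_000)
cheap = choose_provider("cost_optimized", providers, b"x" * 200_000)
```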
For calorie scanning specifically, you can also override the provider per-tool. In `tool_executor.py`, the `calorie_scan` tool accepts an optional `provider` field that bypasses the global routing mode. This lets you A/B test providers without changing the global config.
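The per-tool override reduces to a one-line precedence rule. The helper name here is hypothetical; the real check lives in `tool_executor.py`.

```python
def resolve_tool_provider(tool_args: dict, routed_provider: str) -> str:
    # An explicit `provider` field on the tool call beats the global routing mode.
    return tool_args.get("provider") or routed_provider

default = resolve_tool_provider({"image_id": "abc"}, "claude")
forced = resolve_tool_provider({"image_id": "abc", "provider": "gemini"}, "claude")
```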
Step 4 — Test with a food photo
Send a test scan through the API:
```bash
curl -X POST http://localhost:8000/api/calorie-scan \
  -H "Authorization: Bearer $AGNT_TOKEN" \
  -F "image=@test-food.jpg"
```

The response includes a `provider` field showing which model processed the image. Verify Gemini handles it by temporarily setting its priority to 1, or by using the `?provider=gemini` query parameter to force routing.
Compare the output between Claude and Gemini for the same image. Both should return the same JSON shape (`{calories, protein_g, carbs_g, fat_g, fiber_g, items: [...]}`) because the gateway normalizes the response format. The actual numbers may differ slightly — each model has its own estimation biases. For most food photos, the variance is under 10%.
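One quick way to quantify that variance during an A/B test is a per-field relative diff. The sample numbers below are invented to illustrate two plausible responses for the same photo.

```python
def nutrition_delta(a: dict, b: dict) -> dict:
    """Relative per-field difference between two normalized scan results."""
    fields = ["calories", "protein_g", "carbs_g", "fat_g", "fiber_g"]
    return {f: abs(a[f] - b[f]) / max(a[f], 1) for f in fields}

claude = {"calories": 520, "protein_g": 24, "carbs_g": 58, "fat_g": 20, "fiber_g": 6}
gemini = {"calories": 548, "protein_g": 22, "carbs_g": 61, "fat_g": 21, "fiber_g": 6}
deltas = nutrition_delta(claude, gemini)
within_ten_percent = all(d < 0.10 for d in deltas.values())
```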
Why this matters
Multi-model routing means no single-provider dependency. If Claude has an outage at dinner time — peak calorie scanning hours — your users don't notice because Gemini picks up the load automatically. The gateway handles the failover, logs the switch, and the food diary entry looks identical.
Gemini Vision also excels at certain food categories. In our testing, it's notably better at Southeast Asian dishes (nasi goreng, rendang, soto) where the visual complexity is high and Western-trained models sometimes misidentify ingredients. Running both providers and comparing gives you data on where each model is strongest.
The deeper principle: AGNT's gateway abstracts the provider. The calorie scan tool doesn't know or care which model answered. The user's food diary doesn't know. The daily summary doesn't know. This is the right abstraction boundary — model choice is an infrastructure decision, not a product decision.