Retell AI restaurant voice-agent tuning

Production tuning playbook for restaurant voice agents

A practical path for moving a real ordering agent from "works most of the time" to a testable, supportable system: config audit, prompt/function boundaries, fallback design, POS generalization, and regression evidence.

The tuning loop

Do not rewrite the whole agent first. Preserve the working backend, isolate the calls that fail, and tune against a repeatable set of test cases.

  • Capture: collect transcripts, tool calls, latency, interruption points, menu items, and caller intent.
  • Classify: tag each failure as STT, menu alias, prompt ambiguity, function schema, POS state, or escalation.
  • Move logic: pull deterministic rules out of the prompt and into functions, RPCs, or n8n routing where possible.
  • Fallback: for stuck orders, capture the caller's number and order intent, then send the restaurant a callback task or SMS.
  • Regress: replay the same call set after each change so accuracy improves without breaking earlier wins.
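The Classify step pays off once the tags are counted, because the counts show which boundaries to tune first. Below is a minimal sketch. It assumes each failure has already been tagged, by hand or by a review tool, into a hypothetical `TaggedFailure` record. Counting each tag once per call keeps a single chaotic call from dominating the ranking.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class FailureClass(Enum):
    """The six failure tags used in the Classify step."""
    STT = "stt"
    MENU_ALIAS = "menu_alias"
    PROMPT_AMBIGUITY = "prompt_ambiguity"
    FUNCTION_SCHEMA = "function_schema"
    POS_STATE = "pos_state"
    ESCALATION = "escalation"


@dataclass(frozen=True)
class TaggedFailure:
    call_id: str       # hypothetical: whatever call identifier your logs store
    turn_index: int    # transcript turn where the failure showed up
    tag: FailureClass
    note: str = ""


def top_boundaries(failures: list[TaggedFailure], n: int = 2) -> list[tuple[FailureClass, int]]:
    """Rank failure classes by how many distinct calls they appear in."""
    # Count each (call, tag) pair once so one bad call cannot skew the ranking.
    per_call = {(f.call_id, f.tag) for f in failures}
    counts = Counter(tag for _, tag in per_call)
    # Sort by count (descending), then by tag value, so ties come out in a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0].value))
    return ranked[:n]
```

With the default `n=2`, the output lines up with a "tune the top two boundaries" sprint scope.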

Where the work usually is

Restaurant agents fail at boundaries: speech recognition, interruptions, menu complexity, and POS action state. Each boundary needs a different fix.

Retell config

  • Boosted keywords for menu items, modifiers, brands, and dialect aliases.
  • Endpointing and interruption sensitivity tuned for noisy callers.
  • Talk While Waiting only where wait-state speech is actually helpful.
  • Voice pacing checked against real ordering turns, not demo calls.
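The boosted-keyword list is easier to keep current when it is generated from the menu export instead of typed by hand. Below is a minimal sketch, assuming a simple in-house `MenuItem` record. That record is hypothetical, not a Retell or POS type. The result is the list you would enter into the agent's boosted-keywords setting. Endpointing and interruption values are deliberately left out because they have to be tuned against recorded calls.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class MenuItem:
    item_id: str
    name: str
    modifiers: list[str] = field(default_factory=list)
    aliases: list[str] = field(default_factory=list)  # dialect, slang, or transliterated names


def _terms(items: list[MenuItem], brands: list[str]) -> Iterator[str]:
    """Yield every term worth boosting, item by item, then brands."""
    for item in items:
        yield item.name
        yield from item.aliases
        yield from item.modifiers
    yield from brands


def build_boosted_keywords(items: list[MenuItem], brands: list[str]) -> list[str]:
    """Return a de-duplicated keyword list; the first spelling of each term wins."""
    seen: set[str] = set()
    keywords: list[str] = []
    for term in _terms(items, brands):
        cleaned = " ".join(term.split())  # collapse stray whitespace from menu exports
        key = cleaned.casefold()          # "Extra Garlic" and "extra garlic" count once
        if cleaned and key not in seen:
            seen.add(key)
            keywords.append(cleaned)
    return keywords
```

Regenerating this list on every menu export keeps new items and aliases from silently falling out of recognition.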

Prompt vs function

  • Keep conversation style and policy in the prompt.
  • Move item lookup, pagination, price, availability, and order mutation to functions.
  • Use strict function descriptions for required fields and failure states.
  • Return structured repair prompts when the backend rejects an order.
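Strict function descriptions and structured repair prompts work together. The schema forces the model to send canonical IDs and required fields. The rejection result tells the model exactly what to ask next, instead of handing it a bare error string. Below is a minimal sketch, assuming a JSON-schema style parameter definition for an `order.validate` tool. The tool shape, field names, and rejection codes are all hypothetical. Check Retell's custom-function documentation for the exact format it expects.

```python
import json
from dataclasses import asdict, dataclass

# Hypothetical JSON-schema style parameters for the order.validate tool.
# Required fields are explicit so the model cannot silently omit them.
VALIDATE_ORDER_PARAMS = {
    "type": "object",
    "properties": {
        "items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "item_id": {
                        "type": "string",
                        "description": "Canonical ID returned by menu.search; never a spoken name.",
                    },
                    "quantity": {"type": "integer", "minimum": 1},
                    "modifier_ids": {"type": "array", "items": {"type": "string"}},
                },
                "required": ["item_id", "quantity"],
            },
        },
        "fulfillment": {"type": "string", "enum": ["pickup", "delivery"]},
    },
    "required": ["items", "fulfillment"],
}


@dataclass
class Rejection:
    code: str        # machine-readable reason from the backend (hypothetical codes)
    field: str       # which part of the order the agent must fix
    ask_caller: str  # one question the agent can say to the caller


def repair_response(rejections: list[Rejection]) -> str:
    """Serialize backend rejections as a structured tool result for the model."""
    return json.dumps({
        "ok": False,
        "instruction": "Do not confirm the order. Ask the caller the first question, "
                       "then call order.validate again.",
        "repairs": [asdict(r) for r in rejections],
    })
```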

LLM drift

  • Separate fixable behavior from model variance.
  • Route high-risk decisions through backend validation.
  • Add clarification turns for ambiguous modifiers and mixed-language menu terms.
  • Escalate instead of letting the model improvise inventory or pricing.
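Clarification and escalation are easiest to enforce when the backend, not the model, decides whether a spoken term is a match, an ambiguity, or unknown. Below is a minimal sketch, assuming a hypothetical alias index built from the menu export. The index maps each normalized alias, including mixed-language variants, to candidate item IDs. Whether an unknown term gets one re-ask before handoff is a policy choice. This sketch escalates immediately.

```python
from dataclasses import dataclass
from enum import Enum


class Resolution(Enum):
    MATCH = "match"
    CLARIFY = "clarify"
    ESCALATE = "escalate"


@dataclass
class ResolveResult:
    kind: Resolution
    item_ids: list[str]
    prompt: str = ""


def normalize(text: str) -> str:
    """Case-fold and collapse whitespace so STT variants hit the same key."""
    return " ".join(text.casefold().split())


def resolve_item(spoken: str, alias_index: dict[str, list[str]], names: dict[str, str]) -> ResolveResult:
    """Map a spoken term to one canonical item ID without letting the model guess.

    alias_index: normalized alias -> candidate item IDs (hypothetical, from the menu export).
    names: item ID -> display name, used to phrase the clarification.
    """
    candidates = alias_index.get(normalize(spoken), [])
    if len(candidates) == 1:
        return ResolveResult(Resolution.MATCH, candidates)
    if len(candidates) > 1:
        options = " or ".join(names[c] for c in candidates)
        return ResolveResult(Resolution.CLARIFY, candidates, f"Did you mean {options}?")
    # Unknown term: hand off rather than inventing an item, price, or availability.
    return ResolveResult(Resolution.ESCALATE, [], "Let me have the restaurant confirm that item with you.")
```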

Regression matrix

A production tuning sprint should leave a playbook, not only a better prompt. This matrix keeps future restaurant onboarding from becoming another custom rescue.

Clean order
  • What to test: Known caller, clear menu items, no unavailable modifiers.
  • Done state: Order enters POS, confirmation repeats item/modifier/price, log stores call ID.

Ambiguous menu item
  • What to test: Similar-sounding items, STT correction, Arabic alias or slang variant.
  • Done state: Agent asks one clarification, maps to canonical item ID, never guesses.

Caller frustration
  • What to test: Repeated interruption, "operator", "this is wrong", or long silence.
  • Done state: Captures number, summarizes attempted order, sends restaurant callback task/SMS.

POS failure
  • What to test: abcPOS, Toast, Square, or Clover API timeout/reject.
  • Done state: No fake confirmation; fallback task includes raw payload and rejection reason.
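The matrix becomes regression evidence once each row has an automated done-state check and every run is compared with the previous one. Below is a minimal sketch. Replaying a case is left to a caller-supplied function, which could re-send a transcript, drive a simulated caller, or score a manual test call. The call-log keys used by `pos_failure_done` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CallCase:
    case_id: str
    category: str       # e.g. "clean_order", "ambiguous_item", "frustration", "pos_failure"
    recording_ref: str  # hypothetical pointer to a stored transcript or audio file


@dataclass(frozen=True)
class Outcome:
    passed: bool
    detail: str = ""


def pos_failure_done(log: dict) -> Outcome:
    """Done state for the POS-failure row: no confirmation, and a complete fallback task."""
    if log.get("confirmed_to_caller"):
        return Outcome(False, "agent confirmed an order the POS did not accept")
    handoff = log.get("handoff") or {}
    missing = [k for k in ("raw_payload", "rejection_reason") if not handoff.get(k)]
    return Outcome(not missing, f"handoff missing {missing}" if missing else "")


def find_regressions(
    cases: list[CallCase],
    replay: Callable[[CallCase], Outcome],
    baseline: dict[str, bool],
) -> tuple[dict[str, Outcome], list[str]]:
    """Replay every case; report cases that passed in the baseline but fail now."""
    results = {case.case_id: replay(case) for case in cases}
    regressions = [cid for cid, out in results.items() if baseline.get(cid) and not out.passed]
    return results, regressions
```

Storing each run's pass/fail map as the next baseline is what keeps earlier wins from quietly breaking.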

POS playbook shape

Onboarding the second restaurant goes faster when the first implementation leaves stable interfaces instead of tribal memory.

Reusable interface

  • menu.search for item, modifier, and alias lookup.
  • order.validate before the agent confirms anything.
  • order.submit only after a confirmed, priced order object exists.
  • handoff.create for stuck, frustrated, or unsupported states.
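One way to keep these four functions stable across POS systems is to define them once as an interface. Each POS then gets its own adapter that implements it. Below is a minimal sketch using a Python `Protocol`. The method names mirror the list above, with dots replaced by underscores. The record types and the `validation_token` field are hypothetical. The guard encodes the rule that submit runs only on a validated, priced, caller-confirmed order.

```python
from dataclasses import dataclass, field
from typing import Optional, Protocol


@dataclass
class OrderLine:
    item_id: str
    quantity: int
    modifier_ids: list[str] = field(default_factory=list)


@dataclass
class ValidatedOrder:
    lines: list[OrderLine]
    total_cents: int          # price computed by the backend, never by the model
    validation_token: str     # hypothetical: set only by order_validate


class PosAdapter(Protocol):
    """Agent-facing interface; one implementation per POS."""

    def menu_search(self, query: str) -> list[dict]: ...
    def order_validate(self, lines: list[OrderLine]) -> ValidatedOrder: ...
    def order_submit(self, order: ValidatedOrder) -> str: ...
    def handoff_create(self, caller_number: str, summary: str, reason: str) -> str: ...


def guarded_submit(pos: PosAdapter, order: Optional[ValidatedOrder], caller_confirmed: bool) -> str:
    """Submit only a validated, priced order that the caller has confirmed."""
    if order is None or not order.validation_token:
        raise ValueError("order.validate has not priced this order yet")
    if not caller_confirmed:
        raise ValueError("caller has not confirmed the repeated order")
    return pos.order_submit(order)
```

Because the agent only ever sees these four functions, adding a new POS means writing one adapter rather than changing prompts or tool schemas.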

Per-POS adapter notes

  • Authentication and environment variables.
  • Menu export cadence and cache invalidation.
  • Order field map with required and optional fields.
  • Known rejection codes and customer-safe recovery language.
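The adapter notes are most useful when they are kept as data next to the adapter code rather than in a document nobody reads. Below is a minimal sketch of that idea. The field names are illustrative, and no real POS field names or rejection codes are assumed. Take those from each POS vendor's API reference. Unknown codes fall back to a generic callback message, so the caller never hears a raw error.

```python
import os
from dataclasses import dataclass, field


@dataclass
class AdapterNotes:
    pos_name: str
    env_vars: list[str]                # variable names only; secrets stay in the environment
    menu_refresh_minutes: int          # export cadence; cached menu is invalid after this
    required_fields: dict[str, str]    # our order field -> POS field
    optional_fields: dict[str, str] = field(default_factory=dict)
    rejection_language: dict[str, str] = field(default_factory=dict)  # POS code -> safe wording

    def missing_env(self) -> list[str]:
        """List required environment variables that are unset or empty."""
        return [name for name in self.env_vars if not os.environ.get(name)]

    def map_order(self, order: dict) -> dict:
        """Translate our order dict into the POS field names, failing on missing required fields."""
        missing = [k for k in self.required_fields if k not in order]
        if missing:
            raise KeyError(f"order missing required fields: {missing}")
        mapped = {pos_key: order[k] for k, pos_key in self.required_fields.items()}
        mapped.update({pos_key: order[k] for k, pos_key in self.optional_fields.items() if k in order})
        return mapped

    def caller_message(self, code: str) -> str:
        """Customer-safe wording for a rejection code, with a callback fallback."""
        return self.rejection_language.get(
            code, "I couldn't place that just now. I'll have the restaurant call you back to finish the order."
        )
```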

Paid pilot

Best first slice: review 10-20 real calls, classify failures, tune the top two Retell/prompt/function boundaries, add one fallback path, and leave a POS onboarding playbook skeleton.