Tarmac — a travel LLM that runs the booking.
One model that delivers accuracy, orchestration, governance, and execution. Not a chatbot that talks about travel — an AI that runs it, correctly, end to end.
The four things it does
One model. Four jobs.
Accuracy
Real fares, real availability, real fare rules. Not hallucinated guesses. The line between a chatbot and a system you can put a credit card into.
Orchestration
Coordinates the full lifecycle as one coherent flow — search, pricing, booking, ticketing, exchange, settlement, lodging.
Governance
Every action stays inside the rules — corporate policy, fare and ticketing rules, compliance. Nothing executes outside the lines.
Execution
Actually completes the transaction. Books, tickets, pays, not just suggests. This is where a wrong answer costs real money.
What it delivers
From hallucination machine to infrastructure.
Turns any AI from a hallucination machine into an accurate one
Answers grounded in real fares, real availability, and real fare rules, not guesses.
Transaction-grade reliability
Take payment. Issue tickets. End-to-end completion, not a suggestion to forward to a human.
Enterprise-safe by construction
Trained on owned synthetic and licensed data. Never on customer bookings or PII.
Any AI can talk about a trip. Tarmac is the model that can run one — accurately, within policy, all the way to booked, ticketed, and paid.
Takes travel AI from a demo to infrastructure.
Travel-tuned LLM bakeoff · v0.5
Tarmac vs. the frontier. On travel, it's not close.
FILL Mode (Schema Provided)
| Model | Task | Contract Correctness | Intent | Hallucination Rate |
|---|---|---|---|---|
| Tarmac v0.5 | 93.1% | 94.1% | 96.7% | 0.47% |
| GPT-5.4 | 91.1% | 89.1% | 96.2% | 1.97% |
| Claude Opus 4.7 | 59.3% | 64.3% | 71.6% | 5.65% |
| Llama 3.3 70B | 71.6% | 75.6% | 95.8% | 9.4% |
| DeepSeek V4 Pro | 79.4% | 79.4% | 86.5% | 2.66% |
| Gemini 3.1 Pro | 14.5% | 14.5% | 15.5% | 1.97% |
When OTAIP tells the model what to do
Lab compares apples to apples.
RAW Mode (Production Conditions)
| Model | Contract Correctness |
|---|---|
| Tarmac v0.5 (NATIVE) | 94.1% |
| GPT-5.4 | 1.4% |
| Claude Opus 4.7 | 0.4% |
| Llama 3.3 70B | 0.0% |
| DeepSeek V4 Pro | 0.2% |
| Gemini 3.1 Pro | 0.0% |
When models try to figure it out themselves
Real world compares apples and oranges.
In the real world of travel, Tarmac works. It is under 10B parameters yet production-ready. OTAIP acts as the final gate, catching even the 0.47% of hallucinations that remain. Other models, with no domain expertise to guide them, score 1.4% or lower on contract correctness in RAW mode. That is the wall between LLMs and real travel bookings, and the one we have to climb.
No pricing page. Just a conversation.
Tarmac is sold to enterprises that need a travel-safe LLM with their own deployment, evals, and compliance posture. Tell us what you're building and we'll show you what fits.