Generative AI has moved customer experience (CX) from chatbots to autonomous agents that resolve over a third of tier‑1 inquiries on their own. Behind the hype is a practical question every CX leader has to answer: which type of language model belongs in your stack?
Large language models (LLMs) like GPT‑4 and Claude offer broad knowledge and deep reasoning — hundreds of billions of parameters trained on trillions of tokens. Small language models (SLMs) compress knowledge into millions to a few billion parameters and run efficiently on edge servers, laptops, or modest GPUs. The right answer is rarely 'pick one.' It's 'right‑size each task.'
Understanding the trade‑offs
Parameter size and training data
LLMs are trained on internet‑scale corpora. SLMs rely on curated, domain‑specific datasets and on compression techniques such as quantization, pruning, and distillation. ModelOp research shows specialized SLMs can be created for a few thousand dollars and run on standard desktop hardware with 4 GB of GPU memory. Those economics change what's possible at the edge.
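To see why quantization matters for that 4 GB figure, here is a back-of-the-envelope sketch in Python. It estimates memory for the weights only (activations, KV cache, and runtime overhead come on top), and the 3-billion-parameter model is a hypothetical example, not a number from the ModelOp research.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory needed for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Hypothetical 3B-parameter SLM at three common precisions.
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {weight_memory_gb(3e9, bits):.1f} GB")
```

Under these assumptions the same model needs roughly 6 GB at 16-bit, 3 GB at 8-bit, and 1.5 GB at 4-bit, which is the difference between needing a data-center GPU and fitting on a desktop card.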
Latency, cost and throughput
Accuracy and hallucination
LLMs win on open‑ended reasoning, creative summarization, and cross‑domain synthesis. SLMs win on narrow tasks where they've been distilled or fine‑tuned: intent classification, slot filling, structured summarization, ticket triage. For deterministic CX workflows, a tuned SLM frequently outperforms a frontier LLM and costs two orders of magnitude less.
A decision tree for CX leaders
- Is the task narrow and high‑volume (FAQ, triage, summarization, sentiment)? → Start with an SLM
- Does it require multi‑step reasoning or open‑ended generation? → LLM, with guardrails and a confidence threshold
- Does data residency or on‑prem deployment matter (regulated industries)? → SLM, possibly hosted in your VPC
- Is the volume small but the task complex? → LLM, with caching and a fallback model
- Hybrid? Almost always. Route easy traffic to the SLM and escalate the long tail to an LLM
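The decision tree above can be sketched as a small router. This is a minimal illustration, not a production design: the `Task` fields, the 0.8 confidence threshold, and the order in which the questions are checked (data residency first) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    # Hypothetical flags mirroring the questions in the decision tree.
    narrow_high_volume: bool = False          # FAQ, triage, summarization, sentiment
    needs_open_ended_reasoning: bool = False  # multi-step reasoning or open-ended generation
    data_residency_required: bool = False     # regulated industries, on-prem or VPC


def pick_model(task: Task) -> str:
    """Return the starting tier ("slm" or "llm") for a task."""
    if task.data_residency_required:
        return "slm"  # assumption: residency constraints override everything else
    if task.needs_open_ended_reasoning:
        return "llm"
    if task.narrow_high_volume:
        return "slm"
    return "llm"      # low volume but complex


def route(task: Task, slm_confidence: float, threshold: float = 0.8) -> str:
    """Hybrid routing: keep confident SLM answers, escalate the long tail."""
    if pick_model(task) == "llm":
        return "llm"
    # Assumption: residency-bound traffic never leaves the SLM path.
    if task.data_residency_required or slm_confidence >= threshold:
        return "slm"
    return "llm"


print(route(Task(narrow_high_volume=True), slm_confidence=0.95))
print(route(Task(narrow_high_volume=True), slm_confidence=0.40))
```

Here `slm_confidence` stands in for whatever score your SLM or classifier exposes. In practice the threshold would be tuned per intent against your own quality bar.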
Implementation and governance
- Pick a model registry and version every deployed model — including the SLMs
- Instrument confidence scores end‑to‑end so routing decisions are explainable
- Apply redaction at ingest for both SLM and LLM paths; PCI/HIPAA scope doesn't shrink because the model is small
- Budget per‑intent, not per‑model — the question is dollars per resolved contact, not dollars per token
- Plan model refresh cycles. SLMs decay as your product changes; tune quarterly
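Budgeting per intent comes down to simple arithmetic: total model spend on an intent divided by the contacts it actually resolved. The sketch below assumes a hypothetical log format (`intent`, `cost_usd`, `resolved`), and the sample records are invented purely to show the calculation.

```python
from collections import defaultdict


def cost_per_resolved_contact(contacts):
    """Return {intent: dollars spent per resolved contact}.

    `contacts` is an iterable of dicts with hypothetical keys
    'intent', 'cost_usd', and 'resolved'.
    """
    spend = defaultdict(float)
    resolved = defaultdict(int)
    for contact in contacts:
        spend[contact["intent"]] += contact["cost_usd"]
        resolved[contact["intent"]] += int(contact["resolved"])
    return {
        # An intent with spend but no resolutions is reported as infinitely expensive.
        intent: spend[intent] / resolved[intent] if resolved[intent] else float("inf")
        for intent in spend
    }


# Illustrative records only.
log = [
    {"intent": "billing", "cost_usd": 0.5, "resolved": True},
    {"intent": "billing", "cost_usd": 0.5, "resolved": False},
    {"intent": "returns", "cost_usd": 0.25, "resolved": True},
]
print(cost_per_resolved_contact(log))
```

Note that the unresolved billing contact still counts toward spend, which is the point: a cheap model that fails to resolve can cost more per resolution than an expensive one that succeeds.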
Right‑size the model, not the ambition
Chasing the biggest model is a hobby. Right‑sizing is the operating discipline. The teams winning with generative AI in CX are the ones who match each task to the smallest model that meets the quality bar — and reserve the frontier models for the problems that genuinely need them.

