
Building an AI Customer Support Platform: Architecture, Escalation, and Knowledge Base Design

How I designed a production SaaS support platform with AI-first chat, knowledge base ingestion, ticket escalation, and real-time agent handoff — using Next.js, Node.js, PostgreSQL, Redis, and OpenAI on AWS.

6/12/2026

Why most support bots fail in production

Businesses adopt AI chatbots expecting instant cost savings. What they often get instead:

  • Bots that hallucinate answers not in company policy
  • No path to a human agent when the AI is wrong
  • Knowledge bases that are uploaded once and never updated
  • Dashboards with chat volume but no actionable metrics

The gap is not the LLM — it is the system around it: data pipeline, escalation workflow, roles, and ops tooling.

This post walks through how I architected an AI-powered customer support platform that handles real support volume — not a demo widget.

See the full case study: /projects/ai-customer-support


The business problem

Support teams at scale face a predictable pattern:

  1. 80% of inquiries are repetitive — shipping status, refunds, account access, pricing
  2. Response time grows linearly with ticket volume unless headcount grows too
  3. Context is lost when conversations move between channels or agents
  4. Complex cases still need humans — billing disputes, edge-case bugs, angry customers

The goal was not to replace support staff. It was to automate the repetitive layer while giving agents a clean escalation path with full conversation history.


High-level architecture

The platform follows a modular SaaS layout:

[Figure: Customer chat flows through the web app and API into PostgreSQL, Redis, and OpenAI. Customer (web chat) → Frontend (Next.js) → API (Node.js) → Data (PostgreSQL + Redis) and AI (OpenAI API)]

| Layer | Technology | Responsibility |
|---|---|---|
| Frontend | Next.js | Customer chat, agent dashboard, admin panels |
| API | Node.js | Auth, conversations, tickets, AI orchestration |
| Primary DB | PostgreSQL | Users, roles, tickets, knowledge docs, conversation logs |
| Cache / sessions | Redis | Active session state, rate limits, pub/sub for real-time |
| AI | OpenAI API | Chat completions grounded on knowledge base context |
| Infrastructure | Docker + AWS | Containerized deploy, scalable hosting |

This separation keeps the AI layer replaceable — swap models or providers without rewriting the product core.
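
One way to keep the AI layer swappable is to have the API depend on a small interface instead of a vendor SDK. The sketch below is an assumption about how that could look, not the project's actual code: `ChatProvider`, `EchoProvider`, and `getProvider` are hypothetical names, and the stub provider makes no network calls.

```typescript
// Hypothetical provider-agnostic contract; none of these names come from the project.
type ChatRole = "system" | "user" | "assistant";

interface ChatMessage {
  role: ChatRole;
  content: string;
}

interface ChatProvider {
  name: string;
  complete(messages: ChatMessage[]): Promise<string>;
}

// Stub provider for local development and tests (no network calls).
class EchoProvider implements ChatProvider {
  name = "echo";

  async complete(messages: ChatMessage[]): Promise<string> {
    const lastUser = messages.slice().reverse().find((m) => m.role === "user");
    return lastUser ? `You said: ${lastUser.content}` : "";
  }
}

// The product core only sees ChatProvider, so adding a vendor means
// registering one more factory here.
const providers: Record<string, () => ChatProvider> = {
  echo: () => new EchoProvider(),
};

function getProvider(key: string): ChatProvider {
  const factory = providers[key];
  if (!factory) throw new Error(`Unknown AI provider: ${key}`);
  return factory();
}
```

A real OpenAI-backed implementation would sit behind the same `complete` method, so conversation, ticket, and escalation code would not need to change when the model or vendor does.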


Core product modules

The modules below reflect what shipped in production.

1. AI chat interface

The customer-facing chat is the entry point for every support interaction.

Design priorities:

  • Low friction — no account required for basic questions (configurable per tenant)
  • Conversation persistence — history stored in PostgreSQL, not just in browser memory
  • Clear escalation affordance — "Talk to a human" always visible when AI is active
  • Source grounding — answers should reflect uploaded docs and FAQs, not general web knowledge

Unlike my portfolio assistant, which answers from structured CMS context, a support platform needs document retrieval because its knowledge lives in PDFs, help articles, and internal wikis.

[Figure: AI chat interface]

2. Knowledge base and document management

Admins upload company documentation that the AI uses as grounding context.

What the upload flow handles:

  • Document storage and metadata (title, category, last updated)
  • Text extraction from common formats
  • Chunking for retrieval (split long docs into searchable segments)
  • Version awareness — stale docs are a top cause of wrong AI answers
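
The chunking step can be sketched as a fixed-size splitter with overlap. This is a minimal illustration under assumed sizes (800 characters, 100 overlap); the post does not state the actual chunking strategy, and `chunkText` is a hypothetical helper.

```typescript
interface Chunk {
  chunkIndex: number;
  content: string;
}

// Split text into chunks of at most `maxChars`, repeating `overlap` characters
// between neighbours so a sentence cut at a boundary stays retrievable.
// The default sizes are illustrative assumptions, not the project's values.
function chunkText(text: string, maxChars = 800, overlap = 100): Chunk[] {
  if (maxChars <= 0 || overlap < 0 || overlap >= maxChars) {
    throw new Error("require 0 <= overlap < maxChars");
  }
  const normalized = text.replace(/\s+/g, " ").trim();
  const chunks: Chunk[] = [];
  let start = 0;
  while (start < normalized.length) {
    const end = Math.min(start + maxChars, normalized.length);
    chunks.push({ chunkIndex: chunks.length, content: normalized.slice(start, end) });
    if (end === normalized.length) break;
    start = end - overlap; // step back so consecutive chunks share context
  }
  return chunks;
}
```

Splitting on headings or paragraphs usually gives cleaner chunks than raw character counts; the fixed-size version is shown only because it is the simplest one to reason about.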

Lesson learned: The upload UI is not a nice-to-have. If updating the knowledge base is hard, teams stop doing it and the bot degrades silently.

[Figure: Knowledge base upload]

3. Ticket creation and escalation workflow

When the AI cannot resolve an issue — or the customer asks for a human — the conversation promotes to a support ticket.

[Figure: Customer message (chat UI) → AI attempt (grounded reply) → Escalation? (rules / request) → Ticket + summary (PostgreSQL) → Agent inbox (real-time handoff)]

Escalation preserves full context and generates an AI summary for the agent.

Escalation triggers:

  • Explicit user request ("I want to speak to someone")
  • AI confidence threshold (optional — flag low-confidence replies)
  • Keyword / category match (billing, legal, outage)
  • Repeated failed resolution on the same topic
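
The four triggers above can be expressed as one rule function. The sketch below is illustrative: the phrase lists, the 0.5 confidence floor, and the failure limit of 2 are assumptions, and `aiConfidence` is a hypothetical score that the pipeline would have to produce itself.

```typescript
type EscalationReason = "user_request" | "risk_category" | "low_confidence" | "repeated_failure";

interface EscalationSignals {
  userMessage: string;
  aiConfidence?: number; // 0..1, only present if the pipeline computes a score
  failedAttemptsOnTopic: number;
}

// Illustrative phrase lists; a real deployment would tune these per tenant.
const HUMAN_REQUEST = /\b(human|agent|real person|speak to someone|talk to someone)\b/i;
const RISK_KEYWORDS = /\b(billing|chargeback|legal|lawyer|outage)\b/i;

// Returns the first matching reason, or null when the AI may keep the conversation.
function checkEscalation(
  signals: EscalationSignals,
  minConfidence = 0.5,
  maxFailures = 2,
): EscalationReason | null {
  if (HUMAN_REQUEST.test(signals.userMessage)) return "user_request";
  if (RISK_KEYWORDS.test(signals.userMessage)) return "risk_category";
  if (signals.aiConfidence !== undefined && signals.aiConfidence < minConfidence) {
    return "low_confidence";
  }
  if (signals.failedAttemptsOnTopic >= maxFailures) return "repeated_failure";
  return null;
}
```

The explicit request is checked first so that a customer who asks for a person is never held back by the other rules.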

Ticket payload includes:

  • Full conversation transcript
  • Customer identity (if authenticated)
  • AI summary of the issue (generated at escalation time)
  • Priority and category tags

Agents start with context instead of asking "can you describe your problem again?"
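
As a rough sketch, the ticket payload can be assembled from the conversation at escalation time. The field names below mirror the list above but are otherwise hypothetical, and the rule that billing, legal, and outage tickets are high priority is an assumption for illustration.

```typescript
interface TranscriptMessage {
  role: "user" | "assistant" | "agent";
  content: string;
  createdAt: string; // ISO 8601 UTC timestamp, which sorts lexicographically
}

interface TicketPayload {
  conversationId: string;
  customerId: string | null; // null for anonymous chats
  transcript: TranscriptMessage[];
  aiSummary: string;
  priority: "normal" | "high";
  category: string;
}

// Assumption: these categories always enter the queue as high priority.
const HIGH_PRIORITY_CATEGORIES = ["billing", "legal", "outage"];

function buildTicketPayload(input: {
  conversationId: string;
  customerId?: string;
  messages: TranscriptMessage[];
  aiSummary: string;
  category: string;
}): TicketPayload {
  // Copy before sorting so the caller's array is not mutated.
  const transcript = input.messages
    .slice()
    .sort((a, b) => (a.createdAt < b.createdAt ? -1 : a.createdAt > b.createdAt ? 1 : 0));
  return {
    conversationId: input.conversationId,
    customerId: input.customerId ?? null,
    transcript,
    aiSummary: input.aiSummary.trim(),
    priority: HIGH_PRIORITY_CATEGORIES.indexOf(input.category) !== -1 ? "high" : "normal",
    category: input.category,
  };
}
```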

[Figure: Support ticket management]

4. Real-time messaging for agents

After escalation, agents work in a live messaging interface — not a static ticket queue with email-style replies.

Redis supports:

  • Active session tracking
  • Pub/sub or similar pattern for near-real-time message delivery
  • Short-lived cache for "who is online" agent status

PostgreSQL remains the source of truth for message history; Redis handles the hot path.
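
The "persist first, then publish" split can be sketched with in-memory stand-ins. Here `InMemoryPubSub` plays the role of a Redis channel and a plain array plays the role of the messages table; both are assumptions made so the example runs without infrastructure.

```typescript
interface AgentMessage {
  conversationId: string;
  role: "user" | "agent";
  content: string;
}

type Listener = (msg: AgentMessage) => void;

// In-memory stand-in for one pub/sub channel per conversation.
class InMemoryPubSub {
  private listeners = new Map<string, Set<Listener>>();

  // Returns an unsubscribe function.
  subscribe(channel: string, fn: Listener): () => void {
    const set = this.listeners.get(channel) ?? new Set<Listener>();
    set.add(fn);
    this.listeners.set(channel, set);
    return () => {
      set.delete(fn);
    };
  }

  // Returns how many listeners received the message.
  publish(channel: string, msg: AgentMessage): number {
    const set = this.listeners.get(channel);
    if (!set) return 0;
    set.forEach((fn) => fn(msg));
    return set.size;
  }
}

// Write to the durable store first (source of truth), then fan out on the hot path.
function sendMessage(store: AgentMessage[], bus: InMemoryPubSub, msg: AgentMessage): void {
  store.push(msg); // stand-in for an INSERT into the messages table
  bus.publish(`conversation:${msg.conversationId}`, msg);
}
```

Because the store is written before the publish, a client that misses a live message (for example after a reconnect) can still reload the full history.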

5. Admin dashboard, analytics, and team management

Operations need visibility, not just chat:

  • Volume metrics — conversations started, AI-resolved vs escalated
  • Resolution rate — % handled without human intervention
  • Response times — first reply, time to escalation, time to close
  • Team management — roles (admin, agent, viewer), permissions, assignment rules
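
The first three metrics reduce to simple aggregates over closed conversations. The sketch below assumes a hypothetical `ClosedConversation` record with millisecond timestamps and defines AI resolution rate as closed conversations that were never escalated, divided by all closed conversations.

```typescript
interface ClosedConversation {
  wasEscalated: boolean;
  startedAtMs: number;
  firstReplyAtMs: number;
}

interface SupportSummary {
  total: number;
  aiResolutionRate: number; // share closed without human intervention, 0..1
  avgFirstReplySeconds: number;
}

function summarize(conversations: ClosedConversation[]): SupportSummary {
  const total = conversations.length;
  if (total === 0) return { total: 0, aiResolutionRate: 0, avgFirstReplySeconds: 0 };
  const aiResolved = conversations.filter((c) => !c.wasEscalated).length;
  const totalReplyMs = conversations.reduce(
    (sum, c) => sum + (c.firstReplyAtMs - c.startedAtMs),
    0,
  );
  return {
    total,
    aiResolutionRate: aiResolved / total,
    avgFirstReplySeconds: totalReplyMs / total / 1000,
  };
}
```

In production these numbers would more likely come from SQL aggregates than from application code; the function only pins down the definitions.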

Role-based access is non-negotiable in multi-tenant SaaS: agents should not see billing settings; admins should not need to impersonate customers to read logs.
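
A permission check along these lines might look like the sketch below. The permission names and the role-to-permission mapping are assumptions for illustration; only the three roles (admin, agent, viewer) come from the list above.

```typescript
type Role = "admin" | "agent" | "viewer";

type Permission =
  | "conversations:read"
  | "tickets:reply"
  | "analytics:read"
  | "billing:manage"
  | "team:manage";

// Hypothetical mapping: agents work tickets, viewers only read, admins do everything.
const ROLE_PERMISSIONS: Record<Role, Permission[]> = {
  admin: ["conversations:read", "tickets:reply", "analytics:read", "billing:manage", "team:manage"],
  agent: ["conversations:read", "tickets:reply"],
  viewer: ["conversations:read", "analytics:read"],
};

interface Actor {
  role: Role;
  tenantId: string;
}

// A request is allowed only if the resource belongs to the actor's tenant
// and the actor's role grants the permission.
function can(actor: Actor, permission: Permission, resourceTenantId: string): boolean {
  if (actor.tenantId !== resourceTenantId) return false;
  return ROLE_PERMISSIONS[actor.role].indexOf(permission) !== -1;
}
```

Checking the tenant before the role means a misconfigured role can never leak data across tenants.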

[Figure: Analytics dashboard]

[Figure: Team management]


Data model (conceptual)

A simplified schema that supports the workflows above:

Users and roles

users: id, email, role (admin | agent | customer), tenant_id, created_at

tenants: id, name, plan, settings (jsonb)

Multi-tenant from day one — even if v1 only serves one client, the schema should not require a migration later.

Conversations and messages

conversations: id, tenant_id, customer_id, status (ai | escalated | closed), created_at

messages: id, conversation_id, role (user | assistant | agent), content, created_at

The status column drives UI state: ai shows the bot, while escalated routes the conversation to the agent inbox.
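
Treating the conversation status as a small state machine keeps the UI routing predictable. The transition rules in this sketch are assumptions (for example, that a closed conversation is never reopened), and `transition` and `routeFor` are hypothetical helpers.

```typescript
type ConversationStatus = "ai" | "escalated" | "closed";

// Assumed rules: a conversation can be escalated or closed from AI mode,
// closed from escalated mode, and never reopened once closed.
const ALLOWED_TRANSITIONS: Record<ConversationStatus, ConversationStatus[]> = {
  ai: ["escalated", "closed"],
  escalated: ["closed"],
  closed: [],
};

function transition(current: ConversationStatus, next: ConversationStatus): ConversationStatus {
  if (ALLOWED_TRANSITIONS[current].indexOf(next) === -1) {
    throw new Error(`Invalid transition: ${current} -> ${next}`);
  }
  return next;
}

// Which surface handles new customer messages in each state.
function routeFor(status: ConversationStatus): "bot" | "agent_inbox" | "read_only" {
  switch (status) {
    case "ai":
      return "bot";
    case "escalated":
      return "agent_inbox";
    case "closed":
      return "read_only";
  }
}
```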

Knowledge base

documents: id, tenant_id, title, file_url, status (processing | ready | error), updated_at

document_chunks: id, document_id, content, embedding (optional), chunk_index

If using vector search: store embeddings per chunk. If using simpler retrieval: full-text search on content may suffice for smaller bases.
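
To make the "simpler retrieval" option concrete, here is a term-overlap ranking over chunk content. It is an in-memory stand-in for what a database full-text query would do, not the project's implementation; unlike real full-text search it has no stemming, so "refund" and "refunds" do not match.

```typescript
interface DocumentChunk {
  id: string;
  documentId: string;
  chunkIndex: number;
  content: string;
}

// Lowercase, split on non-alphanumerics, drop very short tokens.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 2);
}

// Rank chunks by how many of their tokens appear in the query.
function searchChunks(query: string, chunks: DocumentChunk[], limit = 3): DocumentChunk[] {
  const terms = new Set(tokenize(query));
  return chunks
    .map((chunk) => ({
      chunk,
      score: tokenize(chunk.content).filter((t) => terms.has(t)).length,
    }))
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((r) => r.chunk);
}
```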

Tickets

tickets: id, conversation_id, assigned_agent_id, priority, category, status, ai_summary, created_at

Link tickets to conversations — never duplicate the transcript in a separate silo.


AI pipeline: from question to grounded answer

When a customer sends a message, the API runs a pipeline like this:

  1. Load history (PostgreSQL)
  2. Retrieve chunks (knowledge base)
  3. Build prompt (policies + context)
  4. Call OpenAI (chat completion)
  5. Save reply (PostgreSQL)
  6. Check escalation (rules + signals)
  7. Return response (customer chat)

Each customer message triggers retrieval, generation, persistence, and escalation checks.
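
The seven steps can be sketched as one orchestration function with injected dependencies. Everything named here (`PipelineDeps`, `handleCustomerMessage`, the dependency method names) is hypothetical; the database, retrieval, and model calls are passed in so the flow can be exercised without PostgreSQL or an API key.

```typescript
interface ChatTurn {
  role: "user" | "assistant";
  content: string;
}

// Each dependency is a stand-in for a real integration.
interface PipelineDeps {
  loadHistory(conversationId: string): Promise<ChatTurn[]>;
  retrieveChunks(question: string): Promise<string[]>;
  complete(systemPrompt: string, turns: ChatTurn[]): Promise<string>;
  saveTurn(conversationId: string, turn: ChatTurn): Promise<void>;
  shouldEscalate(question: string, reply: string): boolean;
}

async function handleCustomerMessage(
  deps: PipelineDeps,
  conversationId: string,
  question: string,
): Promise<{ reply: string; escalate: boolean }> {
  const history = await deps.loadHistory(conversationId); // 1. load history
  const chunks = await deps.retrieveChunks(question); // 2. retrieve chunks
  const systemPrompt = // 3. build prompt
    "Answer only from the context below.\n\n" + chunks.join("\n---\n");
  const userTurn: ChatTurn = { role: "user", content: question };
  const reply = await deps.complete(systemPrompt, history.concat(userTurn)); // 4. call model
  await deps.saveTurn(conversationId, userTurn); // 5. save both turns
  await deps.saveTurn(conversationId, { role: "assistant", content: reply });
  const escalate = deps.shouldEscalate(question, reply); // 6. check escalation
  return { reply, escalate }; // 7. return response
}
```

Injecting the dependencies also makes it straightforward to test escalation rules against canned model replies.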

System prompt principles for support (not portfolio)

| Rule | Why |
|---|---|
| Answer only from provided context | Prevents policy hallucinations |
| Cite when possible ("According to our refund policy…") | Builds customer trust |
| Escalate on billing, legal, abuse | Risk categories should never be AI-only |
| Never promise refunds/compensation unless in docs | Liability protection |
| Keep replies concise | Support chat is not an essay |
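
These rules translate fairly directly into a system prompt template. The wording below is an illustrative assumption, not the production prompt, and `buildSystemPrompt` is a hypothetical helper.

```typescript
// Rule wording is illustrative; it paraphrases the principles listed above.
const SUPPORT_RULES: string[] = [
  "Answer only from the context provided below. If the answer is not there, say so and offer a human agent.",
  "Cite the source document when possible.",
  "For billing, legal, or abuse topics, offer to connect the customer with a human agent.",
  "Never promise refunds or compensation unless the context explicitly allows it.",
  "Keep replies concise.",
];

function buildSystemPrompt(companyName: string, contextChunks: string[]): string {
  const context =
    contextChunks.length > 0
      ? contextChunks.map((chunk, i) => `[${i + 1}] ${chunk}`).join("\n")
      : "(no matching documentation found)";
  return [
    `You are the customer support assistant for ${companyName}.`,
    "Rules:",
    ...SUPPORT_RULES.map((rule) => `- ${rule}`),
    "Context:",
    context,
  ].join("\n");
}
```

Numbering the chunks gives the model something concrete to cite, and the explicit empty-context marker makes the "answer only from context" rule easier to follow when retrieval finds nothing.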

Model and cost choices

  • Default model: cost-efficient tier (e.g. gpt-4o-mini) for high-volume first-line support
  • Upgrade path: larger model for complex escalations or AI-generated ticket summaries
  • Token budgeting: cap retrieved context size; truncate history to last 10–20 messages
  • Caching: answers to identical FAQ questions can be served from a Redis cache with a 5–15 minute TTL
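
Token budgeting and FAQ caching can be sketched with a few small helpers. All names here are hypothetical, character counts stand in for tokens as a rough proxy, and `TtlCache` is an in-memory stand-in for a Redis key with an expiry.

```typescript
// Keep only the most recent messages.
function truncateHistory<T>(messages: T[], maxMessages = 20): T[] {
  return maxMessages <= 0 ? [] : messages.slice(-maxMessages);
}

// Add chunks in ranked order until a character budget is reached.
// Characters are a rough proxy for tokens; a real tokenizer would be more precise.
function capContext(chunks: string[], maxChars: number): string[] {
  const kept: string[] = [];
  let used = 0;
  for (const chunk of chunks) {
    if (used + chunk.length > maxChars) break;
    kept.push(chunk);
    used += chunk.length;
  }
  return kept;
}

// Normalize so "Where is my order?" and " where is  my ORDER " share a key.
// The tenant id is part of the key so cached answers never cross tenants.
function faqCacheKey(tenantId: string, question: string): string {
  const normalized = question
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, "")
    .replace(/\s+/g, " ")
    .trim();
  return `faq:${tenantId}:${normalized}`;
}

// In-memory stand-in for a cache entry with a time-to-live.
class TtlCache {
  private entries = new Map<string, { value: string; expiresAt: number }>();
  private ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  set(key: string, value: string, nowMs: number): void {
    this.entries.set(key, { value, expiresAt: nowMs + this.ttlMs });
  }

  get(key: string, nowMs: number): string | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (nowMs >= entry.expiresAt) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }
}
```

The clock is passed in explicitly so expiry behaviour can be tested without waiting.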

RAG vs. full-context: what this project needed

My portfolio assistant uses structured CMS context — no vector DB.

A customer support platform is different:

| Factor | Portfolio assistant | Support platform |
|---|---|---|
| Data shape | Structured (projects, services) | Unstructured (PDFs, long help articles) |
| Volume | Dozens of records | Hundreds/thousands of document chunks |
| Update frequency | Admin edits CMS | Docs uploaded weekly |
| Wrong answer cost | Embarrassing | Refund disputes, churn, legal risk |

This project needed retrieval — search relevant chunks before each reply, not stuff every document into the system prompt.

Practical RAG stack options:

  1. PostgreSQL full-text search — good for MVP, no extra infra
  2. pgvector in PostgreSQL — embeddings in the same DB you already run
  3. Dedicated vector DB — worth it at very large scale; overkill for most SMB support bases

Start with (1) or (2). Add (3) when chunk count or query latency forces it.
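
For option (2), the core operation is a nearest-neighbour search over chunk embeddings. The sketch below does it in memory with cosine similarity to show the idea; with pgvector the same ranking would run inside PostgreSQL (its `<=>` operator computes cosine distance). The names and the toy two-dimensional vectors are assumptions for illustration.

```typescript
interface EmbeddedChunk {
  id: string;
  content: string;
  embedding: number[];
}

// Cosine similarity: 1 for identical direction, 0 for orthogonal vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Return the k chunks most similar to the query embedding, best first.
function topK(queryEmbedding: number[], chunks: EmbeddedChunk[], k = 3): EmbeddedChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(queryEmbedding, chunk.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((r) => r.chunk);
}
```

A linear scan like this is fine for a few thousand chunks; an index inside the database becomes worthwhile once the scan shows up in query latency.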


Escalation: the feature clients care about most

AI resolution rate is a vanity metric if escalation feels broken.

What "good escalation" looks like:

  1. One click — customer never hunts for a contact form
  2. No context loss — agent sees every AI message
  3. AI summary — 2–3 sentence briefing generated at handoff
  4. SLA visibility — ticket enters queue with priority and timestamp
  5. Closed loop — when agent resolves, conversation status updates; customer gets confirmation

Bad escalation — "Please email support@…" after a 10-message AI thread — destroys the ROI of the bot.


Security and multi-tenancy basics

Production support platforms handle sensitive data. Minimum bar:

  • Tenant isolation — every query scoped by tenant_id
  • Auth on admin/agent routes — JWT or session with role checks
  • Input validation on public chat endpoints (message length caps, rate limits)
  • Document access control — tenant A's uploads never appear in tenant B's retrieval
  • Audit logging — who viewed/exported conversation data
  • Secrets in env — OpenAI keys per environment, never in client bundles
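
Two of these controls, message validation and rate limiting on the public chat endpoint, can be sketched without any infrastructure. The 2,000-character cap and the fixed-window strategy are assumptions; a production deployment would typically keep the counters in Redis so that all API instances share them.

```typescript
const MAX_MESSAGE_LENGTH = 2000; // illustrative cap, not the project's value

// Reject anything that is not a non-empty string within the length cap.
function validateMessage(raw: unknown): string {
  if (typeof raw !== "string") throw new Error("message must be a string");
  const trimmed = raw.trim();
  if (trimmed.length === 0) throw new Error("message is empty");
  if (trimmed.length > MAX_MESSAGE_LENGTH) throw new Error("message too long");
  return trimmed;
}

// Fixed-window limiter: at most `limit` requests per key per window.
class FixedWindowRateLimiter {
  private windows = new Map<string, { windowStart: number; count: number }>();
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(key: string, nowMs: number): boolean {
    const current = this.windows.get(key);
    if (!current || nowMs - current.windowStart >= this.windowMs) {
      this.windows.set(key, { windowStart: nowMs, count: 1 }); // start a new window
      return true;
    }
    if (current.count >= this.limit) return false;
    current.count += 1;
    return true;
  }
}
```

The limiter key could be a session id or IP address combined with the tenant id, so that one noisy tenant cannot exhaust another tenant's budget.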

Deployment on AWS with Docker

Typical container layout:

| Container | Role |
|---|---|
| web | Next.js frontend |
| api | Node.js API |
| postgres | Primary database (or RDS managed) |
| redis | Cache and real-time |
| worker (optional) | Async doc processing, embedding jobs |

Document processing (PDF extract → chunk → embed) belongs in a background worker, not the request path — uploads should return quickly with status: processing.
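
The upload-then-process split can be sketched with an in-memory queue. `DocumentQueue` is a hypothetical stand-in for the documents table plus a job queue, and the `toChunks` callback stands in for the extract, chunk, and embed steps.

```typescript
type DocumentStatus = "processing" | "ready" | "error";

interface DocumentRecord {
  id: string;
  title: string;
  status: DocumentStatus;
  chunkCount: number;
}

class DocumentQueue {
  private docs = new Map<string, DocumentRecord>();
  private jobs: { docId: string; rawText: string }[] = [];
  private nextId = 1;

  // Request path: record the upload, enqueue a job, return immediately.
  upload(title: string, rawText: string): DocumentRecord {
    const doc: DocumentRecord = {
      id: `doc_${this.nextId}`,
      title,
      status: "processing",
      chunkCount: 0,
    };
    this.nextId += 1;
    this.docs.set(doc.id, doc);
    this.jobs.push({ docId: doc.id, rawText });
    return { ...doc };
  }

  // Worker path: process one queued job. Returns false when the queue is empty.
  processNext(toChunks: (rawText: string) => string[]): boolean {
    const job = this.jobs.shift();
    if (!job) return false;
    const doc = this.docs.get(job.docId);
    if (!doc) return true;
    try {
      const chunks = toChunks(job.rawText);
      if (chunks.length === 0) throw new Error("no extractable text");
      doc.chunkCount = chunks.length;
      doc.status = "ready";
    } catch {
      doc.status = "error"; // surfaced in the admin UI instead of failing silently
    }
    return true;
  }

  get(id: string): DocumentRecord | undefined {
    const doc = this.docs.get(id);
    return doc ? { ...doc } : undefined;
  }
}
```

The admin UI can poll `get` (or the equivalent API route) to flip a document from "processing" to "ready" or "error" without blocking the upload request.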


Results and what changed for the client

After deployment, the platform enabled:

  • Higher throughput — common questions resolved without agent time
  • Faster first response — AI replies in seconds, 24/7
  • Agent focus — humans handle disputes, bugs, and edge cases only
  • Measurable ops — analytics show where knowledge base gaps exist

The win is operational: agent workload scales sub-linearly with customer growth.


Trade-offs and honest limitations

What v1 intentionally did not solve:

  1. Voice / phone channel — text chat only
  2. Multi-language — English-first; i18n is a layer on top
  3. Automatic ticket routing ML — simple round-robin or manual assignment first
  4. Deep CRM integration — Salesforce/HubSpot connectors are phase 2
  5. Fine-tuned models — prompt + RAG was sufficient; fine-tuning adds ops burden

Key takeaways

  1. Support AI is a product problem, not an API call — escalation, roles, and analytics matter as much as the model
  2. Knowledge base UX determines bot quality — if updates are painful, accuracy decays
  3. Use RAG when docs are unstructured — structured CMS context is not enough here
  4. PostgreSQL + Redis covers most SaaS needs before adding exotic infra
  5. Measure AI-resolved vs escalated — that ratio tells you whether to improve docs, prompts, or staffing

Building an AI support layer for your product? Get in touch or find me on Upwork.