Building AI Features — Common Patterns
- Identify the eight common AI feature patterns and match each to a product requirement
- Describe the four components of a RAG pipeline and where failures most commonly occur
- Choose the right implementation approach for classification, extraction, and chatbot features
- Apply the 'when not to use AI' test before building any AI-powered feature
Eight Patterns That Cover Most AI Features
The AI feature space looks vast, but most of what developers actually build in production applications is a variation of one of eight patterns. Recognise the pattern first: it determines the architecture, the implementation decisions, and the failure modes you need to plan for. Here is a map of all eight.
Pattern 1: Text Summarisation
Condense documents, emails, tickets, or meeting transcripts into shorter, structured output. The implementation is straightforward: include the source document in the user message, define the desired format in the system prompt (bullet points, max 200 words, retain all action items, etc.). Low hallucination risk when the source is in the prompt — the model is constrained to what you gave it. Key challenge: documents that exceed the context window. The solution is chunking with recursive summarisation: summarise each chunk, then summarise the summaries.
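The chunk-then-summarise-the-summaries approach can be sketched as follows. This is a minimal illustration, not a production implementation: it approximates tokens with words, and the `summarise` callable is a placeholder for your actual LLM call.

```python
from typing import Callable, List


def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64) -> List[str]:
    """Split text into overlapping word-based chunks (a rough proxy for tokens)."""
    assert 0 <= overlap < chunk_size
    words = text.split()
    if len(words) <= chunk_size:
        return [" ".join(words)]
    chunks, start = [], 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + chunk_size]))
        start += chunk_size - overlap
    return chunks


def recursive_summarise(text: str, summarise: Callable[[str], str],
                        chunk_size: int = 512, overlap: int = 64) -> str:
    """Summarise each chunk, then recursively summarise the joined summaries
    until everything fits in a single chunk."""
    chunks = chunk_text(text, chunk_size, overlap)
    if len(chunks) == 1:
        return summarise(chunks[0])
    partials = [summarise(c) for c in chunks]
    return recursive_summarise(" ".join(partials), summarise, chunk_size, overlap)
```

Because each pass shrinks the text, the recursion terminates as long as `summarise` returns something shorter than its input.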
Pattern 2: Classification and Routing
Categorise incoming text: triage support tickets, label content, detect sentiment, score leads. Use structured output (the pattern from Lesson 10) with temperature set to zero. Define your categories as an explicit enum in the system prompt. Include 2–3 few-shot examples per category. Reliability is high when the categories are well-defined and the inputs are within distribution. Important caveat: if your categories are well-defined and the input is simple, a fine-tuned smaller model or even a traditional regex may be dramatically cheaper and faster. Do not use a frontier LLM for keyword-based routing.
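A classification request following this pattern might be assembled as below. The sketch builds an OpenAI-style chat-completions payload; the category names, few-shot examples, and model name are all illustrative placeholders for your own.

```python
CATEGORIES = ["billing", "bug_report", "feature_request", "other"]  # illustrative

FEW_SHOT = [  # aim for 2-3 examples per category in a real system; abbreviated here
    ("I was charged twice this month.", "billing"),
    ("The export button crashes the app.", "bug_report"),
]


def build_classification_request(ticket: str) -> dict:
    """Build request kwargs: enum in the system prompt, few-shot examples,
    temperature zero, and an enum-constrained JSON schema for the output."""
    system = (
        "Classify the support ticket into exactly one category: "
        + ", ".join(CATEGORIES) + ".\n\nExamples:\n"
        + "\n".join(f"Ticket: {t}\nCategory: {c}" for t, c in FEW_SHOT)
    )
    return {
        "model": "gpt-4o-mini",  # assumed model name; substitute your own
        "temperature": 0,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": ticket},
        ],
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "ticket_category",
                "strict": True,
                "schema": {
                    "type": "object",
                    "properties": {
                        "category": {"type": "string", "enum": CATEGORIES}
                    },
                    "required": ["category"],
                    "additionalProperties": False,
                },
            },
        },
    }
```

The enum in the schema means the model cannot return a category you did not define, which is the whole point of this pattern.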
Pattern 3: Entity and Data Extraction
Extract structured data from unstructured text: pull customer name, order ID, and issue type from a support email; extract dates, parties, and obligations from a contract; identify product mentions from user reviews. Define a JSON schema with the fields you want, use native structured output, and set temperature to zero. Highly reliable when the schema is clear and the target fields are unambiguous in the source text. Reliability drops when the desired fields require inference or are genuinely ambiguous.
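A minimal sketch of the extraction side, assuming the support-email example above: a JSON schema for the fields you want, plus a sanity check on the model's reply before it enters your system. Field names are illustrative; the schema dict is the same shape you would pass to a native structured-output API.

```python
import json

EXTRACTION_SCHEMA = {
    "type": "object",
    "properties": {
        "customer_name": {"type": "string"},
        "order_id": {"type": "string"},
        "issue_type": {"type": "string"},
    },
    "required": ["customer_name", "order_id", "issue_type"],
    "additionalProperties": False,
}


def parse_extraction(raw: str) -> dict:
    """Parse the model's JSON reply and verify every required field is present.
    Native structured output should guarantee this, but a defensive check
    keeps malformed replies out of your database."""
    data = json.loads(raw)
    missing = [f for f in EXTRACTION_SCHEMA["required"] if f not in data]
    if missing:
        raise ValueError(f"Model omitted required fields: {missing}")
    return data
```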
Pattern 4: Retrieval-Augmented Generation (RAG)
RAG is the answer when users need to query your private knowledge base — product documentation, internal wikis, contracts, support history. It is the most architecturally complex of the eight patterns and the most commonly misimplemented. The pipeline has four components:
- Ingestion and embedding: Load your documents, chunk them into pieces (256–1024 tokens with 10–20% overlap), embed each chunk using an embedding model (OpenAI text-embedding-3-large or open-source bge-m3), and store the chunk text and its vector in a vector database.
- Vector store: Stores vectors and enables fast similarity search. The pragmatic starting point for most teams already running Postgres is pgvector — a Postgres extension that adds vector search with no new infrastructure. Managed options like Pinecone are easier to operate at scale.
- Retrieval: Embed the user's query with the same embedding model, run a nearest-neighbour search, return the top-k most relevant chunks (typically 3–10).
- Generation: Build a prompt with your system instructions, the retrieved chunks as context, and the user's question. Instruct the model explicitly: "Answer only based on the provided context. If the answer is not in the context, say you do not know." This grounding instruction is what prevents the model from hallucinating confident answers when the relevant information is not present.
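The retrieval and generation steps above can be sketched end to end. This is a toy illustration: `embed` stands in for a real embedding-model call, and in production you would embed chunks once at ingestion and store the vectors (e.g. in pgvector) rather than re-embedding per query as done here for brevity.

```python
import math
from typing import Callable, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, chunks: List[str],
             embed: Callable[[str], List[float]], k: int = 3) -> List[str]:
    """Embed the query with the same model as the chunks, return top-k by similarity."""
    qv = embed(query)
    scored = sorted(((cosine(qv, embed(c)), c) for c in chunks), reverse=True)
    return [c for _, c in scored[:k]]


def build_rag_prompt(query: str, context_chunks: List[str]) -> List[dict]:
    """Assemble the grounded generation prompt from retrieved chunks."""
    system = ("Answer only based on the provided context. "
              "If the answer is not in the context, say you do not know.")
    context = "\n\n".join(context_chunks)
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
    ]
```

Keeping retrieval as a separate, testable function pays off when debugging: you can inspect exactly which chunks were returned before blaming the model.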
The most common RAG failure is not in the generation step — it is in the retrieval step. Wrong chunks returned means wrong context, which means wrong answers that sound authoritative. If RAG is not performing well, start debugging in retrieval before looking at the LLM.
Pattern 5: Conversational Chatbot
Stateful multi-turn conversation. Implementation: maintain a messages array in your backend, append each user message and model response, send the full history with each request. Key consideration: the messages array grows without bound. You need a truncation or summarisation strategy once conversations grow long — typically summarising older turns into a compressed form and keeping the recent turns in full. Without this, long conversations become expensive and quality degrades.
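The compression strategy can be sketched as follows. The `summarise` callable is a placeholder for an LLM call that condenses older turns; `keep_recent` is a tuning knob you would set from your token budget.

```python
from typing import Callable, Dict, List


def compact_history(messages: List[Dict[str, str]],
                    summarise: Callable[[str], str],
                    keep_recent: int = 6) -> List[Dict[str, str]]:
    """Replace all but the most recent turns with a single system-note summary.
    Recent turns are kept verbatim so the model retains short-term context."""
    if len(messages) <= keep_recent:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    transcript = "\n".join(f'{m["role"]}: {m["content"]}' for m in old)
    note = {
        "role": "system",
        "content": "Summary of earlier conversation: " + summarise(transcript),
    }
    return [note] + recent
```

Run this before each request once the history exceeds your threshold; the summary note rides along as a system message while the live turns stay intact.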
Patterns 6, 7, and 8: Code Generation, Content Moderation, Semantic Search
Code generation tools (developer tooling, SQL generators, config builders) work well with Claude or OpenAI models. Always add a second validation pass before any auto-execution — never run AI-generated code, SQL, or shell commands without review. Content moderation works but has economics to consider: frontier LLMs are expensive per call. At scale, purpose-built moderation APIs (OpenAI Moderation, Anthropic's built-in safety layers) or fine-tuned smaller models are often 10–50x cheaper for the same task. Semantic search (find content by meaning, not keywords) uses the same embedding + vector search infrastructure as RAG but without the generative step — just return the most similar documents or items.
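For the "never run AI-generated SQL without review" rule, even a crude automated gate catches the worst cases. A minimal sketch, assuming a read-only SQL-generation feature: reject anything that is not a single SELECT statement and denylist destructive keywords. A production system should use a real SQL parser and database-level read-only credentials; this regex filter is illustrative only, not a complete defence.

```python
import re

# Keywords that should never appear in a read-only query (illustrative denylist)
FORBIDDEN = re.compile(r"\b(drop|delete|update|insert|alter|truncate|grant)\b",
                       re.IGNORECASE)


def is_safe_select(sql: str) -> bool:
    """Return True only for a single, read-only SELECT statement."""
    statements = [s for s in sql.strip().split(";") if s.strip()]
    if len(statements) != 1:
        return False  # reject stacked statements like "SELECT 1; DROP TABLE x"
    stmt = statements[0].strip()
    return stmt.lower().startswith("select") and not FORBIDDEN.search(stmt)
```

The word-boundary matching means column names such as `updated_at` pass, while a bare `UPDATE` statement does not. Layer this with review or sandboxed execution rather than relying on it alone.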
When Not to Use AI
The most underrated AI feature decision is recognising when not to build an AI feature. AI adds latency, cost, and non-determinism to any feature it powers. Before reaching for the API, ask: is this fundamentally a language understanding problem? If the answer is no — if you are doing a deterministic lookup, structured filtering, arithmetic, real-time data retrieval, or pathfinding — a database query, an algorithm, or an existing API will be faster, cheaper, and more reliable. One engineering team famously replaced an LLM-based location routing attempt with a Places API call and a shortest-path algorithm, dramatically reducing both cost and latency. AI is not the right hammer for every nail.
- Most production AI features are variations of eight patterns: summarisation, classification, extraction, RAG, chatbot, code generation, moderation, and semantic search
- RAG failures most often occur at the retrieval step, not the generation step — debug retrieval first
- The pragmatic first vector store choice is pgvector — no new infrastructure if you already run Postgres
- Always instruct generative models to 'say I don't know' when the answer is not in the provided context
- If a feature is not fundamentally a language understanding problem, a database or algorithm is faster, cheaper, and more reliable than AI