The Discipline of Prompt Engineering
By the end of this lesson, you should be able to:
- Design an orchestrator/sub-agent architecture for a multi-step agentic task with appropriate tool use constraints
- Implement a minimum viable memory architecture for a long-running agent using STM (rolling summarisation) and LTM (vector retrieval)
- Establish a team prompt practice with a documented library, review workflow, and ownership assignment
- Articulate how prompt engineering evolves in a world of reasoning models — what changes and what remains essential
From Technique to Discipline
Over the first 15 lessons of this course, you have accumulated a toolkit: reasoning techniques, few-shot strategies, persona design, pipeline architecture, evaluation frameworks, cost controls, and security patterns. A toolkit only becomes a discipline when it is applied systematically, measured rigorously, and maintained over time. This final lesson is about building the practice — how professionals manage prompts in teams, what agentic applications require beyond single-prompt techniques, and where the field is headed.
Agentic Prompting Patterns
Agentic AI systems — applications where models take actions, use tools, and operate over multiple steps — require a different set of prompting patterns than single-turn interactions. The context from earlier lessons (chains, tool use, memory) comes together here.
The supervisor/orchestrator pattern is the most widely deployed in production. A high-level orchestrator model receives the user's goal, breaks it into subtasks, and delegates each to specialised sub-agents. The orchestrator has access to the full task context and coordinates the results. Each sub-agent has a narrow, focused prompt — a specialist. This pattern preserves accountability (every decision is traceable through the orchestrator) and is easier to debug than decentralised agent networks.
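The sketch below shows the shape of the pattern, assuming a generic `call_model(system_prompt, user_message)` helper rather than any particular SDK; the plan is passed in rather than generated so the example stays self-contained. The point is structural: sub-agents see only their subtask, while the orchestrator holds the goal, the delegation decisions, and the full trace.

```python
from dataclasses import dataclass, field
from typing import Callable

# Placeholder for a real model call (any provider SDK); illustrative, not a specific API.
ModelFn = Callable[[str, str], str]  # (system_prompt, user_message) -> completion text


@dataclass
class SubAgent:
    """A specialist: narrow system prompt, no knowledge of the wider goal."""
    name: str
    system_prompt: str

    def run(self, subtask: str, call_model: ModelFn) -> str:
        return call_model(self.system_prompt, subtask)


@dataclass
class Orchestrator:
    """Owns the full goal, delegates subtasks, and records every decision in a trace."""
    call_model: ModelFn
    agents: dict[str, SubAgent]
    trace: list[dict] = field(default_factory=list)

    def solve(self, goal: str, plan: list[tuple[str, str]]) -> str:
        # `plan` is a list of (agent_name, subtask) pairs. In a real system the
        # orchestrator model would generate it from the goal; passing it in keeps
        # this sketch self-contained.
        partial_results = []
        for agent_name, subtask in plan:
            output = self.agents[agent_name].run(subtask, self.call_model)
            self.trace.append({"agent": agent_name, "subtask": subtask, "output": output})
            partial_results.append(f"{agent_name}: {output}")
        # The orchestrator, not any sub-agent, synthesises the final answer.
        return self.call_model(
            f"Combine these sub-results into one answer to the goal: {goal}",
            "\n".join(partial_results),
        )
```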
Tool use design is the most failure-prone aspect of agentic systems. Tool definitions (function calling schemas) are effectively part of the prompt — they must be precise, with clear descriptions of what each tool does and when to use it. As the number of available tools grows, tool selection accuracy degrades — models must choose among increasingly similar options. The solution is to provide only the tools relevant to the current task, not the full tool library. This is the agentic equivalent of context curation: give the agent exactly what it needs, nothing more.
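A sketch of both halves of that advice: an illustrative JSON-Schema-style tool definition (exact formats vary by provider) and a hypothetical tag-based filter that exposes only the tools relevant to the current task. The tool names and tags are invented for the example.

```python
# Tool definitions are part of the prompt: precise names, descriptions, and parameters.
TOOL_LIBRARY = [
    {
        "name": "search_orders",
        "description": "Look up a customer's orders by customer ID. Use when the user asks about order status or history.",
        "parameters": {
            "type": "object",
            "properties": {"customer_id": {"type": "string"}},
            "required": ["customer_id"],
        },
        "tags": {"orders", "support"},
    },
    {
        "name": "issue_refund",
        "description": "Refund a specific order. Use only after the order has been located and the user has confirmed the amount.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}, "amount": {"type": "number"}},
            "required": ["order_id", "amount"],
        },
        "tags": {"refunds", "support"},
    },
    # ...the full library would contain dozens more entries
]


def tools_for_task(task_tags: set[str], library: list[dict] = TOOL_LIBRARY) -> list[dict]:
    """Expose only the tools relevant to the current task, not the whole library."""
    selected = [tool for tool in library if tool["tags"] & task_tags]
    # Strip the internal routing tags before the schemas are sent to the model.
    return [{k: v for k, v in tool.items() if k != "tags"} for tool in selected]


# A refund-handling subtask sees two tools, not the full catalogue.
refund_tools = tools_for_task({"refunds", "orders"})
```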
Context accumulation is the most underestimated challenge in long-running agents. As an agent works through a multi-step task, the context window accumulates intermediate results, tool outputs, and reasoning traces. Without management, this accumulation eventually fills the context window or degrades attention quality. Two strategies: periodic summarisation (compress completed steps into a concise summary, clear the detailed history), or structured memory (extract key facts and decisions into a persistent memory object, discard the raw trace).
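A minimal sketch of the first strategy, assuming a `summarise` callable backed by a model call; the message-count thresholds are placeholders for what would normally be token budgets.

```python
from typing import Callable

# Placeholder for a model call that compresses text into a short summary.
Summarise = Callable[[str], str]


def compact_history(messages: list[dict], summarise: Summarise,
                    max_messages: int = 40, keep_recent: int = 10) -> list[dict]:
    """Compress completed steps into one summary message; keep the recent tail verbatim.

    `messages` are chat-style dicts like {"role": ..., "content": ...}. The thresholds
    are illustrative; in practice you would trigger on token counts, not message counts.
    """
    if len(messages) <= max_messages:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = summarise("\n".join(f"{m['role']}: {m['content']}" for m in old))
    return [{"role": "system", "content": f"Summary of earlier steps:\n{summary}"}] + recent
```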
Memory Architecture for Long-Running Agents
Research from 2025 on agent memory systems (A-MEM, AgeMem) distinguishes two types of memory that require different management:
- Short-term memory (STM): The current context window — what the agent is actively working with. Managed through pruning and summarisation to prevent overflow and maintain relevance.
- Long-term memory (LTM): Persistent storage of facts, preferences, and decisions across sessions. Retrieved selectively into STM when relevant. Implemented via vector stores, key-value stores, or structured databases.
A-MEM (Agentic Memory, 2025) applies Zettelkasten principles to agent memory — each memory is a node with connections to related memories, forming an interconnected knowledge network rather than a flat list. This structure enables more precise retrieval (finding not just the directly relevant memory, but the related context around it) without flooding the context window with everything potentially relevant.
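A minimal sketch of the linked-note idea (not A-MEM's actual implementation): each memory is a node carrying links to related memories, and retrieval expands a search hit into its immediate neighbourhood rather than returning either a single note or the whole store.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryNote:
    """One memory, stored as a node with links to related memories."""
    note_id: str
    content: str
    links: set[str] = field(default_factory=set)  # ids of related notes


class LinkedMemory:
    """Retrieval returns the best-matching notes plus their linked neighbourhood."""

    def __init__(self) -> None:
        self.notes: dict[str, MemoryNote] = {}

    def add(self, note: MemoryNote) -> None:
        self.notes[note.note_id] = note

    def retrieve(self, hits: list[str], hops: int = 1) -> list[MemoryNote]:
        # `hits` would normally come from a semantic search over note contents;
        # passing ids in directly keeps the sketch self-contained.
        selected = {nid for nid in hits if nid in self.notes}
        frontier = set(selected)
        for _ in range(hops):
            frontier = {
                linked
                for nid in frontier
                for linked in self.notes[nid].links
                if linked in self.notes
            } - selected
            selected |= frontier
        return [self.notes[nid] for nid in selected]
```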
For most production agentic applications, the minimum viable memory architecture is: STM managed through rolling summarisation + LTM as a simple vector store with semantic retrieval. Start there before investing in more sophisticated architectures.
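A sketch of that minimum viable architecture, with a placeholder `embed` callable standing in for whichever embedding model you use; `build_context` shows how the rolling STM summary, the recent turns, and the selectively recalled long-term memories are assembled into one prompt.

```python
import math
from typing import Callable

# Placeholder for whatever embedding model you use.
Embed = Callable[[str], list[float]]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


class VectorLTM:
    """Long-term memory: persistent facts, retrieved semantically when relevant."""

    def __init__(self, embed: Embed) -> None:
        self.embed = embed
        self.entries: list[tuple[list[float], str]] = []

    def remember(self, fact: str) -> None:
        self.entries.append((self.embed(fact), fact))

    def recall(self, query: str, k: int = 3) -> list[str]:
        query_vec = self.embed(query)
        ranked = sorted(self.entries, key=lambda entry: cosine(entry[0], query_vec), reverse=True)
        return [fact for _, fact in ranked[:k]]


def build_context(stm_summary: str, recent_turns: list[str], ltm: VectorLTM, task: str) -> str:
    """Assemble the prompt: rolling STM summary, recent turns, and selectively recalled LTM."""
    recalled = ltm.recall(task)
    return "\n".join(
        ["Summary of earlier work:", stm_summary]
        + ["Relevant long-term memories:"] + recalled
        + ["Recent turns:"] + recent_turns
        + ["Current task:", task]
    )
```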
Building a Team Prompt Practice
Prompts in professional teams are shared assets, not individual property. Managing them well requires the same infrastructure as managing code:
- Centralised prompt library: A version-controlled repository where all production prompts are stored, documented, and accessible to the team. Prompts without documentation and ownership are prompts that will break in production with no one to fix them.
- Documentation standard: Each prompt in the library should document its purpose (what it accomplishes), context (what application it powers), expected output (format and characteristics), known limitations (edge cases, failure modes), and performance benchmarks (eval suite scores). A minimal record sketch follows this list.
- Review workflow: Prompt changes should go through a pull request and review process, identical to code changes. The review checklist should include: does the eval suite pass? Has the prompt been tested against adversarial inputs? Are all constraints stated positively?
- Governance: Teams that assign ownership for prompt files — a named person who is responsible for the prompt's quality and maintenance — report fewer production incidents from prompt drift than teams where prompts are unowned shared resources.
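As a concrete illustration of the documentation standard, here is a hypothetical library record; the field names and the example entry are invented for the sketch, not a prescribed schema.

```python
from dataclasses import dataclass, field


@dataclass
class PromptRecord:
    """One documented entry in a version-controlled prompt library (illustrative fields)."""
    name: str
    owner: str                      # named person accountable for quality and maintenance
    purpose: str                    # what the prompt accomplishes
    application: str                # what application it powers
    expected_output: str            # format and characteristics of a good response
    known_limitations: list[str] = field(default_factory=list)   # edge cases, failure modes
    eval_scores: dict[str, float] = field(default_factory=dict)  # benchmark results by suite
    prompt_text: str = ""


# A hypothetical entry; every name and number below is invented for the example.
ticket_summariser = PromptRecord(
    name="ticket-summariser-v3",
    owner="a.ng",
    purpose="Summarise a support ticket thread into a three-sentence handover note.",
    application="internal support console",
    expected_output="Plain text, at most three sentences, no customer PII.",
    known_limitations=["Threads over roughly 50 messages lose early context."],
    eval_scores={"summary_quality": 0.91, "pii_leakage": 0.0},
    prompt_text="You are a support handover assistant...",
)
```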
Microsoft's 2025 Work Trend Index found that teams with structured AI workflows outperform individual AI users by 43% in complex problem-solving. The multiplier effect of a well-maintained shared prompt library compounds over time.
Is Prompt Engineering a Long-Term Skill?
The direct answer is yes — with qualification. The prompt engineering market is valued at $505 million in 2025 and projected to grow to $6.7 billion by 2034 (CAGR 33.27%). LinkedIn reports a 434% increase in prompt engineering job postings since 2023. Certified practitioners earn wages 27% higher than peers in comparable roles.
The qualification: the nature of the skill is shifting. Reasoning models (o1, o3, Claude Extended Thinking) change what you prompt for, not whether prompting is necessary. With a reasoning model, you are no longer engineering the reasoning chain — the model manages that. You are engineering the problem specification: defining constraints, inputs, outputs, and success criteria with enough precision that the model's internal reasoning can find the right path. This is a higher-level skill, not a disappearing one.
The cases for concern — "LLMs will automatically generate prompts and make the skill obsolete" — misunderstand the current state. DSPy and similar tools automate the optimisation of specific prompt components given a labelled dataset and a metric. They do not replace the human judgment required to define what the task is, what good output looks like, and what constraints matter. The automation handles the hill-climbing; the human defines the hill.
What the Best Practitioners Do Differently
To close this course with something concrete: what distinguishes the best prompt engineers from adequate ones is not knowledge of more techniques. It is three habits:
- They measure everything. Every prompt change is tested against an eval suite before deployment. They know the current quality score of every prompt they own. A minimal deployment-gate sketch follows this list.
- They think in distributions, not examples. A prompt is not evaluated on whether it produces a good output for one input. It is evaluated on whether it produces good outputs across the full distribution of real inputs it will encounter. One good example proves nothing; a passing eval suite proves something.
- They build for the failure case. They design prompts assuming adversarial inputs, edge cases, and unexpected user behaviour — not just the happy path. The most robust prompts are written by people who have already thought about how they will break.
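A minimal sketch of the first habit as a deployment gate: the eval suite and the `run` callable are placeholders, and a real suite would also record per-case results rather than a single pass rate.

```python
from typing import Callable

RunPrompt = Callable[[str, str], str]   # (prompt_template, test_input) -> model output
Check = Callable[[str], bool]           # grades one output as pass/fail


def pass_rate(prompt: str, suite: list[tuple[str, Check]], run: RunPrompt) -> float:
    """Score a prompt across the whole suite, not on a single hand-picked example."""
    results = [check(run(prompt, test_input)) for test_input, check in suite]
    return sum(results) / len(results)


def gate_deployment(candidate: str, current: str,
                    suite: list[tuple[str, Check]], run: RunPrompt) -> bool:
    """Deploy only if the candidate matches or beats the prompt already in production."""
    return pass_rate(candidate, suite, run) >= pass_rate(current, suite, run)
```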
The techniques in this course are tools. These three habits are what make those tools add up to a discipline.
Key Takeaways
- The supervisor/orchestrator pattern is the most reliable agentic architecture — centralised coordination preserves accountability and makes debugging tractable
- Provide only tools relevant to the current task, not the full library — tool selection accuracy degrades as the number of available tools grows
- Minimum viable agent memory: rolling summarisation for STM + semantic vector retrieval for LTM — start simple before investing in sophisticated architectures
- Prompt engineering market: $505M in 2025, projected $6.7B by 2034; 434% job posting growth; reasoning models change what you prompt for, not whether prompting matters
- The three habits of elite practitioners: measure everything with eval suites, think in distributions not individual examples, and design for the failure case from the start