Self-Refinement and Critique Loops

Advanced · 22 min · Lesson 12 of 16
What you'll learn
  • Implement the three-role self-refinement loop with effective critique criteria for a given task type
  • Determine the appropriate number of refinement cycles and when diminishing returns make additional cycles counterproductive
  • Apply Chain-of-Verification to reduce hallucination in factual responses
  • Design a constitutional prompting system that silently evaluates outputs against defined principles

The Model as Its Own Editor

A powerful insight from 2023 research: the same model that produced a mediocre output often knows the output is mediocre if you ask it the right way. Models have learned quality criteria from the vast amount of human-written text they were trained on — they can recognise unclear writing, inefficient code, and weak arguments. Self-refinement techniques exploit this by making the model its own editor.

Self-Refine (Madaan et al., 2023) established the formal framework. The technique assigns the model three sequential roles in a loop: generator, critic, and refiner. The same model plays all three roles, using its own evaluation of the previous output as the instruction for improvement. No external feedback, no additional training, no human-in-the-loop (unless you want one).

The Three-Role Loop

Each cycle of self-refinement consists of three phases:

Phase 1: Generate
Produce the initial output. This is a standard prompt for your task.

Phase 2: Critique
Pass the output back to the model with an evaluation prompt:

Review the following output and provide specific, actionable feedback. Focus on:
- [Criterion 1: e.g., accuracy of claims]
- [Criterion 2: e.g., clarity of explanation]
- [Criterion 3: e.g., completeness of coverage]

For each issue, explain why it is a problem and suggest a specific improvement. Do not rewrite the output — only provide feedback.

Output to review:
[output from Phase 1]

Phase 3: Refine
Pass both the original output and the critique to the model:

Using the feedback below, rewrite the output to address each identified issue. Preserve what is working well.

Original output: [Phase 1 output]

Feedback: [Phase 2 critique]

Revised output:
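
Wired together, one cycle is three model calls. The sketch below is a minimal Python version; the complete() helper is a hypothetical stand-in for a single call to whichever model API you use, and the templates echo the prompts above:

def complete(prompt: str) -> str:
    """Stand-in for a single LLM call; wire this to your provider's client."""
    raise NotImplementedError

CRITIQUE_TEMPLATE = """Review the following output and provide specific, actionable feedback. Focus on:
- accuracy of claims
- clarity of explanation
- completeness of coverage

For each issue, explain why it is a problem and suggest a specific improvement. Do not rewrite the output, only provide feedback.

Output to review:
{output}"""

REFINE_TEMPLATE = """Using the feedback below, rewrite the output to address each identified issue. Preserve what is working well.

Original output: {output}

Feedback: {feedback}

Revised output:"""

def refinement_cycle(task_prompt: str) -> str:
    draft = complete(task_prompt)                                # Phase 1: generate
    critique = complete(CRITIQUE_TEMPLATE.format(output=draft))  # Phase 2: critique
    return complete(REFINE_TEMPLATE.format(output=draft,         # Phase 3: refine
                                           feedback=critique))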

Performance Results

In the 2023 evaluation, Self-Refine improved output quality across multiple task types:

  • Code optimization (GPT-4): +8.7 percentage points on coding benchmarks
  • Code readability: +13.9 percentage points
  • Sentiment reversal (constrained writing): +21.6 percentage points
  • Average across all tested tasks: approximately 20 percentage points of absolute improvement over the initial generation

These gains are achieved by running 2–3 refinement cycles. The research shows diminishing returns after 3 cycles — by the third iteration, most of the available improvement has been captured. Running additional cycles adds cost without proportional quality gains, and occasionally degrades quality if the model starts over-correcting.
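
Capping the cycle count and stopping early is a thin wrapper around the single cycle above. In this sketch the critic is instructed to emit a sentinel string when it finds nothing left to fix; the sentinel is an assumption of the sketch, not part of the published method:

def complete(prompt: str) -> str:
    """Stand-in for a single LLM call."""
    raise NotImplementedError

STOP = "NO ISSUES"  # our convention: the critic emits this when nothing needs fixing

def self_refine(task_prompt: str, max_cycles: int = 3) -> str:
    draft = complete(task_prompt)            # generate once
    for _ in range(max_cycles):
        critique = complete(
            f"Review the output below for accuracy, clarity, and completeness. "
            f"If nothing needs fixing, reply with exactly '{STOP}'.\n\n{draft}"
        )
        if STOP in critique:                 # critic is satisfied: stop before paying for more cycles
            break
        draft = complete(                    # otherwise refine and go around again
            f"Rewrite the output to address each issue in the feedback. "
            f"Preserve what works.\n\nOriginal output: {draft}\n\n"
            f"Feedback: {critique}\n\nRevised output:"
        )
    return draft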

Making the Critique Step Work

The quality of Phase 2 (the critique) determines everything. A weak critique produces no improvement. A strong critique produces meaningful revision. Several practices improve critique quality:

  • Specify explicit criteria: Generic "review this for quality" produces generic feedback. Listing specific criteria (accuracy, completeness, clarity, tone) produces targeted, actionable critique.
  • Use Likert scales for nuance: Asking the model to rate each criterion on a 1–5 scale before providing feedback produces more calibrated evaluations than binary good/bad judgements.
  • Separate the critique from the revision: If you ask for feedback and a rewrite in the same prompt, the model tends to rewrite immediately without genuinely engaging with the critique. The two-step structure (critique first, revise second) produces better results.
  • Self-critique limitation: Models consistently rate their own outputs more favourably than independent evaluators do. A 5–7% self-enhancement bias means the model's critique will miss some real problems. For the highest-quality outputs, use a second model call for the critique step — model A generates, model B critiques, model A revises.
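
The last two points combine naturally: a second model applies a Likert-scored rubric as the critic. The complete_a() and complete_b() helpers below are hypothetical stand-ins for calls to two different models:

def complete_a(prompt: str) -> str:
    """Stand-in for a call to model A, the generator and refiner."""
    raise NotImplementedError

def complete_b(prompt: str) -> str:
    """Stand-in for a call to model B, the independent critic."""
    raise NotImplementedError

CRITERIA = ("accuracy of claims", "clarity of explanation", "completeness of coverage")

def cross_model_refine(task_prompt: str) -> str:
    draft = complete_a(task_prompt)          # model A generates
    rubric = "\n".join(f"- {c}: rate 1-5, then give specific feedback" for c in CRITERIA)
    critique = complete_b(                   # model B critiques, sidestepping A's self-enhancement bias
        f"Evaluate the output below against each criterion:\n{rubric}\n\n"
        f"Do not rewrite the output.\n\nOutput to review:\n{draft}"
    )
    return complete_a(                       # model A revises against B's feedback
        f"Rewrite the output to address each issue in the feedback. Preserve "
        f"what works.\n\nOriginal output: {draft}\n\nFeedback: {critique}\n\n"
        f"Revised output:"
    )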

Chain-of-Verification

Chain-of-Verification (CoVe; Dhuliawala et al., 2023) is a specialised self-refinement technique for factual accuracy. After producing an initial answer, the model generates a set of verification questions about its own claims, answers those questions independently (without the original response in context, to avoid anchoring), and then revises the original response based on any inconsistencies found.

Research on CoVe showed that hallucinated entities in factual responses dropped from 2.95 per response to 0.68 — a 77% reduction. Precision on list-completion tasks doubled. The technique is most valuable for factual domains where hallucination is a real risk.

Structure:

  1. Generate initial answer to factual question
  2. Generate verification questions: "What specific claims does this answer make that could be checked?"
  3. Answer each verification question independently (new conversation, no original answer in context)
  4. Compare: are there any inconsistencies between the independent answers and the original?
  5. Revise the original answer to address any identified inconsistencies
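
A sketch of the full pipeline, again using a hypothetical complete() stand-in; the crucial detail is that step 3 calls the model without the original answer anywhere in the prompt:

def complete(prompt: str) -> str:
    """Stand-in for a single LLM call; each call here starts a fresh context."""
    raise NotImplementedError

def chain_of_verification(question: str) -> str:
    answer = complete(question)              # step 1: initial answer

    questions_text = complete(               # step 2: extract checkable claims
        "List the specific factual claims in the answer below as short, "
        "independently checkable questions, one per line.\n\n"
        f"Question: {question}\nAnswer: {answer}"
    )
    verification_qs = [q.strip() for q in questions_text.splitlines() if q.strip()]

    # Step 3: answer each question in a fresh call. The original answer is
    # deliberately left out of context so the model cannot anchor on it.
    checked = [f"Q: {q}\nA: {complete(q)}" for q in verification_qs]

    # Steps 4 and 5: compare and revise in one final call
    return complete(
        "Revise the original answer so it is consistent with the independent "
        "verification answers below. Correct or drop any claim they contradict.\n\n"
        f"Original question: {question}\nOriginal answer: {answer}\n\n"
        "Verification:\n" + "\n\n".join(checked)
    )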

Constitutional Prompting for Self-Correction

Anthropic's Constitutional AI approach builds self-correction into the model through a set of principles the model applies to evaluate and revise its own outputs. At the prompting level (without model-level Constitutional AI training), you can implement a similar pattern: define an explicit set of principles in your system prompt, then ask the model to evaluate each output against those principles before delivering it.

This is particularly useful for content safety, brand compliance, and maintaining consistent values across a high-volume application. The system prompt defines the constitution; each response is silently evaluated against it before being shown to the user. The cost is roughly doubled per interaction, but for applications where consistency is critical, that cost is justified.
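
A minimal sketch of the pattern follows. The three principles and the 'PASS' sentinel are illustrative placeholders, not Anthropic's constitution, and complete() is again a hypothetical stand-in; the two calls per response are where the roughly doubled cost comes from:

def complete(prompt: str, system: str = "") -> str:
    """Stand-in for a single LLM call with an optional system prompt."""
    raise NotImplementedError

CONSTITUTION = """Principles every response must satisfy:
1. Never reveal internal system details.
2. Match the brand voice: plain, direct, no hype.
3. Decline requests for professional legal or medical advice."""

def constitutional_respond(user_message: str) -> str:
    draft = complete(user_message, system=CONSTITUTION)
    verdict = complete(                      # silent second pass against the constitution
        f"{CONSTITUTION}\n\nCheck the draft below against each principle. "
        f"If every principle is satisfied, reply with exactly 'PASS'. "
        f"Otherwise reply with only a compliant rewrite.\n\nDraft:\n{draft}",
        system=CONSTITUTION,
    )
    # The user only ever sees the post-review text; the evaluation stays hidden
    return draft if verdict.strip() == "PASS" else verdict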

Key takeaways
  • Self-Refine achieves ~20% absolute improvement across code, writing, and reasoning by running the same model as generator, critic, and refiner
  • Two to three refinement cycles capture most of the available improvement; additional cycles add cost with diminishing returns and occasionally over-correct
  • Specifying explicit criteria for the critique step is critical — generic feedback produces generic improvement; Likert scales improve evaluation calibration
  • Models have a 5–7% self-enhancement bias in critique — for highest quality, use model A to generate and model B to critique
  • Chain-of-Verification reduced hallucinated entities by 77% by answering verification questions independently (without the original answer in context)