GPT-4o ⚙️ Technical Advanced

Multi-Modal Prompt Design

Design prompts that effectively combine text, images, documents, and other modalities in a single AI request.

👁 12 views ⎘ 0 copies ♥ 0 likes

The Prompt

# Multi-Modal Prompt Design

You are an expert in multi-modal AI systems. Design prompts that combine text with other modalities — images, documents, data, audio transcripts — to produce outputs that no text-only prompt could achieve.

## Multi-Modal Task

**Task:** [TASK — e.g., analyze this product photo and write a listing, compare these two contracts and flag differences, describe what is happening in this chart]
**Modalities involved:** [MODALITIES — e.g., image + text, PDF + text, spreadsheet data + text instructions]
**Primary AI tool:** [TOOL — e.g., GPT-4o, Claude with vision, Gemini 1.5 Pro]

## Modality Handling Instructions

For each non-text modality in your prompt, define:

**[MODALITY_1 — e.g., Image]:**
- What to analyze: [SPECIFIC_ELEMENTS — e.g., focus on the color palette and typography, not the background]
- What to ignore: [IGNORED_ELEMENTS]
- Level of detail required: [DETAIL — high-level description / forensic analysis]
- If unclear or ambiguous: [FALLBACK_INSTRUCTION]

**[MODALITY_2 — e.g., Document/PDF]:**
- Relevant sections: [SECTION_NAMES or page ranges]
- What to extract vs. summarize: [INSTRUCTION]

## Prompt Construction

Write the complete multi-modal prompt that:
1. Provides text instructions before presenting the non-text content
2. Explicitly directs attention to the most important elements
3. Specifies the exact output format
4. Includes a quality check instruction at the end

## Common Failure Modes

Identify the [FAILURE_COUNT] most common ways multi-modal prompts fail for this task type, and how your prompt design avoids each.

📝 Fill in the blanks

Replace these placeholders with your own content:

[TASK — e.g., analyze this product photo and write a listing, compare these two contracts and flag differences, describe what is happening in this chart]

[MODALITIES — e.g., image + text, PDF + text, spreadsheet data + text instructions]

[TOOL — e.g., GPT-4o, Claude with vision, Gemini 1.5 Pro]

[MODALITY_1 — e.g., Image]

[SPECIFIC_ELEMENTS — e.g., focus on the color palette and typography, not the background]

[IGNORED_ELEMENTS]

[DETAIL — high-level description / forensic analysis]

[FALLBACK_INSTRUCTION]

[MODALITY_2 — e.g., Document/PDF]

[SECTION_NAMES or page ranges]

[INSTRUCTION]

[FAILURE_COUNT]

How to use this prompt

Copy the prompt

Click "Copy Prompt" above to copy the full prompt text to your clipboard.

Replace the placeholders

Swap out anything in [BRACKETS] with your specific details.

Paste into GPT-4o

Open your preferred AI assistant and paste the prompt to get started.

Model GPT-4o

Category ⚙️ Technical

Difficulty Advanced

Copies 0

Added May 27, 2026

Multi-Modal Prompt Design

The Prompt

📝 Fill in the blanks

How to use this prompt

You might also like

Code Reviewer & Refactor

Weekly Performance & Goal-Setting Plan for Tech Support Agents

Technical Debt & Legacy Code Refactor

Complex Regex & Pattern Matching Builder