Prompt Library ⚙️ Technical Multi-Modal Prompt Design
GPT-4o ⚙️ Technical Advanced

Multi-Modal Prompt Design

Design prompts that effectively combine text, images, documents, and other modalities in a single AI request.
👁 3 views ⎘ 0 copies ♥ 0 likes

The Prompt

# Multi-Modal Prompt Design

You are an expert in multi-modal AI systems. Design prompts that combine text with other modalities — images, documents, data, audio transcripts — to produce outputs that no text-only prompt could achieve.

## Multi-Modal Task

**Task:** [TASK — e.g., analyze this product photo and write a listing, compare these two contracts and flag differences, describe what is happening in this chart]
**Modalities involved:** [MODALITIES — e.g., image + text, PDF + text, spreadsheet data + text instructions]
**Primary AI tool:** [TOOL — e.g., GPT-4o, Claude with vision, Gemini 1.5 Pro]

## Modality Handling Instructions

For each non-text modality in your prompt, define:

**[MODALITY_1 — e.g., Image]:**
- What to analyze: [SPECIFIC_ELEMENTS — e.g., focus on the color palette and typography, not the background]
- What to ignore: [IGNORED_ELEMENTS]
- Level of detail required: [DETAIL — high-level description / forensic analysis]
- If unclear or ambiguous: [FALLBACK_INSTRUCTION]

**[MODALITY_2 — e.g., Document/PDF]:**
- Relevant sections: [SECTION_NAMES or page ranges]
- What to extract vs. summarize: [INSTRUCTION]

## Prompt Construction

Write the complete multi-modal prompt that:
1. Provides text instructions before presenting the non-text content
2. Explicitly directs attention to the most important elements
3. Specifies the exact output format
4. Includes a quality check instruction at the end

## Common Failure Modes

Identify the [FAILURE_COUNT] most common ways multi-modal prompts fail for this task type, and how your prompt design avoids each.

📝 Fill in the blanks

Replace these placeholders with your own content:

[TASK — e.g., analyze this product photo and write a listing, compare these two contracts and flag differences, describe what is happening in this chart]
[MODALITIES — e.g., image + text, PDF + text, spreadsheet data + text instructions]
[TOOL — e.g., GPT-4o, Claude with vision, Gemini 1.5 Pro]
[MODALITY_1 — e.g., Image]
[SPECIFIC_ELEMENTS — e.g., focus on the color palette and typography, not the background]
[IGNORED_ELEMENTS]
[DETAIL — high-level description / forensic analysis]
[FALLBACK_INSTRUCTION]
[MODALITY_2 — e.g., Document/PDF]
[SECTION_NAMES or page ranges]
[INSTRUCTION]
[FAILURE_COUNT]

How to use this prompt

1
Copy the prompt

Click "Copy Prompt" above to copy the full prompt text to your clipboard.

2
Replace the placeholders

Swap out anything in [BRACKETS] with your specific details.

3
Paste into GPT-4o

Open your preferred AI assistant and paste the prompt to get started.