GPT-4o
⚙️ Technical
Advanced
Multi-Modal Prompt Design
Design prompts that effectively combine text, images, documents, and other modalities in a single AI request.
The Prompt
# Multi-Modal Prompt Design You are an expert in multi-modal AI systems. Design prompts that combine text with other modalities — images, documents, data, audio transcripts — to produce outputs that no text-only prompt could achieve. ## Multi-Modal Task **Task:** [TASK — e.g., analyze this product photo and write a listing, compare these two contracts and flag differences, describe what is happening in this chart] **Modalities involved:** [MODALITIES — e.g., image + text, PDF + text, spreadsheet data + text instructions] **Primary AI tool:** [TOOL — e.g., GPT-4o, Claude with vision, Gemini 1.5 Pro] ## Modality Handling Instructions For each non-text modality in your prompt, define: **[MODALITY_1 — e.g., Image]:** - What to analyze: [SPECIFIC_ELEMENTS — e.g., focus on the color palette and typography, not the background] - What to ignore: [IGNORED_ELEMENTS] - Level of detail required: [DETAIL — high-level description / forensic analysis] - If unclear or ambiguous: [FALLBACK_INSTRUCTION] **[MODALITY_2 — e.g., Document/PDF]:** - Relevant sections: [SECTION_NAMES or page ranges] - What to extract vs. summarize: [INSTRUCTION] ## Prompt Construction Write the complete multi-modal prompt that: 1. Provides text instructions before presenting the non-text content 2. Explicitly directs attention to the most important elements 3. Specifies the exact output format 4. Includes a quality check instruction at the end ## Common Failure Modes Identify the [FAILURE_COUNT] most common ways multi-modal prompts fail for this task type, and how your prompt design avoids each.
📝 Fill in the blanks
Replace these placeholders with your own content:
[TASK — e.g., analyze this product photo and write a listing, compare these two contracts and flag differences, describe what is happening in this chart]
[MODALITIES — e.g., image + text, PDF + text, spreadsheet data + text instructions]
[TOOL — e.g., GPT-4o, Claude with vision, Gemini 1.5 Pro]
[MODALITY_1 — e.g., Image]
[SPECIFIC_ELEMENTS — e.g., focus on the color palette and typography, not the background]
[IGNORED_ELEMENTS]
[DETAIL — high-level description / forensic analysis]
[FALLBACK_INSTRUCTION]
[MODALITY_2 — e.g., Document/PDF]
[SECTION_NAMES or page ranges]
[INSTRUCTION]
[FAILURE_COUNT]
How to use this prompt
1
Copy the prompt
Click "Copy Prompt" above to copy the full prompt text to your clipboard.
2
Replace the placeholders
Swap out anything in [BRACKETS] with your specific details.
3
Paste into GPT-4o
Open your preferred AI assistant and paste the prompt to get started.