
Write a Data Pipeline

Design and build a complete data pipeline for any ETL or data processing task.

The Prompt

Design and build a data pipeline for the following:

Data source(s): [describe where data comes from — APIs / databases / files / streams / other]
Data destination(s): [where processed data needs to go — data warehouse / database / dashboard / API / file]
Data volume: [records per day / GB per day]
Processing frequency: [real-time streaming / micro-batch every X minutes / daily batch]
Transformations needed: [describe what needs to happen to the data — cleaning / aggregation / enrichment / joining]
Language/tools preference: [Python / SQL / Apache Spark / dbt / Airflow / other]
Cloud environment: [AWS / GCP / Azure / on-premise / other]
Data quality requirements: [strict — no bad data / tolerant — flag and continue]

Provide a complete pipeline design and implementation:

ARCHITECTURE OVERVIEW:
- Pipeline stages (Extract → Transform → Load breakdown)
- Tool choices for each stage with justification
- Batch vs streaming decision and rationale

EXTRACTION LAYER:
- Source connection code
- Incremental extraction strategy (full load vs delta)
- Rate limiting and pagination handling
- Source schema documentation
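
To give a sense of what this layer's output might look like, here is a minimal Python sketch of incremental extraction with pagination and rate-limit backoff. The endpoint URL, query parameters, and response shape are hypothetical:

```python
# Minimal extraction sketch, assuming a hypothetical REST API that
# accepts `since` (ISO timestamp), `page`, and `per_page` parameters
# and returns JSON of the form {"data": [...]}.
import time
import requests

API_URL = "https://api.example.com/v1/orders"  # hypothetical endpoint

def extract_incremental(last_run_ts: str, page_size: int = 500):
    """Yield records changed after last_run_ts, one page at a time."""
    page = 1
    while True:
        resp = requests.get(
            API_URL,
            params={"since": last_run_ts, "page": page, "per_page": page_size},
            timeout=30,
        )
        if resp.status_code == 429:  # rate limited: honor Retry-After, then retry
            time.sleep(int(resp.headers.get("Retry-After", "5")))
            continue
        resp.raise_for_status()
        records = resp.json().get("data", [])
        if not records:
            break  # past the last page
        yield from records
        page += 1
        time.sleep(0.2)  # polite pacing between page requests
```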

TRANSFORMATION LAYER:
- Complete transformation code
- Data cleaning rules (nulls, duplicates, type casting)
- Business logic transformations
- Data validation checks
- Enrichment joins
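
A response to this section might include cleaning code along these lines; this pandas sketch assumes illustrative column names (`order_id`, `amount`, `created_at`) and a small customer dimension for the enrichment join:

```python
# Minimal transformation sketch in pandas; column names are assumptions.
import pandas as pd

def transform(raw: pd.DataFrame, customers: pd.DataFrame) -> pd.DataFrame:
    df = raw.drop_duplicates(subset=["order_id"]).copy()   # dedupe on business key
    df = df.dropna(subset=["order_id", "customer_id"])     # required fields
    df["amount"] = pd.to_numeric(df["amount"], errors="coerce").fillna(0.0)
    df["created_at"] = pd.to_datetime(df["created_at"], errors="coerce", utc=True)
    df = df[df["amount"] >= 0]                             # simple validation rule
    # Enrichment: attach customer segment from a dimension table
    return df.merge(
        customers[["customer_id", "segment"]], on="customer_id", how="left"
    )
```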

LOADING LAYER:
- Destination connection and write code
- Upsert vs append vs overwrite strategy
- Partitioning strategy
- Index management
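
For the write path, an idempotent upsert is usually the safest default because reruns after a partial failure do not duplicate rows. A minimal PostgreSQL sketch, with the schema, table, and key column as assumptions:

```python
# Minimal upsert sketch for PostgreSQL using psycopg2; `rows` is an
# iterable of (order_id, customer_id, amount, created_at) tuples.
import psycopg2
from psycopg2.extras import execute_values

UPSERT_SQL = """
    INSERT INTO analytics.orders (order_id, customer_id, amount, created_at)
    VALUES %s
    ON CONFLICT (order_id) DO UPDATE
    SET customer_id = EXCLUDED.customer_id,
        amount      = EXCLUDED.amount,
        created_at  = EXCLUDED.created_at;
"""

def load(rows, dsn: str) -> None:
    # The connection context manager commits on success, rolls back on error.
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        execute_values(cur, UPSERT_SQL, rows, page_size=1000)
```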

ORCHESTRATION:
- DAG or schedule definition
- Dependency management between tasks
- Retry logic
- SLA monitoring
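
As a shape for the orchestration answer, here is a minimal Airflow 2.x DAG with retries and an SLA; the `pipeline` module and its three callables are hypothetical stand-ins for the stages above:

```python
# Minimal Airflow DAG sketch (Airflow 2.4+; older versions use
# `schedule_interval` instead of `schedule`).
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

from pipeline import extract, transform, load  # hypothetical module

default_args = {
    "retries": 3,
    "retry_delay": timedelta(minutes=5),
    "sla": timedelta(hours=1),  # flag tasks that run past this window
}

with DAG(
    dag_id="orders_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args=default_args,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    t_extract >> t_transform >> t_load  # explicit task dependencies
```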

DATA QUALITY:
- Validation checks at each stage
- Data quality metrics to track
- Alerting on quality failures
- Quarantine strategy for bad records
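
One way a response might structure this is as named rules that both produce per-check metrics and split each batch into valid and quarantined rows. The rules and column names in this sketch are illustrative:

```python
# Minimal data-quality sketch: run named checks, count failures per
# check (for metrics/alerting), and quarantine rows that fail any rule.
import pandas as pd

CHECKS = {
    "non_null_key": lambda df: df["order_id"].notna(),
    "positive_amount": lambda df: df["amount"] >= 0,
    "known_customer": lambda df: df["customer_id"].notna(),
}

def apply_checks(df: pd.DataFrame):
    mask = pd.Series(True, index=df.index)
    failure_counts = {}
    for name, rule in CHECKS.items():
        passed = rule(df)
        failure_counts[name] = int((~passed).sum())  # metric per check
        mask &= passed
    return df[mask], df[~mask], failure_counts  # valid, quarantined, metrics
```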

ERROR HANDLING AND RECOVERY:
- Failure detection
- Partial failure recovery
- Dead letter queue for failed records
- Reprocessing strategy
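
A simple file-backed dead letter queue is one possible shape for this section; the sketch assumes records are JSON-serializable dicts and isolates failures so one bad record cannot fail the whole batch:

```python
# Minimal dead-letter sketch: route per-record failures to a JSONL file,
# keeping the error message alongside the record for later reprocessing.
import json

def process_batch(records, handler, dlq_path: str = "dead_letter.jsonl") -> int:
    succeeded = 0
    with open(dlq_path, "a") as dlq:
        for record in records:
            try:
                handler(record)
                succeeded += 1
            except Exception as exc:  # contain the blast radius per record
                dlq.write(json.dumps({"record": record, "error": str(exc)}) + "\n")
    return succeeded
```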

MONITORING:
- Pipeline health metrics
- Data freshness monitoring
- Volume anomaly detection
- Logging structure
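
For monitoring, structured (JSON) logs plus a freshness check cover several of these bullets at once. In this sketch the one-hour staleness threshold is an arbitrary example:

```python
# Minimal monitoring sketch: one JSON log line per event, plus a
# data-freshness check against the latest loaded timestamp (watermark).
import json
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("pipeline")

def log_event(event: str, **fields) -> None:
    """Emit a machine-parseable log line for downstream metrics."""
    logger.info(json.dumps({"event": event, "ts": time.time(), **fields}))

def check_freshness(watermark_ts: float, max_lag_seconds: int = 3600) -> None:
    lag = time.time() - watermark_ts
    log_event("freshness_check", lag_seconds=round(lag, 1))
    if lag > max_lag_seconds:
        raise RuntimeError(f"data is stale: {lag:.0f}s behind the source")
```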

📝 Fill in the blanks

Replace these placeholders with your own content:

[describe where data comes from — APIs / databases / files / streams / other]
[where processed data needs to go — data warehouse / database / dashboard / API / file]
[records per day / GB per day]
[real-time streaming / micro-batch every X minutes / daily batch]
[describe what needs to happen to the data — cleaning / aggregation / enrichment / joining]
[Python / SQL / Apache Spark / dbt / Airflow / other]
[AWS / GCP / Azure / on-premise / other]
[strict — no bad data / tolerant — flag and continue]

How to use this prompt

1. Copy the prompt. Click "Copy Prompt" above to copy the full prompt text to your clipboard.

2. Replace the placeholders. Swap out anything in [BRACKETS] with your specific details.

3. Paste into your AI assistant. Open your preferred assistant and paste the prompt to get started.