Any model · ⚙️ Technical · Advanced
Write a Data Pipeline
Design and build a complete data pipeline for any ETL or data processing task.
The Prompt
Design and build a data pipeline for the following:

Data source(s): [describe where data comes from — APIs / databases / files / streams / other]
Data destination(s): [where processed data needs to go — data warehouse / database / dashboard / API / file]
Data volume: [records per day / GB per day]
Processing frequency: [real-time streaming / micro-batch every X minutes / daily batch]
Transformations needed: [describe what needs to happen to the data — cleaning / aggregation / enrichment / joining]
Language/tools preference: [Python / SQL / Apache Spark / dbt / Airflow / other]
Cloud environment: [AWS / GCP / Azure / on-premise / other]
Data quality requirements: [strict — no bad data / tolerant — flag and continue]

Provide a complete pipeline design and implementation:

ARCHITECTURE OVERVIEW:
- Pipeline stages (Extract → Transform → Load breakdown)
- Tool choices for each stage with justification
- Batch vs streaming decision and rationale

EXTRACTION LAYER:
- Source connection code
- Incremental extraction strategy (full load vs delta)
- Rate limiting and pagination handling
- Source schema documentation

TRANSFORMATION LAYER:
- Complete transformation code
- Data cleaning rules (nulls, duplicates, type casting)
- Business logic transformations
- Data validation checks
- Enrichment joins

LOADING LAYER:
- Destination connection and write code
- Upsert vs append vs overwrite strategy
- Partitioning strategy
- Index management

ORCHESTRATION:
- DAG or schedule definition
- Dependency management between tasks
- Retry logic
- SLA monitoring

DATA QUALITY:
- Validation checks at each stage
- Data quality metrics to track
- Alerting on quality failures
- Quarantine strategy for bad records

ERROR HANDLING AND RECOVERY:
- Failure detection
- Partial failure recovery
- Dead letter queue for failed records
- Reprocessing strategy

MONITORING:
- Pipeline health metrics
- Data freshness monitoring
- Volume anomaly detection
- Logging structure
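What good output looks like
The sketches below illustrate the kind of code a strong answer should include for a few of the layers above. They are minimal illustrations, not production implementations, and every endpoint, table, and column name in them is a made-up placeholder.

For the EXTRACTION LAYER, a delta load from a paginated, rate-limited REST API might look like this in Python (the endpoint, the updated_since parameter, and the next_cursor field are all hypothetical):

```python
import time
import requests

API_URL = "https://api.example.com/v1/orders"  # hypothetical endpoint

def extract_incremental(last_watermark: str, page_size: int = 500) -> list[dict]:
    """Pull only records updated since the last successful run (delta load)."""
    records, cursor = [], None
    while True:
        params = {"updated_since": last_watermark, "limit": page_size}
        if cursor:
            params["cursor"] = cursor
        resp = requests.get(API_URL, params=params, timeout=30)
        if resp.status_code == 429:            # rate limited: back off and retry
            time.sleep(int(resp.headers.get("Retry-After", 5)))
            continue
        resp.raise_for_status()
        payload = resp.json()
        records.extend(payload["data"])
        cursor = payload.get("next_cursor")    # pagination handling
        if not cursor:                         # no more pages
            return records
```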
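For the TRANSFORMATION LAYER, cleaning rules and validation checks usually land in one function. A pandas sketch, again with assumed column names:

```python
import pandas as pd

def transform(records: list[dict]) -> pd.DataFrame:
    df = pd.DataFrame(records)

    # Cleaning rules: duplicates, nulls, type casting
    df = df.drop_duplicates(subset=["order_id"])
    df = df.dropna(subset=["order_id", "customer_id"])
    df["amount"] = pd.to_numeric(df["amount"], errors="coerce")
    df["updated_at"] = pd.to_datetime(df["updated_at"], errors="coerce", utc=True)

    # Validation check: in strict mode, fail loudly rather than load bad data
    if (df["amount"] < 0).any():
        raise ValueError("negative amounts found; aborting load")
    return df
```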
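For the LOADING LAYER, the upsert strategy typically becomes an ON CONFLICT (PostgreSQL) or MERGE statement. A psycopg2 sketch against an assumed analytics.orders table:

```python
import psycopg2
from psycopg2.extras import execute_values

UPSERT_SQL = """
    INSERT INTO analytics.orders (order_id, customer_id, amount, updated_at)
    VALUES %s
    ON CONFLICT (order_id) DO UPDATE
    SET amount = EXCLUDED.amount,
        updated_at = EXCLUDED.updated_at;
"""

def load(df, dsn: str) -> None:
    cols = ["order_id", "customer_id", "amount", "updated_at"]
    rows = list(df[cols].itertuples(index=False, name=None))
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        execute_values(cur, UPSERT_SQL, rows)  # batched upsert, not row-by-row
```

Batching rows through execute_values keeps the load to a handful of round trips instead of one INSERT per record.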
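For ORCHESTRATION with Airflow 2.x, the three stages become tasks in a DAG with retries and an SLA. The run_* callables here are stubs standing in for the functions above:

```python
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python import PythonOperator

# Placeholder callables: wire in the real extract/transform/load functions here.
def run_extract(): ...
def run_transform(): ...
def run_load(): ...

default_args = {
    "retries": 3,                              # retry logic
    "retry_delay": timedelta(minutes=5),
    "sla": timedelta(hours=1),                 # missed-SLA alerting hook
}

with DAG(
    dag_id="orders_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                         # daily batch cadence
    catchup=False,
    default_args=default_args,
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=run_extract)
    transform = PythonOperator(task_id="transform", python_callable=run_transform)
    load = PythonOperator(task_id="load", python_callable=run_load)

    extract >> transform >> load               # dependency management between tasks
```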
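The DATA QUALITY and ERROR HANDLING sections both lean on the same idea: bad records get quarantined rather than silently dropped or allowed to kill the run. A minimal dead-letter pattern in plain Python (the validation rule and quarantine path are illustrative):

```python
import json
import os

QUARANTINE_DIR = "quarantine"                          # assumed location
QUARANTINE_PATH = os.path.join(QUARANTINE_DIR, "orders_bad_records.jsonl")

def partition_records(records: list[dict]) -> list[dict]:
    """Route invalid records to a dead-letter file; return only the valid ones."""
    os.makedirs(QUARANTINE_DIR, exist_ok=True)
    good = []
    with open(QUARANTINE_PATH, "a", encoding="utf-8") as dlq:
        for rec in records:
            try:
                if rec.get("order_id") is None:        # illustrative validation rule
                    raise ValueError("missing order_id")
                good.append(rec)
            except ValueError as err:
                # keep the record and the reason so it can be fixed and reprocessed
                dlq.write(json.dumps({"record": rec, "error": str(err)}) + "\n")
    return good
```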
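And for MONITORING, volume anomaly detection can be as simple as comparing today's row count against a trailing average; anything fancier is a refinement of this idea (the threshold here is arbitrary):

```python
def check_volume(today_count: int, recent_counts: list[int], tolerance: float = 0.5) -> None:
    """Alert if today's row count deviates more than `tolerance` from the trailing mean."""
    if not recent_counts:
        return                                         # no history yet; nothing to compare
    baseline = sum(recent_counts) / len(recent_counts)
    if abs(today_count - baseline) > tolerance * baseline:
        # swap in a real alert channel (Slack webhook, PagerDuty, etc.) in production
        print(f"ALERT: volume {today_count} deviates from baseline {baseline:.0f}")
```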
📝 Fill in the blanks
Replace these placeholders with your own content:
[describe where data comes from — APIs / databases / files / streams / other]
[where processed data needs to go — data warehouse / database / dashboard / API / file]
[records per day / GB per day]
[real-time streaming / micro-batch every X minutes / daily batch]
[describe what needs to happen to the data — cleaning / aggregation / enrichment / joining]
[Python / SQL / Apache Spark / dbt / Airflow / other]
[AWS / GCP / Azure / on-premise / other]
[strict — no bad data / tolerant — flag and continue]
How to use this prompt
1. Copy the prompt: Click "Copy Prompt" above to copy the full prompt text to your clipboard.
2. Replace the placeholders: Swap out anything in [BRACKETS] with your specific details.
3. Paste into your AI assistant: Open your preferred assistant and paste the prompt to get started.