Works with: any AI assistant · Category: Technical · Level: Intermediate

Build a Web Scraper

Design and write a complete web scraping solution for any data extraction task.

The Prompt

Help me build a web scraper for the following task:

Target website: [URL or describe the type of site]
Data to extract: [exactly what fields/information you need]
Scale: [one-time scrape / daily / real-time / millions of pages]
Programming language: [Python / Node.js / other]
Output format: [CSV / JSON / database / other]
Authentication required: [yes — describe login process / no]
Anti-scraping measures present: [CAPTCHAs / rate limiting / JavaScript rendering / login walls / other]
Legal/ethical context: [public data / own site testing / research / other]

Provide a complete scraping solution:

TECH STACK RECOMMENDATION:
- Library/framework choice (BeautifulSoup / Scrapy / Playwright / Puppeteer / etc.)
- Justification based on the site's characteristics
- Additional tools needed

COMPLETE CODE:
- Full working scraper code
- Data extraction logic with selectors
- Error handling for missing fields
- Rate limiting and polite crawling delays
- Retry logic for failed requests
- Output to specified format

HANDLING ANTI-SCRAPING:
- User agent rotation
- Request header management
- IP rotation approach (if needed)
- CAPTCHA handling strategy
- JavaScript rendering solution (if needed)

DATA CLEANING:
- Normalising extracted data
- Handling missing or malformed values
- Deduplication logic

SCALING:
- How to handle large volumes
- Parallelisation approach
- Storage strategy at scale

MONITORING:
- How to detect when the site structure changes
- Alerting on extraction failures
- Logging approach

ETHICAL AND LEGAL NOTES:
- robots.txt compliance check
- Rate limiting best practices
- Data storage and usage considerations

📝 Fill in the blanks

Replace these placeholders with your own content:

[URL or describe the type of site]
[exactly what fields/information you need]
[one-time scrape / daily / real-time / millions of pages]
[Python / Node.js / other]
[CSV / JSON / database / other]
[yes — describe login process / no]
[CAPTCHAs / rate limiting / JavaScript rendering / login walls / other]
[public data / own site testing / research / other]

How to use this prompt

1. Copy the prompt

Copy the full prompt text from "The Prompt" section above.

2. Replace the placeholders

Replace everything in [square brackets] with your specific details.

3. Paste into any AI assistant

Open your preferred AI assistant and paste the prompt to get started.
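
For orientation, a response to this prompt might include helpers along the following lines for the "Rate limiting and polite crawling delays", "Retry logic for failed requests", and "robots.txt compliance check" items. This is only a minimal Python sketch under stated assumptions, not a definitive implementation: the function names, the `USER_AGENT` string, and the injected `fetch` callable are all hypothetical, and a real scraper would pass in its own HTTP function and the target site's actual robots.txt.

```python
import random
import time
from urllib import robotparser

# Hypothetical user agent string; use one that identifies your own scraper.
USER_AGENT = "example-scraper/0.1"


def build_robots_parser(robots_txt):
    """Parse robots.txt content that has already been downloaded as text."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def backoff_delay(attempt, base=1.0, cap=30.0):
    """Exponential backoff: base, 2*base, 4*base, ... limited to cap seconds."""
    return min(cap, base * (2 ** attempt))


def fetch_with_retry(fetch, url, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying on OSError (network-style failures).

    `fetch` is any callable that returns the page body or raises OSError.
    """
    last_error = None
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except OSError as exc:
            last_error = exc
            if attempt < max_attempts - 1:
                # Small random jitter avoids retrying in lockstep.
                sleep(backoff_delay(attempt, base_delay) + random.uniform(0, 0.5))
    raise last_error


def polite_crawl(urls, fetch, robots, delay_seconds=1.0, sleep=time.sleep):
    """Fetch each allowed URL in turn, pausing between requests."""
    results = {}
    for url in urls:
        if not robots.can_fetch(USER_AGENT, url):
            continue  # robots.txt disallows this path for our user agent
        results[url] = fetch_with_retry(fetch, url, sleep=sleep)
        sleep(delay_seconds)  # fixed polite delay between successful requests
    return results
```

Passing `fetch` and `sleep` in as parameters keeps the sketch independent of any particular HTTP library and makes it easy to exercise without network access; in practice `fetch` could wrap whichever client the recommended tech stack uses.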