Web Scraping Agent Skill
Smart web scraping with progressive fallback using YepAPI.
---
description: Smart web scraping with progressive fallback using YepAPI.
homepage: https://yepapi.com/skills/web-scraping
metadata:
  tags: [yepapi, scraping, web, extraction]
---
# Web Scraping Workflow
Smart scraping with progressive fallback using YepAPI endpoints.
## Endpoints Used
| Endpoint | Cost | Purpose |
|----------|------|---------|
| `POST /v1/scrape` | $0.01 | Basic scrape to markdown, HTML, or text |
| `POST /v1/scrape/js` | $0.02 | JavaScript-rendered page scrape (headless browser) |
| `POST /v1/scrape/stealth` | $0.03 | Anti-bot bypass with residential proxies |
| `POST /v1/scrape/ai-extract` | $0.03 | AI-powered data extraction (natural language) |
## Workflow
### Step 1: Basic Scrape
Try `POST /v1/scrape` first — fastest and cheapest.
```json
{
  "url": "https://example.com/page",
  "format": "markdown"
}
```
Returns: page content in markdown format (default).
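As a sketch, the Step 1 request can be built and validated client-side before sending, which catches `VALIDATION_ERROR` cases (missing protocol, unknown format) without spending a credit. The `build_scrape_request` helper below is a hypothetical name, not part of any YepAPI client library; it only prepares the endpoint path and JSON body, so the HTTP call itself is left out.

```python
# Sketch: build and validate a basic-scrape payload before sending it.
# build_scrape_request is a hypothetical helper, not a YepAPI client API.

VALID_FORMATS = {"markdown", "html", "text"}

def build_scrape_request(url, fmt="markdown"):
    """Return the endpoint path and JSON body for POST /v1/scrape."""
    if not url.startswith(("http://", "https://")):
        # The API returns VALIDATION_ERROR (400) for URLs without a protocol.
        raise ValueError("URL must include a protocol (https://)")
    if fmt not in VALID_FORMATS:
        raise ValueError(f"format must be one of {sorted(VALID_FORMATS)}")
    return "/v1/scrape", {"url": url, "format": fmt}

endpoint, body = build_scrape_request("https://example.com/page")
```

The same validation applies to the `/js` and `/stealth` payloads below, which share this body shape.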
### Step 2: JS Rendering Fallback
If content is empty or incomplete (JS-rendered site), fall back to `POST /v1/scrape/js`.
```json
{
  "url": "https://example.com/page",
  "format": "markdown",
  "waitFor": ".main-content"
}
```
Returns: fully rendered page content including JavaScript-loaded elements.
### Step 3: Stealth Mode Fallback
If blocked (403/captcha), fall back to `POST /v1/scrape/stealth`.
```json
{
  "url": "https://example.com/page",
  "format": "markdown"
}
```
Returns: page content via residential proxy, bypassing anti-bot measures.
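The three scrape tiers above form a fixed escalation order: empty content suggests a JS-rendered page (escalate one tier), while a 403 or captcha means anti-bot blocking (jump to stealth). A minimal sketch of that decision logic, kept free of any HTTP client so the escalation rule itself is visible (`next_tier` is an illustrative helper, not part of YepAPI):

```python
# Sketch of the progressive-fallback order: scrape -> js -> stealth.
# next_tier decides which endpoint to try next; it is illustrative only.

TIERS = ["/v1/scrape", "/v1/scrape/js", "/v1/scrape/stealth"]

def next_tier(current, status, content):
    """Return the next endpoint to try, or None if no fallback applies."""
    if status == 403:
        # Blocked by anti-bot measures: go straight to stealth mode.
        return "/v1/scrape/stealth" if current != "/v1/scrape/stealth" else None
    if status == 200 and not content.strip():
        # Empty content usually means a JS-rendered page: escalate one tier.
        i = TIERS.index(current)
        return TIERS[i + 1] if i + 1 < len(TIERS) else None
    return None  # success, or a non-retriable error
```

Keeping the tiers in a single ordered list also makes the cost ordering from the Rules section explicit: each escalation moves to a more expensive endpoint.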
### Step 4: AI Data Extraction
For structured data extraction, use `POST /v1/scrape/ai-extract`.
```json
{
  "url": "https://example.com/product",
  "prompt": "Extract the product name, price, and rating from this page"
}
```
Returns: structured data extracted by AI based on your natural language prompt.
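A sketch of preparing the extraction request, enforcing the "clear, specific prompts" rule up front. `build_extract_request` is a hypothetical helper and the minimum-length check is an assumption made here for illustration, not an API requirement:

```python
# Sketch: prepare an AI-extraction payload for POST /v1/scrape/ai-extract.
# build_extract_request is illustrative, not a YepAPI client function.

def build_extract_request(url, prompt):
    """Return the JSON body; reject vague one-word prompts up front."""
    if not url.startswith(("http://", "https://")):
        raise ValueError("URL must include a protocol (https://)")
    if len(prompt.split()) < 3:
        # Vague prompts produce vague extractions: name the fields you want.
        raise ValueError("write a specific prompt naming the fields to extract")
    return {"url": url, "prompt": prompt}
```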
## Rules
- Always try the cheapest method first: scrape ($0.01) → js ($0.02) → stealth ($0.03)
- Default format is `"markdown"` — best for AI consumption
- For AI extraction, write clear, specific prompts: "Extract the product name, price, and rating from this page"
- Use `waitFor` CSS selector for JS-heavy pages — e.g., `"waitFor": ".product-card"`
- Available formats: `"markdown"`, `"html"`, `"text"`
## Error Handling
- `NO_CREDITS` (402) — Tell the user to add credits at https://yepapi.com/dashboard
- `UPSTREAM_ERROR` (502) — Try the next fallback tier (scrape → js → stealth)
- `VALIDATION_ERROR` (400) — Check URL format; must include protocol (https://)
- `RATE_LIMITED` (429) — Wait and retry; respect the `Retry-After` header
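The four error codes above each map to a distinct recovery action. A sketch of that dispatch (the action labels are hypothetical names chosen here, not part of the API):

```python
# Sketch: map YepAPI error status codes to recovery actions.
# The action strings are illustrative labels, not part of the API.

ERROR_ACTIONS = {
    402: "prompt_add_credits",  # NO_CREDITS: user tops up at the dashboard
    502: "escalate_tier",       # UPSTREAM_ERROR: try scrape -> js -> stealth
    400: "fix_url",             # VALIDATION_ERROR: ensure https:// protocol
    429: "wait_and_retry",      # RATE_LIMITED: honor the Retry-After header
}

def recovery_action(status):
    """Return the recovery action for a status code, or 'fail' if unknown."""
    return ERROR_ACTIONS.get(status, "fail")
```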
Why Use the Web Scraping Skill?
Without this skill, your AI guesses at web scraping patterns. It might hallucinate deprecated APIs, use outdated conventions, or miss best practices entirely. With it, your AI follows a proven ruleset — every suggestion aligns with current standards.
Drop this skill into your project and your AI instantly knows the rules. Better code suggestions, fewer errors, faster shipping. This skill also teaches your AI how to call YepAPI's scraping endpoints for real data, from raw page content to AI-extracted structured fields.
Try These Prompts
These prompts work better with the Web Scraping skill installed. Your AI knows the context and writes code that fits.
- "Build a web scraper with progressive fallback using YepAPI's smart extraction"
- "Create a scraping pipeline that extracts structured data from any URL with error recovery"
- "Set up automated web scraping with scheduling, proxy rotation, and data normalization"
Web Scraping skill — FAQ
**What does this skill do?**
It connects to YepAPI for smart web scraping with progressive fallback strategies. Your AI builds scraping pipelines that handle dynamic content, rate limits, and structured extraction.

**How do I install it?**
Run `npx skills add YepAPI/skills --skill web-scraping` in your project root. This copies the skill file into your repo where your AI coding tool can read it automatically.

**How does the progressive fallback work?**
YepAPI tries a fast HTTP fetch first, then headless browser rendering if needed. The skill uses this to build scrapers that handle both static and dynamic content.