Commit 2a1e9e2

docs(agent): update skill with the latest changes
1 parent 77b29a3 commit 2a1e9e2

2 files changed

Lines changed: 34 additions & 13 deletions

agent-skill/Scrapling-Skill.zip

604 Bytes
Binary file not shown.

agent-skill/Scrapling-Skill/references/mcp-server.md

Lines changed: 34 additions & 13 deletions
```diff
@@ -1,8 +1,8 @@
 # Scrapling MCP Server
 
-The Scrapling MCP server exposes nine web scraping tools over the MCP protocol. It supports CSS-selector-based content narrowing (reducing tokens by extracting only relevant elements before returning results), three levels of scraping capability (plain HTTP, browser-rendered, and stealth/anti-bot bypass), and persistent browser session management.
+The Scrapling MCP server exposes ten tools over the MCP protocol. It supports CSS-selector-based content narrowing (reducing tokens by extracting only relevant elements before returning results), three levels of scraping capability (plain HTTP, browser-rendered, and stealth/anti-bot bypass), persistent browser session management, and page screenshots returned as real image content blocks.
 
-All scraping tools return a `ResponseModel` with fields: `status` (int), `content` (list of strings), `url` (str).
+All scraping tools return a `ResponseModel` with fields: `status` (int), `content` (list of strings), `url` (str). The `screenshot` tool returns a list of MCP content blocks: an `ImageContent` (the screenshot bytes) followed by a `TextContent` (the post-redirect URL).
 
 ## Tools
 
```
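A minimal client-side sketch of the two return shapes described in this hunk. The classes are local stand-ins, not imports from Scrapling or the MCP Python SDK: `ResponseModel` copies only the three documented fields, and the image and text blocks reuse the field names from the MCP specification (`data` holds base64-encoded bytes, `mimeType` the format, `text` the string). `unpack_screenshot` is a hypothetical helper that assumes the documented order of one image block followed by one text block.

```python
from __future__ import annotations

import base64
from dataclasses import dataclass, field


@dataclass
class ResponseModel:
    """Stand-in with the three documented fields (not Scrapling's own class)."""

    status: int
    content: list[str] = field(default_factory=list)
    url: str = ""


@dataclass
class ImageContent:
    """Stand-in for an MCP image block: base64 `data` plus a `mimeType`."""

    data: str
    mimeType: str
    type: str = "image"


@dataclass
class TextContent:
    """Stand-in for an MCP text block."""

    text: str
    type: str = "text"


def unpack_screenshot(blocks: list) -> tuple[bytes, str, str]:
    """Hypothetical helper: split a screenshot result into (bytes, MIME type, final URL).

    Assumes the documented order: exactly one image block, then one text block.
    """
    if len(blocks) != 2:
        raise ValueError(f"expected 2 content blocks, got {len(blocks)}")
    image, text = blocks
    if not isinstance(image, ImageContent) or not isinstance(text, TextContent):
        raise TypeError("expected [ImageContent, TextContent]")
    return base64.b64decode(image.data), image.mimeType, text.text


# Usage with made-up sample values.
page = ResponseModel(status=200, content=["<h1>Example</h1>"], url="https://example.com/")
fake_png = b"\x89PNG\r\n\x1a\n"  # only the PNG signature, not a real image
result = [
    ImageContent(data=base64.b64encode(fake_png).decode("ascii"), mimeType="image/png"),
    TextContent(text="https://example.com/landing"),
]
img_bytes, mime, final_url = unpack_screenshot(result)
print(page.status, mime, final_url)  # 200 image/png https://example.com/landing
```

Checking the block count and order up front keeps a client from silently treating the URL text as image data if the server's output shape ever changes.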
```diff
@@ -99,17 +99,18 @@ Opens a browser session that stays alive across multiple fetch calls, avoiding t
 
 **Key parameters:**
 
-| Parameter          | Type                        | Default      | Description                                                         |
-|--------------------|-----------------------------|--------------|---------------------------------------------------------------------|
-| `session_type`     | `"dynamic"` / `"stealthy"`  | required     | Type of browser session to create                                   |
-| `headless`         | bool                        | true         | Run browser hidden or visible                                       |
-| `max_pages`        | int                         | 5            | Max concurrent browser tabs (1-50)                                  |
-| `proxy`            | str or dict or null         | null         | Proxy for all requests in this session                              |
-| `timeout`          | number                      | 30000        | Default timeout in ms                                               |
-| `solve_cloudflare` | bool                        | false        | (Stealthy only) Auto-solve Cloudflare challenges                    |
-| `hide_canvas`      | bool                        | false        | (Stealthy only) Canvas fingerprint noise                            |
-| `block_webrtc`     | bool                        | false        | (Stealthy only) Block WebRTC IP leak                                |
-| `allow_webgl`      | bool                        | true         | (Stealthy only) Keep WebGL enabled                                  |
+| Parameter          | Type                        | Default      | Description                                                                                           |
+|--------------------|-----------------------------|--------------|-------------------------------------------------------------------------------------------------------|
+| `session_type`     | `"dynamic"` / `"stealthy"`  | required     | Type of browser session to create                                                                     |
+| `session_id`       | str or null                 | null         | Custom ID for the session. If omitted, a random 12-char hex ID is generated. Raises if already in use |
+| `headless`         | bool                        | true         | Run browser hidden or visible                                                                         |
+| `max_pages`        | int                         | 5            | Max concurrent browser tabs (1-50)                                                                    |
+| `proxy`            | str or dict or null         | null         | Proxy for all requests in this session                                                                |
+| `timeout`          | number                      | 30000        | Default timeout in ms                                                                                 |
+| `solve_cloudflare` | bool                        | false        | (Stealthy only) Auto-solve Cloudflare challenges                                                      |
+| `hide_canvas`      | bool                        | false        | (Stealthy only) Canvas fingerprint noise                                                              |
+| `block_webrtc`     | bool                        | false        | (Stealthy only) Block WebRTC IP leak                                                                  |
+| `allow_webgl`      | bool                        | true         | (Stealthy only) Keep WebGL enabled                                                                    |
 
 Plus all other browser session parameters (`google_search`, `real_chrome`, `cdp_url`, `locale`, `timezone_id`, `useragent`, `extra_headers`, `cookies`, `disable_resources`, `network_idle`, `wait_selector`, `wait_selector_state`).
 
```
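The new `session_id` row describes three behaviors: an optional caller-supplied ID, a random 12-character hex fallback, and an error when the ID is already in use. The sketch below models only those rules (plus the documented `session_type` values and the 1-50 `max_pages` range) in a hypothetical in-memory `SessionRegistry`. It is not Scrapling's implementation, and `ValueError` is an assumed exception type, since the table only says the call raises. `secrets.token_hex(6)` yields 12 hex characters because each random byte becomes two hex digits.

```python
from __future__ import annotations

import secrets
from dataclasses import dataclass


@dataclass
class SessionRecord:
    """Hypothetical record of an open session."""

    session_id: str
    session_type: str
    max_pages: int


class SessionRegistry:
    """Hypothetical in-memory model of the documented `open_session` ID rules."""

    SESSION_TYPES = ("dynamic", "stealthy")

    def __init__(self) -> None:
        self._sessions: dict[str, SessionRecord] = {}

    def open_session(
        self, session_type: str, session_id: str | None = None, max_pages: int = 5
    ) -> SessionRecord:
        if session_type not in self.SESSION_TYPES:
            raise ValueError(f"session_type must be one of {self.SESSION_TYPES}")
        if not 1 <= max_pages <= 50:
            raise ValueError("max_pages must be between 1 and 50")
        if session_id is None:
            # 6 random bytes -> 12 hex characters; retry on the (very unlikely) collision.
            session_id = secrets.token_hex(6)
            while session_id in self._sessions:
                session_id = secrets.token_hex(6)
        elif session_id in self._sessions:
            # A caller-supplied ID that is already open is rejected, not reused.
            raise ValueError(f"session_id {session_id!r} is already in use")
        record = SessionRecord(session_id, session_type, max_pages)
        self._sessions[session_id] = record
        return record


registry = SessionRegistry()
auto = registry.open_session("dynamic")
named = registry.open_session("stealthy", session_id="shop-crawl")
try:
    registry.open_session("dynamic", session_id="shop-crawl")
except ValueError as err:
    print(err)  # session_id 'shop-crawl' is already in use
print(len(auto.session_id), named.session_id)  # 12 shop-crawl
```

A readable ID makes later calls that take `session_id` easier to follow in logs; the trade-off is that the caller becomes responsible for keeping IDs unique.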
```diff
@@ -131,6 +132,25 @@ Returns a list of `SessionInfo` objects, each with `session_id`, `session_type`,
 
 No parameters.
 
+### `screenshot` -- Capture a page screenshot
+
+Navigates to a URL inside an existing browser session and returns the screenshot as an MCP `ImageContent` block (the bytes the model can see directly, not a base64 string in JSON) followed by a `TextContent` block carrying the post-redirect URL.
+
+Requires an open browser session. Call `open_session` first, then pass the `session_id` here. Both `dynamic` and `stealthy` sessions are accepted.
+
+| Parameter             | Type                  | Default      | Description                                                                          |
+|-----------------------|-----------------------|--------------|--------------------------------------------------------------------------------------|
+| `url`                 | str                   | required     | URL to navigate to and capture                                                       |
+| `session_id`          | str                   | required     | ID of an open browser session created with `open_session`                           |
+| `image_type`          | `"png"` / `"jpeg"`    | `"png"`      | Image format. Use `"jpeg"` for smaller payloads                                      |
+| `full_page`           | bool                  | false        | Capture the full scrollable page instead of just the viewport                        |
+| `quality`             | int or null           | null         | JPEG quality 0-100. Raises if passed with `image_type="png"`                        |
+| `wait`                | number                | 0            | Extra wait (ms) after page load before capture                                      |
+| `wait_selector`       | str or null           | null         | CSS selector to wait for before capture                                              |
+| `wait_selector_state` | str                   | `"attached"` | State for `wait_selector`: `"attached"` / `"visible"` / `"hidden"` / `"detached"`    |
+| `network_idle`        | bool                  | false        | Wait until no network activity for 500ms                                             |
+| `timeout`             | number                | 30000        | Timeout in milliseconds                                                              |
+
 ## Tool selection guide
 
 | Scenario | Tool |
```
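The new `screenshot` parameter table implies a few constraints a client can check before making the call: `image_type` is `"png"` or `"jpeg"`, `quality` applies only to JPEG and must lie in 0-100, and `wait_selector_state` takes one of four values. The hypothetical `build_screenshot_args` below assembles a tool-call argument dict under those rules. The function name, the `ValueError` type, and the choice to omit null optionals are all assumptions; the server remains the authority on validation, and the session itself must already be open.

```python
from __future__ import annotations

from typing import Any

WAIT_STATES = ("attached", "visible", "hidden", "detached")


def build_screenshot_args(
    url: str,
    session_id: str,  # must come from a prior open_session call
    *,
    image_type: str = "png",
    full_page: bool = False,
    quality: int | None = None,
    wait: float = 0,
    wait_selector: str | None = None,
    wait_selector_state: str = "attached",
    network_idle: bool = False,
    timeout: float = 30000,
) -> dict[str, Any]:
    """Hypothetical client-side pre-check mirroring the documented parameter rules."""
    if image_type not in ("png", "jpeg"):
        raise ValueError('image_type must be "png" or "jpeg"')
    if quality is not None:
        if image_type == "png":
            raise ValueError('quality is only valid with image_type="jpeg"')
        if not 0 <= quality <= 100:
            raise ValueError("quality must be between 0 and 100")
    if wait_selector_state not in WAIT_STATES:
        raise ValueError(f"wait_selector_state must be one of {WAIT_STATES}")
    args: dict[str, Any] = {
        "url": url,
        "session_id": session_id,
        "image_type": image_type,
        "full_page": full_page,
        "wait": wait,
        "wait_selector_state": wait_selector_state,
        "network_idle": network_idle,
        "timeout": timeout,
    }
    # Leave null optionals out so only explicit choices are sent.
    if quality is not None:
        args["quality"] = quality
    if wait_selector is not None:
        args["wait_selector"] = wait_selector
    return args


args = build_screenshot_args(
    "https://example.com", "a1b2c3d4e5f6", image_type="jpeg", quality=70, full_page=True
)
print(args["image_type"], args["quality"], "wait_selector" in args)  # jpeg 70 False
```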
```diff
@@ -142,6 +162,7 @@ No parameters.
 | Cloudflare or strong anti-bot protection | `stealthy_fetch` (with `solve_cloudflare=true` for Turnstile) |
 | Multiple protected pages                 | `bulk_stealthy_fetch`                                         |
 | Multiple pages from the same site        | `open_session` + `fetch`/`stealthy_fetch` with `session_id`   |
+| Need a screenshot of a page              | `open_session` + `screenshot` with `session_id`               |
 
 Start with `get` (fastest, lowest resource cost). Escalate to `fetch` if content requires JS rendering. Escalate to `stealthy_fetch` only if blocked. For multiple pages from the same site, use a persistent session to avoid browser launch overhead.
 
```
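The escalation order in the tool selection guide (`get`, then `fetch`, then `stealthy_fetch`) can be sketched as a loop over tiers. This is a hedged illustration rather than Scrapling code: `call_tool` stands in for whatever MCP client call you use, the `{"url": ...}` argument shape is assumed, and `looks_blocked` is an invented heuristic that folds the guide's two triggers (JS-only content and blocking) into a single check on status codes and empty content.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

# Cheapest tier first, matching the guide's escalation order.
ESCALATION = ("get", "fetch", "stealthy_fetch")


@dataclass
class ToolResult:
    """Stand-in with the documented ResponseModel fields."""

    status: int
    content: list[str] = field(default_factory=list)
    url: str = ""


def looks_blocked(result: ToolResult) -> bool:
    """Assumed heuristic: common block statuses or no non-blank content means escalate."""
    return result.status in (403, 429, 503) or not any(s.strip() for s in result.content)


def fetch_with_escalation(
    call_tool: Callable[[str, dict[str, str]], ToolResult], url: str
) -> list[tuple[str, ToolResult]]:
    """Try each tier in order; return every attempt, stopping at the first success."""
    attempts: list[tuple[str, ToolResult]] = []
    for tool in ESCALATION:
        result = call_tool(tool, {"url": url})
        attempts.append((tool, result))
        if not looks_blocked(result):
            break
    return attempts


def fake_call_tool(name: str, args: dict[str, str]) -> ToolResult:
    # Pretend the site rejects plain HTTP but renders fine in a browser.
    if name == "get":
        return ToolResult(403, [], args["url"])
    return ToolResult(200, ["<div>Rendered</div>"], args["url"])


trail = fetch_with_escalation(fake_call_tool, "https://example.com/app")
print([name for name, _ in trail])  # ['get', 'fetch']
```

Returning the whole trail rather than only the final result shows which tier a site actually needed, which helps when deciding whether to open a persistent session for follow-up pages.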