Commit 5b0086f

chore: Reshuffle tool setup (#40)

* chore: move targets around and make transformResponse required
* Remove 'parsed' from tool names + add reddit tool
* add tools
1 parent 010b303 · commit 5b0086f

111 files changed: 11,342 additions & 437 deletions


.agents/AGENTS.md

Lines changed: 33 additions & 1 deletion

@@ -1 +1,33 @@
-When adding a new tool, make sure that it has tests and that its usage is documented in the readme.
+When a new target is added to Decodo utils, transform that target into a tool usable within the MCP server.
+
+# Generation
+
+- Make sure you actually see the target configuration from which we'll build the tool. Do not guess the tool setup.
+- You can find the target configuration in smartproxy-dashboard/repos/utils/scraping.
+- Make a new folder in `src/tools`.
+- Name the tool the same as the target name.
+- Add the target to an existing toolset. If none of the existing toolsets fit the target, either raise an issue or add the tool to the `web` toolset.
+
+# Parameters
+
+- Only add the top 7 parameters for each target. These will likely be `url`, `query`, `geo`, `locale` and `jsRender`.
+- For `url` and `query`, make sure to add an example of a correct input inside the description.
+- Make sure to map `jsRender` to `headless: "html"`.
+- Only set `parse: true` if the target actually supports parsing.
+- Never add the `output` parameter.
+- If a target has a `markdown` parameter, always set it to `true`.
+- If both `parse` and `markdown` are available as parameters, prefer `markdown: true`.
+
+# Testing
+
+- Add tests that check successful and unsuccessful tool calls.
+- After generating the tool, call the tool to actually test it.
+- When testing by calling the tool, prefer not to set the `jsRender` parameter.
+
+# Documentation
+
+- Update the readme with new tool, toolset and parameter information.
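The parameter rules above can be sketched as a small mapping step. This is an illustrative sketch only, not the server's actual implementation: the helper name `buildTargetPayload` and the `TargetSupport` shape are assumptions, while the `jsRender` → `headless: "html"` and `markdown`-over-`parse` rules come straight from the AGENTS.md text above.

```typescript
type ToolParams = {
  url?: string;
  query?: string;
  geo?: string;
  locale?: string;
  jsRender?: boolean;
};

// Hypothetical capability flags describing what a target supports.
type TargetSupport = { markdown?: boolean; parse?: boolean };

function buildTargetPayload(
  target: string,
  params: ToolParams,
  supports: TargetSupport,
): Record<string, unknown> {
  const payload: Record<string, unknown> = { target, ...params };
  if (params.jsRender) {
    // Rule: map `jsRender` to `headless: "html"`.
    payload.headless = 'html';
  }
  delete payload.jsRender;
  if (supports.markdown) {
    // Rule: if a target has a `markdown` parameter, always set it to `true`,
    // and prefer it over `parse` when both are available.
    payload.markdown = true;
  } else if (supports.parse) {
    // Rule: only set `parse: true` if the target actually supports parsing.
    payload.parse = true;
  }
  return payload;
}

const payload = buildTargetPayload(
  'reddit_post',
  { url: 'https://www.reddit.com/r/horseracing/comments/1nsrn3/', jsRender: true },
  { markdown: true, parse: true },
);
console.log(payload);
```

Note that the `output` parameter never appears in `ToolParams` at all, matching the "never add the `output` parameter" rule.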

.gitignore

Lines changed: 2 additions & 1 deletion

@@ -1,4 +1,5 @@
 node_modules
 build

-.env
+.env
+.vscode

README.md

Lines changed: 53 additions & 28 deletions

@@ -4,7 +4,6 @@
 [![Install MCP Server](https://cursor.com/deeplink/mcp-install-dark.svg)](https://cursor.com/en-US/install-mcp?name=Decodo&config=eyJ1cmwiOiJodHRwczovL21jcC5kZWNvZG8uY29tL21jcCIsImhlYWRlcnMiOnsiQXV0aG9yaXphdGlvbiI6IkJhc2ljIDx3ZWJfYWR2YW5jZWRfdG9rZW4%2BIn19)
 [![smithery badge](https://smithery.ai/badge/@Decodo/decodo-mcp-server)](https://smithery.ai/server/@Decodo/decodo-mcp-server)

-
 <p align="center">
 <a href="https://dashboard.decodo.com/integrations?utm_source=github&utm_medium=social&utm_campaign=mcp_server"> <img src="https://github.com/user-attachments/assets/a1e52a9e-3da1-4081-b3c6-053aafb8f196"/></a>
@@ -23,9 +22,11 @@ services, streamlining access to our tools and capabilities.

 ## Connecting to [Decodo's MCP server](https://mcp.decodo.com/mcp)

-1. Go to [decodo.com](https://decodo.com/scraping/web) and start a Web Scraping API plan (free trials available).
+1. Go to [decodo.com](https://decodo.com/scraping/web) and start a Web Scraping API plan (free
+   trials available).

-2. Once your plan has started, obtain a Web Scraping API basic authentication token from the [dashboard](https://dashboard.decodo.com/).
+2. Once your plan has started, obtain a Web Scraping API basic authentication token from the
+   [dashboard](https://dashboard.decodo.com/).

 3. Open your preferred MCP client and add the following configuration:
@@ -100,41 +101,65 @@ comma-separated list via the `toolsets` query parameter:

 When no toolsets are specified, all tools are registered.

-| Toolset        | Tools                              |
-| -------------- | ---------------------------------- |
-| `web`          | `scrape_as_markdown`, `screenshot` |
-| `search`       | `google_search_parsed`             |
-| `ecommerce`    | `amazon_search_parsed`             |
-| `social_media` | `reddit_post`, `reddit_subreddit`  |
-| `ai`           | `chatgpt`, `perplexity`            |
+| Toolset        | Tools |
+| -------------- | ----- |
+| `web`          | `scrape_as_markdown`, `screenshot` |
+| `search`       | `google_search`, `google_ads`, `google_lens`, `google_travel_hotels`, `bing_search` |
+| `ecommerce`    | `amazon_search`, `amazon_product`, `amazon_pricing`, `amazon_sellers`, `amazon_bestsellers`, `walmart_search`, `walmart_product`, `target_search`, `target_product`, `tiktok_shop_search`, `tiktok_shop_product`, `tiktok_shop_url` |
+| `social_media` | `reddit_post`, `reddit_subreddit`, `reddit_user`, `tiktok_post`, `youtube_video`, `youtube_metadata`, `youtube_channel`, `youtube_subtitles`, `youtube_search` |
+| `ai`           | `chatgpt`, `perplexity`, `google_ai_mode` |

 ## Tools

 The server exposes the following tools:

-| Tool | Description | Example prompt |
-| ---- | ----------- | -------------- |
-| `scrape_as_markdown` | Scrapes any target URL, expects a URL to be given via prompt. Returns results in Markdown. | Scrape peacock.com from a US IP address and tell me the pricing. |
-| `screenshot` | Captures a screenshot of any webpage and returns it as a PNG image. | Take a screenshot of github.com from a US IP address. |
-| `google_search_parsed` | Scrapes Google Search for a given query, and returns parsed results. | Scrape Google Search for shoes and tell me the top position. |
-| `amazon_search_parsed` | Scrapes Amazon Search for a given query, and returns parsed results. | Scrape Amazon Search for toothbrushes. |
-| `reddit_post` | Scrapes a specific Reddit post for a given query, and returns parsed results. | Scrape the following Reddit post: https://www.reddit.com/r/horseracing/comments/1nsrn3/ |
-| `reddit_subreddit` | Scrapes a specific Reddit subreddit for a given query, and returns parsed results. | Scrape the top 5 posts on r/Python this week. |
-| `chatgpt` | Search and interact with ChatGPT for AI-powered responses and conversations. | Ask ChatGPT to explain quantum computing in simple terms. |
-| `perplexity` | Search and interact with Perplexity for AI-powered responses and conversations. | Ask Perplexity what the latest trends in web development are. |
+| Tool | Description | Example prompt |
+| ---- | ----------- | -------------- |
+| `scrape_as_markdown` | Scrapes any target URL, expects a URL to be given via prompt. Returns results in Markdown. | Scrape peacock.com from a US IP address and tell me the pricing. |
+| `screenshot` | Captures a screenshot of any webpage and returns it as a PNG image. | Take a screenshot of github.com from a US IP address. |
+| `google_search` | Scrapes Google Search for a given query, and returns parsed results. | Scrape Google Search for shoes and tell me the top position. |
+| `google_ads` | Scrapes Google Ads search results with automatic parsing. | Scrape Google Ads for laptop and show me the top ads. |
+| `google_lens` | Scrapes Google Lens image search results with automatic parsing. | Search Google Lens for this image: https://example.com/image.jpg |
+| `google_ai_mode` | Scrapes Google AI Mode (Search with AI) results with automatic parsing. | Ask Google AI Mode: What are the top three dog breeds? |
+| `google_travel_hotels` | Scrapes Google Travel Hotels search results. | Search Google Travel Hotels for hotels in Paris. |
+| `amazon_search` | Scrapes Amazon Search for a given query, and returns parsed results. | Scrape Amazon Search for wireless keyboard. |
+| `amazon_product` | Scrapes Amazon Product page with automatic parsing. | Scrape Amazon product B09H74FXNW and show me the details. |
+| `amazon_pricing` | Scrapes Amazon Product pricing information with automatic parsing. | Get pricing for Amazon product B09H74FXNW. |
+| `amazon_sellers` | Scrapes Amazon Seller information with automatic parsing. | Get information about Amazon seller A1R0Z7FJGTKESH. |
+| `amazon_bestsellers` | Scrapes Amazon Bestsellers list with automatic parsing. | Show me Amazon bestsellers in electronics. |
+| `walmart_search` | Scrapes Walmart Search for a given query, and returns parsed results. | Scrape Walmart Search for camping tent. |
+| `walmart_product` | Scrapes Walmart Product page with automatic parsing. | Scrape Walmart product 15296401808. |
+| `target_search` | Scrapes Target Search for a given query, and returns parsed results. | Scrape Target Search for kitchen appliances. |
+| `target_product` | Scrapes Target Product page with automatic parsing. | Scrape Target product 92186007. |
+| `tiktok_post` | Scrapes a TikTok post URL for structured data (e.g. engagement, caption, hashtags). | Scrape this TikTok post: https://www.tiktok.com/@nba/video/7393013274725403950 |
+| `tiktok_shop_search` | Scrapes TikTok Shop Search for a given query, and returns parsed results. | Scrape TikTok Shop Search for phone cases. |
+| `tiktok_shop_product` | Scrapes TikTok Shop Product page. | Scrape TikTok Shop product 1731541214379741272. |
+| `tiktok_shop_url` | Scrapes TikTok Shop page by URL. | Scrape this TikTok Shop URL: https://www.tiktok.com/shop/s?q=HEADPHONES |
+| `youtube_video` | Scrapes YouTube video information. | Scrape YouTube video 6Ejga4kJUts. |
+| `youtube_metadata` | Scrapes YouTube video metadata. | Get metadata for YouTube video dFu9aKJoqGg. |
+| `youtube_channel` | Scrapes YouTube channel videos with automatic parsing. | Scrape YouTube channel @decodo_official. |
+| `youtube_subtitles` | Scrapes YouTube video subtitles. | Get subtitles for YouTube video L8zSWbQN-v8. |
+| `youtube_search` | Search YouTube videos. | Search YouTube for "How to care for chinchillas". |
+| `reddit_post` | Scrapes a specific Reddit post for a given query, and returns parsed results. | Scrape the following Reddit post: https://www.reddit.com/r/horseracing/comments/1nsrn3/ |
+| `reddit_subreddit` | Scrapes a specific Reddit subreddit for a given query, and returns parsed results. | Scrape the top 5 posts on r/Python this week. |
+| `reddit_user` | Scrapes a Reddit user profile and their posts or comments. | Scrape Reddit user u/spez's profile. |
+| `bing_search` | Scrapes Bing Search results with automatic parsing. | Search Bing for laptop reviews. |
+| `chatgpt` | Search and interact with ChatGPT for AI-powered responses and conversations. | Ask ChatGPT to explain quantum computing in simple terms. |
+| `perplexity` | Search and interact with Perplexity for AI-powered responses and conversations. | Ask Perplexity what the latest trends in web development are. |

 ## Parameters

 The following parameters are inferred from user prompts:

-| Parameter | Description |
-| --------- | ----------- |
-| `jsRender` | Renders target URL in a headless browser. |
-| `geo` | Sets the country from which the request will originate. |
-| `locale` | Sets the locale of the request. |
-| `tokenLimit` | Truncates the response content up to this limit. Useful if the context window is small. |
-| `prompt` | Prompt to send to AI tools (`chatgpt`, `perplexity`). |
-| `search` | Activates ChatGPT's web search functionality (`chatgpt` only). |
+| Parameter | Description |
+| --------- | ----------- |
+| `jsRender` | Renders target URL in a headless browser. |
+| `geo` | Sets the country from which the request will originate. |
+| `locale` | Sets the locale of the request. |
+| `tokenLimit` | Truncates the response content up to this limit. Useful if the context window is small. |
+| `prompt` | Prompt to send to AI tools (`chatgpt`, `perplexity`). |
+| `search` | Activates ChatGPT's web search functionality (`chatgpt` only). |
+| `xhr` | When true, includes XHR or fetch responses in the scrape result where supported (e.g. `tiktok_post`). |

 ## Examples
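The README's toolset selection (a comma-separated list via the `toolsets` query parameter) amounts to appending a simple query string to the MCP endpoint. A minimal sketch; restricting to `web` and `search` is just an example choice:

```typescript
// Selecting a subset of toolsets via the `toolsets` query parameter.
// When the parameter is omitted entirely, all tools are registered.
const toolsets = ['web', 'search'];
const endpoint = `https://mcp.decodo.com/mcp?toolsets=${toolsets.join(',')}`;
console.log(endpoint);
```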
eslint.config.mjs

Lines changed: 1 addition & 0 deletions

@@ -8,6 +8,7 @@ export default tseslint.config(
     files: ['src/**/*.ts'],
     rules: {
       'curly': ['error', 'all'],
+      'lines-between-class-members': ['error', 'always', { exceptAfterSingleLine: false }],
       'prefer-arrow-callback': 'error',
       'no-restricted-syntax': [
         'error',

jest.config.js

Lines changed: 6 additions & 0 deletions

@@ -8,4 +8,10 @@ module.exports = {
   transform: {
     ...tsJestTransformCfg,
   },
+  moduleNameMapper: {
+    "^types$": "<rootDir>/src/types",
+    "^utils$": "<rootDir>/src/utils",
+    "^server/(.*)$": "<rootDir>/src/server/$1",
+    "^clients/(.*)$": "<rootDir>/src/clients/$1",
+  },
 };
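Each `moduleNameMapper` key is a regex tried against the import specifier, and `$1` in the value substitutes the first capture group. The sketch below re-implements that lookup purely for illustration; Jest's real resolver does considerably more:

```typescript
// Illustrative re-implementation of Jest's moduleNameMapper matching.
const moduleNameMapper: Record<string, string> = {
  '^types$': '<rootDir>/src/types',
  '^utils$': '<rootDir>/src/utils',
  '^server/(.*)$': '<rootDir>/src/server/$1',
  '^clients/(.*)$': '<rootDir>/src/clients/$1',
};

function mapModule(specifier: string): string {
  for (const [pattern, target] of Object.entries(moduleNameMapper)) {
    const re = new RegExp(pattern);
    if (re.test(specifier)) {
      // `replace` substitutes $1 with the captured path segment.
      return specifier.replace(re, target);
    }
  }
  // Specifiers that match no pattern resolve normally.
  return specifier;
}

console.log(mapModule('clients/scraper-api-client'));
console.log(mapModule('axios'));
```

This is what lets the renamed test below import from `'../scraper-api-client'` while other files keep using bare `clients/...` specifiers.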

src/__tests__/scraper-api-client.test.ts renamed to src/clients/__tests__/scraper-api-client.test.ts

Lines changed: 1 addition & 1 deletion

@@ -1,5 +1,5 @@
 import axios, { AxiosError, AxiosHeaders } from 'axios';
-import { ScraperApiClient } from '../clients/scraper-api-client';
+import { ScraperApiClient } from '../scraper-api-client';

 const { AxiosError: RealAxiosError } = jest.requireActual<typeof import('axios')>('axios');
src/constants.ts

Lines changed: 29 additions & 0 deletions

@@ -9,10 +9,39 @@ export enum TOOLSET {
 // todo: utils
 export enum SCRAPER_API_TARGETS {
   GOOGLE_SEARCH = 'google_search',
+  GOOGLE_TRAVEL_HOTELS = 'google_travel_hotels',
+  GOOGLE_ADS = 'google_ads',
+  GOOGLE_LENS = 'google_lens',
+  GOOGLE_AI_MODE = 'google_ai_mode',
+
   AMAZON_SEARCH = 'amazon_search',
+  AMAZON_PRODUCT = 'amazon_product',
+  AMAZON_PRICING = 'amazon_pricing',
+  AMAZON_SELLERS = 'amazon_sellers',
+  AMAZON_BESTSELLERS = 'amazon_bestsellers',
+
+  WALMART_SEARCH = 'walmart_search',
+  WALMART_PRODUCT = 'walmart_product',
+
+  TARGET_SEARCH = 'target_search',
+  TARGET_PRODUCT = 'target_product',
+
+  TIKTOK_POST = 'tiktok_post',
+  TIKTOK_SHOP_SEARCH = 'tiktok_shop_search',
+  TIKTOK_SHOP_PRODUCT = 'tiktok_shop_product',
+  TIKTOK_SHOP_URL = 'tiktok',
+
+  YOUTUBE_VIDEO = 'youtube_video',
+  YOUTUBE_METADATA = 'youtube_metadata',
+  YOUTUBE_CHANNEL = 'youtube_channel',
+  YOUTUBE_SUBTITLES = 'youtube_subtitles',
+  YOUTUBE_SEARCH = 'youtube_search',

   REDDIT_POST = 'reddit_post',
   REDDIT_SUBREDDIT = 'reddit_subreddit',
+  REDDIT_USER = 'reddit_user',
+
+  BING_SEARCH = 'bing_search',

   CHATGPT = 'chatgpt',
   PERPLEXITY = 'perplexity',
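The enum values above double as the registered tool names, matching the AGENTS.md rule "name the tool the same as the target name". A small illustrative sketch with an enum subset copied from the diff; the `TOOLSET_MEMBERS` grouping map is hypothetical, mirroring the README's toolset table rather than any code in this commit:

```typescript
enum SCRAPER_API_TARGETS {
  REDDIT_POST = 'reddit_post',
  REDDIT_SUBREDDIT = 'reddit_subreddit',
  REDDIT_USER = 'reddit_user',
  BING_SEARCH = 'bing_search',
}

// Hypothetical grouping, following the README's toolset table.
const TOOLSET_MEMBERS: Record<string, SCRAPER_API_TARGETS[]> = {
  social_media: [
    SCRAPER_API_TARGETS.REDDIT_POST,
    SCRAPER_API_TARGETS.REDDIT_SUBREDDIT,
    SCRAPER_API_TARGETS.REDDIT_USER,
  ],
  search: [SCRAPER_API_TARGETS.BING_SEARCH],
};

// The registered tool name is simply the target value.
const toolNames = TOOLSET_MEMBERS.social_media.map((target) => String(target));
console.log(toolNames);
```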

src/index.ts

Lines changed: 3 additions & 2 deletions

@@ -38,11 +38,12 @@ const main = async () => {
   // if there are no envs, some MCP clients will fail silently
   const { sapiUsername, sapiPassword } = parseEnvsOrExit();

+  const auth = Buffer.from(`${sapiUsername}:${sapiPassword}`).toString('base64');
+
   const toolsets = resolveToolsets(process.env.TOOLSETS);

   const sapiMcpServer = new ScraperAPIStdioServer({
-    sapiUsername,
-    sapiPassword,
+    auth,
     toolsets,
   });
   await sapiMcpServer.connect(transport);
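The `auth` value constructed above is a standard HTTP Basic credential: the base64 encoding of `username:password`. A quick round-trip check, using placeholder credentials rather than real ones:

```typescript
// Placeholder credentials for illustration only.
const sapiUsername = 'user';
const sapiPassword = 'pass';

// Same construction as in src/index.ts above.
const auth = Buffer.from(`${sapiUsername}:${sapiPassword}`).toString('base64');
const header = `Basic ${auth}`;

// Decoding recovers the original "username:password" pair.
const decoded = Buffer.from(auth, 'base64').toString('utf8');
console.log(header, decoded);
```

Passing the single pre-encoded `auth` string to the server (instead of the raw username and password) matches the HTTP server path, which receives an already-encoded token from the `Authorization` header.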

src/server.ts

Lines changed: 1 addition & 3 deletions

@@ -39,7 +39,7 @@ app.post('/mcp', async (req, res) => {

   const toolsets = resolveToolsets(req.query.toolsets as string);

-  const server = new ScraperAPIHttpServer({ toolsets });
+  const server = new ScraperAPIHttpServer({ toolsets, auth: token });

   const transport = new StreamableHTTPServerTransport({
     sessionIdGenerator: undefined,
@@ -50,8 +50,6 @@
     transport.close();
   });

-  server.setAuthToken(token);
-
   await server.connect(transport);

   await transport.handleRequest(req, res, req.body);
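`resolveToolsets` itself is not shown in this diff. A hypothetical sketch of what it might do with the `toolsets` query parameter, based on the README's behavior (comma-separated names, all toolsets when none are given); the function body and `ALL_TOOLSETS` constant are assumptions, not the repository's actual code:

```typescript
// Toolset names from the README table.
const ALL_TOOLSETS = ['web', 'search', 'ecommerce', 'social_media', 'ai'];

function resolveToolsets(raw?: string): string[] {
  if (!raw) {
    // README: when no toolsets are specified, all tools are registered.
    return ALL_TOOLSETS;
  }
  return raw
    .split(',')
    .map((name) => name.trim())
    .filter((name) => ALL_TOOLSETS.includes(name));
}

console.log(resolveToolsets('web, search'));
console.log(resolveToolsets(undefined));
```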
