
Commit 9af644e

docs: improve the code copy-paste experience and use fewer tokens for the agent skill
1 parent b626b4d

17 files changed: 346 additions & 364 deletions


agent-skill/Scrapling-Skill/references/fetching/choosing.md

Lines changed: 20 additions & 20 deletions
@@ -27,23 +27,23 @@ The following table compares them and can be quickly used for guidance.
 ## Parser configuration in all fetchers
 All fetchers share the same import method, as you will see in the upcoming pages
 ```python
->>> from scrapling.fetchers import Fetcher, AsyncFetcher, StealthyFetcher, DynamicFetcher
+from scrapling.fetchers import Fetcher, AsyncFetcher, StealthyFetcher, DynamicFetcher
 ```
 Then you use it right away without initializing like this, and it will use the default parser settings:
 ```python
->>> page = StealthyFetcher.fetch('https://example.com')
+page = StealthyFetcher.fetch('https://example.com')
 ```
 If you want to configure the parser ([Selector class](parsing/main_classes.md#selector)) that will be used on the response before returning it for you, then do this first:
 ```python
->>> from scrapling.fetchers import Fetcher
->>> Fetcher.configure(adaptive=True, keep_comments=False, keep_cdata=False) # and the rest
+from scrapling.fetchers import Fetcher
+Fetcher.configure(adaptive=True, keep_comments=False, keep_cdata=False) # and the rest
 ```
 or
 ```python
->>> from scrapling.fetchers import Fetcher
->>> Fetcher.adaptive=True
->>> Fetcher.keep_comments=False
->>> Fetcher.keep_cdata=False # and the rest
+from scrapling.fetchers import Fetcher
+Fetcher.adaptive=True
+Fetcher.keep_comments=False
+Fetcher.keep_cdata=False # and the rest
 ```
 Then, continue your code as usual.
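Putting the two steps together, a minimal sketch of the configure-then-fetch flow this section describes (the URL and flag values are illustrative):

```python
from scrapling.fetchers import Fetcher

# Configure the shared parser settings once for this fetcher class
Fetcher.configure(adaptive=True, keep_comments=False, keep_cdata=False)

# Subsequent fetches parse responses with the settings above
page = Fetcher.get('https://example.com')
```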

@@ -59,19 +59,19 @@ If your use case requires a different configuration for each request/fetch, you
 ## Response Object
 The `Response` object is the same as the [Selector](parsing/main_classes.md#selector) class, but it has additional details about the response, like response headers, status, cookies, etc., as shown below:
 ```python
->>> from scrapling.fetchers import Fetcher
->>> page = Fetcher.get('https://example.com')
+from scrapling.fetchers import Fetcher
+page = Fetcher.get('https://example.com')

->>> page.status # HTTP status code
->>> page.reason # Status message
->>> page.cookies # Response cookies as a dictionary
->>> page.headers # Response headers
->>> page.request_headers # Request headers
->>> page.history # Response history of redirections, if any
->>> page.body # Raw response body as bytes
->>> page.encoding # Response encoding
->>> page.meta # Response metadata dictionary (e.g., proxy used). Mainly helpful with the spiders system.
->>> page.captured_xhr # List of captured XHR/fetch responses (when capture_xhr is enabled on a browser session)
+page.status # HTTP status code
+page.reason # Status message
+page.cookies # Response cookies as a dictionary
+page.headers # Response headers
+page.request_headers # Request headers
+page.history # Response history of redirections, if any
+page.body # Raw response body as bytes
+page.encoding # Response encoding
+page.meta # Response metadata dictionary (e.g., proxy used). Mainly helpful with the spiders system.
+page.captured_xhr # List of captured XHR/fetch responses (when capture_xhr is enabled on a browser session)
 ```
 All fetchers return the `Response` object.
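Because a `Response` is also a `Selector`, metadata checks and parsing chain on the same object; a minimal sketch (the status check and selector are illustrative):

```python
from scrapling.fetchers import Fetcher

page = Fetcher.get('https://example.com')
if page.status == 200:
    print(page.encoding)                  # response encoding, e.g. 'utf-8'
    print(page.headers)                   # response headers
    print(page.css('title::text').get())  # parse directly; Response is a Selector
```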

agent-skill/Scrapling-Skill/references/fetching/dynamic.md

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ As we will explain later, to automate the page, you need some knowledge of [Play
 You have one primary way to import this Fetcher, which is the same for all fetchers.

 ```python
->>> from scrapling.fetchers import DynamicFetcher
+from scrapling.fetchers import DynamicFetcher
 ```
 Check out how to configure the parsing options [here](choosing.md#parser-configuration-in-all-fetchers)
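Usage then mirrors the other fetchers; a minimal sketch, assuming the same `fetch` entry point shown for `StealthyFetcher` in choosing.md:

```python
from scrapling.fetchers import DynamicFetcher

# Load the page in a real browser engine, then parse the rendered HTML
page = DynamicFetcher.fetch('https://example.com')
print(page.status)
```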

agent-skill/Scrapling-Skill/references/fetching/static.md

Lines changed: 73 additions & 73 deletions
@@ -6,7 +6,7 @@ The `Fetcher` class provides rapid and lightweight HTTP requests using the high-
 Import the Fetcher (same import pattern for all fetchers):

 ```python
->>> from scrapling.fetchers import Fetcher
+from scrapling.fetchers import Fetcher
 ```
 Check out how to configure the parsing options [here](choosing.md#parser-configuration-in-all-fetchers)

@@ -47,41 +47,41 @@ Examples are the best way to explain this:
 > Hence: `OPTIONS` and `HEAD` methods are not supported.
 #### GET
 ```python
->>> from scrapling.fetchers import Fetcher
->>> # Basic GET
->>> page = Fetcher.get('https://example.com')
->>> page = Fetcher.get('https://scrapling.requestcatcher.com/get', stealthy_headers=True)
->>> page = Fetcher.get('https://scrapling.requestcatcher.com/get', proxy='http://username:password@localhost:8030')
->>> # With parameters
->>> page = Fetcher.get('https://example.com/search', params={'q': 'query'})
->>>
->>> # With headers
->>> page = Fetcher.get('https://example.com', headers={'User-Agent': 'Custom/1.0'})
->>> # Basic HTTP authentication
->>> page = Fetcher.get("https://example.com", auth=("my_user", "password123"))
->>> # Browser impersonation
->>> page = Fetcher.get('https://example.com', impersonate='chrome')
->>> # HTTP/3 support
->>> page = Fetcher.get('https://example.com', http3=True)
+from scrapling.fetchers import Fetcher
+# Basic GET
+page = Fetcher.get('https://example.com')
+page = Fetcher.get('https://scrapling.requestcatcher.com/get', stealthy_headers=True)
+page = Fetcher.get('https://scrapling.requestcatcher.com/get', proxy='http://username:password@localhost:8030')
+# With parameters
+page = Fetcher.get('https://example.com/search', params={'q': 'query'})
+
+# With headers
+page = Fetcher.get('https://example.com', headers={'User-Agent': 'Custom/1.0'})
+# Basic HTTP authentication
+page = Fetcher.get("https://example.com", auth=("my_user", "password123"))
+# Browser impersonation
+page = Fetcher.get('https://example.com', impersonate='chrome')
+# HTTP/3 support
+page = Fetcher.get('https://example.com', http3=True)
 ```
 And for asynchronous requests, it's a small adjustment
 ```python
->>> from scrapling.fetchers import AsyncFetcher
->>> # Basic GET
->>> page = await AsyncFetcher.get('https://example.com')
->>> page = await AsyncFetcher.get('https://scrapling.requestcatcher.com/get', stealthy_headers=True)
->>> page = await AsyncFetcher.get('https://scrapling.requestcatcher.com/get', proxy='http://username:password@localhost:8030')
->>> # With parameters
->>> page = await AsyncFetcher.get('https://example.com/search', params={'q': 'query'})
->>>
->>> # With headers
->>> page = await AsyncFetcher.get('https://example.com', headers={'User-Agent': 'Custom/1.0'})
->>> # Basic HTTP authentication
->>> page = await AsyncFetcher.get("https://example.com", auth=("my_user", "password123"))
->>> # Browser impersonation
->>> page = await AsyncFetcher.get('https://example.com', impersonate='chrome110')
->>> # HTTP/3 support
->>> page = await AsyncFetcher.get('https://example.com', http3=True)
+from scrapling.fetchers import AsyncFetcher
+# Basic GET
+page = await AsyncFetcher.get('https://example.com')
+page = await AsyncFetcher.get('https://scrapling.requestcatcher.com/get', stealthy_headers=True)
+page = await AsyncFetcher.get('https://scrapling.requestcatcher.com/get', proxy='http://username:password@localhost:8030')
+# With parameters
+page = await AsyncFetcher.get('https://example.com/search', params={'q': 'query'})
+
+# With headers
+page = await AsyncFetcher.get('https://example.com', headers={'User-Agent': 'Custom/1.0'})
+# Basic HTTP authentication
+page = await AsyncFetcher.get("https://example.com", auth=("my_user", "password123"))
+# Browser impersonation
+page = await AsyncFetcher.get('https://example.com', impersonate='chrome110')
+# HTTP/3 support
+page = await AsyncFetcher.get('https://example.com', http3=True)
 ```
 The `page` object in all cases is a [Response](choosing.md#response-object) object, which is a [Selector](parsing/main_classes.md#selector), so you can use it directly
 ```python
@@ -102,62 +102,62 @@ The `page` object in all cases is a [Response](choosing.md#response-object) obje
 ```
 #### POST
 ```python
->>> from scrapling.fetchers import Fetcher
->>> # Basic POST
->>> page = Fetcher.post('https://scrapling.requestcatcher.com/post', data={'key': 'value'}, params={'q': 'query'})
->>> page = Fetcher.post('https://scrapling.requestcatcher.com/post', data={'key': 'value'}, stealthy_headers=True)
->>> page = Fetcher.post('https://scrapling.requestcatcher.com/post', data={'key': 'value'}, proxy='http://username:password@localhost:8030', impersonate="chrome")
->>> # Another example of form-encoded data
->>> page = Fetcher.post('https://example.com/submit', data={'username': 'user', 'password': 'pass'}, http3=True)
->>> # JSON data
->>> page = Fetcher.post('https://example.com/api', json={'key': 'value'})
+from scrapling.fetchers import Fetcher
+# Basic POST
+page = Fetcher.post('https://scrapling.requestcatcher.com/post', data={'key': 'value'}, params={'q': 'query'})
+page = Fetcher.post('https://scrapling.requestcatcher.com/post', data={'key': 'value'}, stealthy_headers=True)
+page = Fetcher.post('https://scrapling.requestcatcher.com/post', data={'key': 'value'}, proxy='http://username:password@localhost:8030', impersonate="chrome")
+# Another example of form-encoded data
+page = Fetcher.post('https://example.com/submit', data={'username': 'user', 'password': 'pass'}, http3=True)
+# JSON data
+page = Fetcher.post('https://example.com/api', json={'key': 'value'})
 ```
 And for asynchronous requests, it's a small adjustment
 ```python
->>> from scrapling.fetchers import AsyncFetcher
->>> # Basic POST
->>> page = await AsyncFetcher.post('https://scrapling.requestcatcher.com/post', data={'key': 'value'})
->>> page = await AsyncFetcher.post('https://scrapling.requestcatcher.com/post', data={'key': 'value'}, stealthy_headers=True)
->>> page = await AsyncFetcher.post('https://scrapling.requestcatcher.com/post', data={'key': 'value'}, proxy='http://username:password@localhost:8030', impersonate="chrome")
->>> # Another example of form-encoded data
->>> page = await AsyncFetcher.post('https://example.com/submit', data={'username': 'user', 'password': 'pass'}, http3=True)
->>> # JSON data
->>> page = await AsyncFetcher.post('https://example.com/api', json={'key': 'value'})
+from scrapling.fetchers import AsyncFetcher
+# Basic POST
+page = await AsyncFetcher.post('https://scrapling.requestcatcher.com/post', data={'key': 'value'})
+page = await AsyncFetcher.post('https://scrapling.requestcatcher.com/post', data={'key': 'value'}, stealthy_headers=True)
+page = await AsyncFetcher.post('https://scrapling.requestcatcher.com/post', data={'key': 'value'}, proxy='http://username:password@localhost:8030', impersonate="chrome")
+# Another example of form-encoded data
+page = await AsyncFetcher.post('https://example.com/submit', data={'username': 'user', 'password': 'pass'}, http3=True)
+# JSON data
+page = await AsyncFetcher.post('https://example.com/api', json={'key': 'value'})
 ```
 #### PUT
 ```python
->>> from scrapling.fetchers import Fetcher
->>> # Basic PUT
->>> page = Fetcher.put('https://example.com/update', data={'status': 'updated'})
->>> page = Fetcher.put('https://example.com/update', data={'status': 'updated'}, stealthy_headers=True, impersonate="chrome")
->>> page = Fetcher.put('https://example.com/update', data={'status': 'updated'}, proxy='http://username:password@localhost:8030')
->>> # Another example of form-encoded data
->>> page = Fetcher.put("https://scrapling.requestcatcher.com/put", data={'key': ['value1', 'value2']})
+from scrapling.fetchers import Fetcher
+# Basic PUT
+page = Fetcher.put('https://example.com/update', data={'status': 'updated'})
+page = Fetcher.put('https://example.com/update', data={'status': 'updated'}, stealthy_headers=True, impersonate="chrome")
+page = Fetcher.put('https://example.com/update', data={'status': 'updated'}, proxy='http://username:password@localhost:8030')
+# Another example of form-encoded data
+page = Fetcher.put("https://scrapling.requestcatcher.com/put", data={'key': ['value1', 'value2']})
 ```
 And for asynchronous requests, it's a small adjustment
 ```python
->>> from scrapling.fetchers import AsyncFetcher
->>> # Basic PUT
->>> page = await AsyncFetcher.put('https://example.com/update', data={'status': 'updated'})
->>> page = await AsyncFetcher.put('https://example.com/update', data={'status': 'updated'}, stealthy_headers=True, impersonate="chrome")
->>> page = await AsyncFetcher.put('https://example.com/update', data={'status': 'updated'}, proxy='http://username:password@localhost:8030')
->>> # Another example of form-encoded data
->>> page = await AsyncFetcher.put("https://scrapling.requestcatcher.com/put", data={'key': ['value1', 'value2']})
+from scrapling.fetchers import AsyncFetcher
+# Basic PUT
+page = await AsyncFetcher.put('https://example.com/update', data={'status': 'updated'})
+page = await AsyncFetcher.put('https://example.com/update', data={'status': 'updated'}, stealthy_headers=True, impersonate="chrome")
+page = await AsyncFetcher.put('https://example.com/update', data={'status': 'updated'}, proxy='http://username:password@localhost:8030')
+# Another example of form-encoded data
+page = await AsyncFetcher.put("https://scrapling.requestcatcher.com/put", data={'key': ['value1', 'value2']})
 ```

 #### DELETE
 ```python
->>> from scrapling.fetchers import Fetcher
->>> page = Fetcher.delete('https://example.com/resource/123')
->>> page = Fetcher.delete('https://example.com/resource/123', stealthy_headers=True, impersonate="chrome")
->>> page = Fetcher.delete('https://example.com/resource/123', proxy='http://username:password@localhost:8030')
+from scrapling.fetchers import Fetcher
+page = Fetcher.delete('https://example.com/resource/123')
+page = Fetcher.delete('https://example.com/resource/123', stealthy_headers=True, impersonate="chrome")
+page = Fetcher.delete('https://example.com/resource/123', proxy='http://username:password@localhost:8030')
 ```
 And for asynchronous requests, it's a small adjustment
 ```python
->>> from scrapling.fetchers import AsyncFetcher
->>> page = await AsyncFetcher.delete('https://example.com/resource/123')
->>> page = await AsyncFetcher.delete('https://example.com/resource/123', stealthy_headers=True, impersonate="chrome")
->>> page = await AsyncFetcher.delete('https://example.com/resource/123', proxy='http://username:password@localhost:8030')
+from scrapling.fetchers import AsyncFetcher
+page = await AsyncFetcher.delete('https://example.com/resource/123')
+page = await AsyncFetcher.delete('https://example.com/resource/123', stealthy_headers=True, impersonate="chrome")
+page = await AsyncFetcher.delete('https://example.com/resource/123', proxy='http://username:password@localhost:8030')
 ```

 ## Session Management
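The async variants above pay off when fetching many pages concurrently; a minimal sketch with the standard-library `asyncio` (the URL list is illustrative):

```python
import asyncio

from scrapling.fetchers import AsyncFetcher

async def main():
    urls = ['https://example.com', 'https://quotes.toscrape.com/']
    # Run all requests concurrently instead of awaiting them one by one
    pages = await asyncio.gather(*(AsyncFetcher.get(url) for url in urls))
    for page in pages:
        print(page.status)

asyncio.run(main())
```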

agent-skill/Scrapling-Skill/references/fetching/stealthy.md

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@
 You have one primary way to import this Fetcher, which is the same for all fetchers.

 ```python
->>> from scrapling.fetchers import StealthyFetcher
+from scrapling.fetchers import StealthyFetcher
 ```
 Check out how to configure the parsing options [here](choosing.md#parser-configuration-in-all-fetchers)
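As with the other fetchers, a single call returns a parsed page; a minimal sketch reusing the `fetch` call shown in choosing.md:

```python
from scrapling.fetchers import StealthyFetcher

# Fetch through the stealth browser profile and parse in one step
page = StealthyFetcher.fetch('https://example.com')
print(page.status, page.css('title::text').get())
```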

agent-skill/Scrapling-Skill/references/parsing/adaptive.md

Lines changed: 20 additions & 21 deletions
@@ -68,22 +68,21 @@ To extract the Questions button from the old design, a selector like `#hmenus >

 Testing the same selector in both versions:
 ```python
->> from scrapling import Fetcher
->> selector = '#hmenus > div:nth-child(1) > ul > li:nth-child(1) > a'
->> old_url = "https://web.archive.org/web/20100102003420/http://stackoverflow.com/"
->> new_url = "https://stackoverflow.com/"
->> Fetcher.configure(adaptive = True, adaptive_domain='stackoverflow.com')
->>
->> page = Fetcher.get(old_url, timeout=30)
->> element1 = page.css(selector, auto_save=True)[0]
->>
->> # Same selector but used in the updated website
->> page = Fetcher.get(new_url)
->> element2 = page.css(selector, adaptive=True)[0]
->>
->> if element1.text == element2.text:
+from scrapling import Fetcher
+selector = '#hmenus > div:nth-child(1) > ul > li:nth-child(1) > a'
+old_url = "https://web.archive.org/web/20100102003420/http://stackoverflow.com/"
+new_url = "https://stackoverflow.com/"
+Fetcher.configure(adaptive = True, adaptive_domain='stackoverflow.com')
+
+page = Fetcher.get(old_url, timeout=30)
+element1 = page.css(selector, auto_save=True)[0]
+
+# Same selector but used in the updated website
+page = Fetcher.get(new_url)
+element2 = page.css(selector, adaptive=True)[0]
+
+if element1.text == element2.text:
 ... print('Scrapling found the same element in the old and new designs!')
-'Scrapling found the same element in the old and new designs!'
 ```
 The `adaptive_domain` argument is used here because Scrapling sees `archive.org` and `stackoverflow.com` as two different domains and would isolate their `adaptive` data. Passing `adaptive_domain` tells Scrapling to treat them as the same website for adaptive data storage.

@@ -127,11 +126,11 @@ First, enable the `adaptive` feature by passing `adaptive=True` to the [Selector

 Examples:
 ```python
->>> from scrapling import Selector, Fetcher
->>> page = Selector(html_doc, adaptive=True)
+from scrapling import Selector, Fetcher
+page = Selector(html_doc, adaptive=True)
 # OR
->>> Fetcher.adaptive = True
->>> page = Fetcher.get('https://example.com')
+Fetcher.adaptive = True
+page = Fetcher.get('https://example.com')
 ```
 When using the [Selector](main_classes.md#selector) class, pass the URL of the website with the `url` argument so Scrapling can separate the properties saved for each element by domain.
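A minimal sketch of that `url` argument in practice (the HTML snippet and domain are illustrative):

```python
from scrapling import Selector

html_doc = '<html><body><a href="/about">About</a></body></html>'
# The url ties any saved element properties to this domain's adaptive storage
page = Selector(html_doc, adaptive=True, url='https://example.com')
element = page.css('a')[0]
```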

@@ -159,11 +158,11 @@ Elements can be manually saved, retrieved, and relocated within the `adaptive` f

 Example of getting an element by text:
 ```python
->>> element = page.find_by_text('Tipping the Velvet', first_match=True)
+element = page.find_by_text('Tipping the Velvet', first_match=True)
 ```
 Save its unique properties using the `save` method. The identifier must be set manually (use a meaningful identifier):
 ```python
->>> page.save(element, 'my_special_element')
+page.save(element, 'my_special_element')
 ```
 Later, retrieve and relocate the element inside the page with `adaptive`:
 ```python

agent-skill/Scrapling-Skill/references/parsing/main_classes.md

Lines changed: 12 additions & 12 deletions
@@ -131,14 +131,14 @@ Getting the attributes of the element
 ```
 Access a specific attribute with any of the following
 ```python
->>> article.attrib['class']
->>> article.attrib.get('class')
->>> article['class'] # new in v0.3
+article.attrib['class']
+article.attrib.get('class')
+article['class'] # new in v0.3
 ```
 Check if the attributes contain a specific attribute with any of the methods below
 ```python
->>> 'class' in article.attrib
->>> 'class' in article # new in v0.3
+'class' in article.attrib
+'class' in article # new in v0.3
 ```
 Get the HTML content of the element
 ```python
@@ -279,13 +279,13 @@ In the [Selector](#selector) class, all methods/properties that should return a
 Starting with v0.4, all selection methods consistently return [Selector](#selector)/[Selectors](#selectors) objects, even for text nodes and attribute values. Text nodes (selected via `::text`, `/text()`, `::attr()`, `/@attr`) are wrapped in [Selector](#selector) objects. These text node selectors have `tag` set to `"#text"`, and their `text` property returns the text value. You can still access the text value directly, and all other properties return empty/default values gracefully.

 ```python
->>> page.css('a::text') # -> Selectors (of text node Selectors)
->>> page.xpath('//a/text()') # -> Selectors
->>> page.css('a::text').get() # -> TextHandler (the first text value)
->>> page.css('a::text').getall() # -> TextHandlers (all text values)
->>> page.css('a::attr(href)') # -> Selectors
->>> page.xpath('//a/@href') # -> Selectors
->>> page.css('.price_color') # -> Selectors
+page.css('a::text') # -> Selectors (of text node Selectors)
+page.xpath('//a/text()') # -> Selectors
+page.css('a::text').get() # -> TextHandler (the first text value)
+page.css('a::text').getall() # -> TextHandlers (all text values)
+page.css('a::attr(href)') # -> Selectors
+page.xpath('//a/@href') # -> Selectors
+page.css('.price_color') # -> Selectors
 ```

 ### Data extraction methods
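A minimal sketch of those return types against an inline document (the HTML is illustrative):

```python
from scrapling import Selector

page = Selector('<html><body><a href="/a">First</a><a href="/b">Second</a></body></html>')
links = page.css('a::text')                # Selectors of text node Selectors
print(links.get())                         # 'First', the first text value
print(links.getall())                      # ['First', 'Second']
print(page.css('a::attr(href)').getall())  # ['/a', '/b']
```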

agent-skill/Scrapling-Skill/references/parsing/selection.md

Lines changed: 2 additions & 2 deletions
@@ -346,8 +346,8 @@ It filters all elements in the current page/element in the following order:

 ### Examples
 ```python
->>> from scrapling.fetchers import Fetcher
->>> page = Fetcher.get('https://quotes.toscrape.com/')
+from scrapling.fetchers import Fetcher
+page = Fetcher.get('https://quotes.toscrape.com/')
 ```
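Continuing from the `page` fetched above, a minimal sketch of the filtering order this section describes, assuming Scrapling's BeautifulSoup-style `find_all` arguments (a tag name plus an attributes mapping):

```python
# Filter by tag name alone, then by tag name plus attributes
divs = page.find_all('div')
quotes = page.find_all('div', {'class': 'quote'})
print(len(divs), len(quotes))
```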
 Find all elements with the tag name `div`.
 ```python
