If you want to configure the parser (the [Selector class](parsing/main_classes.md#selector)) that will be used on the response before it is returned to you, do this first:
```python
from scrapling.fetchers import Fetcher
Fetcher.configure(adaptive=True, keep_comments=False, keep_cdata=False)  # and the rest
```

or

```python
from scrapling.fetchers import Fetcher
Fetcher.adaptive = True
Fetcher.keep_comments = False
Fetcher.keep_cdata = False  # and the rest
```
Then, continue your code as usual.
## Response Object
The `Response` object is the same as the [Selector](parsing/main_classes.md#selector) class, but it has additional details about the response, like response headers, status, cookies, etc., as shown below:
```python
from scrapling.fetchers import Fetcher
page = Fetcher.get('https://example.com')

page.status           # HTTP status code
page.reason           # Status message
page.cookies          # Response cookies as a dictionary
page.headers          # Response headers
page.request_headers  # Request headers
page.history          # Response history of redirections, if any
page.body             # Raw response body as bytes
page.encoding         # Response encoding
page.meta             # Response metadata dictionary (e.g., proxy used). Mainly helpful with the spiders system.
page.captured_xhr     # List of captured XHR/fetch responses (when capture_xhr is enabled on a browser session)
```
The `page` object in all cases is a [Response](choosing.md#response-object) object, which is a [Selector](parsing/main_classes.md#selector), so you can use it directly.
```python
print('Scrapling found the same element in the old and new designs!')
```
The `adaptive_domain` argument is used here because Scrapling sees `archive.org` and `stackoverflow.com` as two different domains and would isolate their `adaptive` data. Passing `adaptive_domain` tells Scrapling to treat them as the same website for adaptive data storage.
Examples:

```python
from scrapling import Selector, Fetcher
page = Selector(html_doc, adaptive=True)

# OR
Fetcher.adaptive = True
page = Fetcher.get('https://example.com')
```
When using the [Selector](main_classes.md#selector) class, pass the URL of the website with the `url` argument so Scrapling can separate the properties saved for each element by domain.
Elements can be manually saved, retrieved, and relocated within the `adaptive` feature.

Example of getting an element by text:

```python
element = page.find_by_text('Tipping the Velvet', first_match=True)
```
Save its unique properties using the `save` method. The identifier must be set manually (use a meaningful identifier):

```python
page.save(element, 'my_special_element')
```
Later, retrieve and relocate the element inside the page with `adaptive`.
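A hypothetical sketch of that flow — the `retrieve` and `relocate` names below are assumptions drawn from the description above, not confirmed API:

```python
# Hypothetical: look up the saved unique properties by identifier...
element_data = page.retrieve('my_special_element')

# ...then search the (possibly changed) page for the closest match.
if element_data is not None:
    element = page.relocate(element_data)
```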
From `agent-skill/Scrapling-Skill/references/parsing/main_classes.md`:
Access a specific attribute with any of the following:

```python
article.attrib['class']
article.attrib.get('class')
article['class']  # new in v0.3
```
Check whether the element has a specific attribute with any of the methods below:

```python
'class' in article.attrib
'class' in article  # new in v0.3
```
Get the HTML content of the element
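A sketch, continuing with the `article` element from above — the property name here is an assumption for illustration, not confirmed API:

```python
article.html_content  # assumed: the element's own serialized HTML
```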
Starting with v0.4, all selection methods consistently return [Selector](#selector)/[Selectors](#selectors) objects, even for text nodes and attribute values. Text nodes (selected via `::text`, `/text()`, `::attr()`, `/@attr`) are wrapped in [Selector](#selector) objects. These text node selectors have `tag` set to `"#text"`, and their `text` property returns the text value. You can still access the text value directly, and all other properties return empty/default values gracefully.

```python
page.css('a::text')           # -> Selectors (of text node Selectors)
page.xpath('//a/text()')     # -> Selectors
page.css('a::text').get()    # -> TextHandler (the first text value)
page.css('a::text').getall() # -> TextHandlers (all text values)
page.css('a::attr(href)')    # -> Selectors
page.xpath('//a/@href')      # -> Selectors
page.css('.price_color')     # -> Selectors
```