## Introduction

`pagodo` automates Google searching for potentially vulnerable web pages and applications on the Internet. It replaces manually performing Google dork searches in a web browser.

There are 2 parts. The first is `ghdb_scraper.py`, which retrieves the latest Google dorks, and the second is `pagodo.py`, which leverages the information gathered by `ghdb_scraper.py`.

The core Google search library now uses the more flexible [yagooglesearch](https://github.com/opsdisk/yagooglesearch) instead of [googlesearch](https://github.com/MarioVilas/googlesearch). Check out the [yagooglesearch README](https://github.com/opsdisk/yagooglesearch/blob/master/README.md) for a more in-depth explanation of the library differences and capabilities.

This version of `pagodo` also natively supports HTTP(S) and SOCKS5 proxies, so there is no more wrapping it in a tool like `proxychains4` if you need proxy support. You can specify multiple proxies to use in a round-robin fashion by providing a comma separated string of proxies using the `-p` switch.

## What are Google dorks?

Offensive Security maintains the Google Hacking Database (GHDB), found here: <https://www.exploit-db.com/google-hacking-database>. It is a collection of Google searches, called dorks, that can be used to find potentially vulnerable boxes or other juicy info that is picked up by Google's search bots.

## Terms and Conditions

The terms and conditions for `pagodo` are the same terms and conditions found in [yagooglesearch](https://github.com/opsdisk/yagooglesearch#terms-and-conditions).

This code is supplied as-is and you are fully responsible for how it is used. Scraping Google Search results may violate their [Terms of Service](https://policies.google.com/terms). Another Python Google search library had some interesting information/discussion on it:

- [Original issue](https://github.com/opsdisk/python-gsearch/issues/1)
- [A response](https://github.com/aviaryan/python-gsearch/issues/1#issuecomment-365581431)
- The author created a separate [Terms and Conditions](https://github.com/aviaryan/python-gsearch/blob/master/T_AND_C.md)...
- ...that contains a link to this [blog post](https://benbernardblog.com/web-scraping-and-crawling-are-perfectly-legal-right/)

Google's preferred method is to use their [API](https://developers.google.com/custom-search/v1/overview).

## Installation

Scripts are written for Python 3.6+. Clone the git repository and install the requirements.

```bash
git clone https://github.com/opsdisk/pagodo.git
cd pagodo
pip install -r requirements.txt
```

## ghdb_scraper.py

To start off, `pagodo.py` needs a list of all the current Google dorks. The repo contains a `dorks/` directory with the dorks from when `ghdb_scraper.py` was last run. It's advised to run `ghdb_scraper.py` to get the freshest data before running `pagodo.py`. The `dorks/` directory contains:

- the `all_google_dorks.txt` file, which contains all the Google dorks, one per line
- the `all_google_dorks.json` file, which is the JSON response from GHDB
- individual category dork files
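
The one-dork-per-line format of `all_google_dorks.txt` makes it easy to consume from your own tooling. A minimal sketch (the helper name and file path are illustrative, not part of pagodo):

```python
from pathlib import Path

def load_dorks(path: str) -> list[str]:
    """Read a one-dork-per-line file, skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]

# Example: dorks = load_dorks("dorks/all_google_dorks.txt")
```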

Dork categories:

### Using `pagodo.py` as a script

```bash
python pagodo.py -d example.com -g dorks.txt
```

### Using pagodo as a module

### Wait time between Google dork searches

- `-i` - Specify the **minimum** delay between dork searches, in seconds. Don't make this too small, or your IP will get HTTP 429'd quickly.
- `-x` - Specify the **maximum** delay between dork searches, in seconds. Don't make this too big, or the searches will take a long time.

The values provided by `-i` and `-x` are used to generate a list of 20 random wait times; one is randomly selected between each Google dork search.
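
That delay scheme can be pictured with a short sketch. The pool size of 20 comes from the description above; the helper itself is illustrative, not pagodo's actual code:

```python
import random

def build_wait_pool(minimum: float, maximum: float, count: int = 20) -> list[float]:
    """Generate a pool of random delays bounded by the -i and -x values."""
    return [random.uniform(minimum, maximum) for _ in range(count)]

pool = build_wait_pool(37, 60)
delay = random.choice(pool)  # one delay is picked before each dork search
```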

### Number of results to return

`-m` - The total maximum search results to return per Google dork. Each Google search request can pull back at most 100 results at a time, so if you pick `-m 500`, 5 separate search queries will have to be made for each Google dork search, which will increase the amount of time to complete.
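
The 100-results-per-request cap means the number of paged queries per dork is just a ceiling division (a sketch of the arithmetic, not pagodo's code):

```python
import math

def queries_per_dork(max_results: int, page_size: int = 100) -> int:
    """How many paged Google queries are needed to collect max_results URLs."""
    return math.ceil(max_results / page_size)

print(queries_per_dork(500))  # 5, matching the -m 500 example above
```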

### Save Output

`-o [optional/path/to/results.json]` - Save output to a JSON file. If you do not specify a filename, a datetimestamped one will be generated.

`-s [optional/path/to/results.txt]` - Save URLs to a text file. If you do not specify a filename, a datetimestamped one will be generated.
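
A datetimestamped default name can be generated along these lines. The exact timestamp layout pagodo uses is an assumption here:

```python
from datetime import datetime

def default_results_filename(prefix: str = "pagodo_results", extension: str = "json") -> str:
    # Hypothetical format; pagodo's actual datetimestamp layout may differ.
    stamp = datetime.now().strftime("%Y_%m_%d_%H_%M_%S")
    return f"{prefix}_{stamp}.{extension}"

print(default_results_filename())  # e.g. pagodo_results_2024_01_31_09_15_02.json
```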

### Save logs

`--log [optional/path/to/file.log]` - Save logs to the specified file. If you do not specify a filename, the default file `pagodo.py.log` at the root of the pagodo directory will be used.

## Google is blocking me!

Performing 7300+ search requests to Google as fast as possible will simply not work. Google will rightfully detect it as a bot and block your IP for a set period of time. One solution is to use a bank of HTTP(S)/SOCKS proxies and pass them to `pagodo`.

### Native proxy support

Pass a comma separated string of proxies to `pagodo` using the `-p` switch.

```bash
python pagodo.py -g dorks.txt -p http://myproxy:8080,socks5h://127.0.0.1:9050,socks5h://127.0.0.1:9051
```

You could even decrease the `-i` and `-x` values because you will be leveraging different proxy IPs. The proxies passed to `pagodo` are selected in a round-robin fashion.
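
Round-robin selection simply walks the proxy list in order and wraps around at the end. A sketch using `itertools.cycle` (illustrating the behavior, not pagodo's internals):

```python
from itertools import cycle

# Same comma separated string as the -p example above.
proxies = "http://myproxy:8080,socks5h://127.0.0.1:9050,socks5h://127.0.0.1:9051".split(",")
rotation = cycle(proxies)

# The fourth pick wraps back around to the first proxy.
first_four = [next(rotation) for _ in range(4)]
```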

### proxychains4 support

Install `proxychains4`:

```bash
apt install proxychains4 -y
```

Edit the `/etc/proxychains4.conf` configuration file to round robin the lookups through different proxy servers. In the example below, 2 different dynamic socks proxies have been set up with different local listening ports (9050 and 9051).

```
socks4 127.0.0.1 9050
socks4 127.0.0.1 9051
```

Throw `proxychains4` in front of the `pagodo.py` script and each _request_ will go through a different proxy (and thus source from a different IP).

```bash
proxychains4 python pagodo.py -g dorks/all_google_dorks.txt -o [optional/path/to/results.json]
```

Note that this may not appear natural to Google if you:

1. Simulate "browsing" to `google.com` from IP #1
2. Make the first search query from IP #2
3. Simulate clicking "Next" to make the second search query from IP #3
4. Simulate clicking "Next" to make the third search query from IP #1

For that reason, using the built-in `-p` proxy support is preferred: as stated in the `yagooglesearch` documentation, the provided proxy is used for the entire life cycle of the search to make it look more human, instead of rotating proxies partway through a search.