Commit 53d70cc

Merge pull request #61 from opsdisk/pagodo-v2
Pagodo v2
2 parents 084bfd4 + 07a1af8 commit 53d70cc

14 files changed

Lines changed: 564 additions & 1120 deletions

.gitignore

Lines changed: 3 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -1,2 +1,5 @@
11
.venv/
22
__pycache__/
3+
*.log
4+
pagodo_results*.txt
5+
pagodo_results*.json

README.md

Lines changed: 152 additions & 65 deletions
Original file line number | Diff line number | Diff line change
@@ -1,21 +1,43 @@
1-
# PaGoDo - Passive Google Dork
1+
# pagodo - Passive Google Dork
22

33
## Introduction
44

5-
pagodo automates Google searching for potentially vulnerable web pages and applications on the Internet. It replaces
5+
`pagodo` automates Google searching for potentially vulnerable web pages and applications on the Internet. It replaces
66
manually performing Google dork searches with a web GUI browser.
77

88
There are 2 parts. The first is `ghdb_scraper.py` that retrieves the latest Google dorks and the second portion is
99
`pagodo.py` that leverages the information gathered by `ghdb_scraper.py`.
1010

11-
HakByte created a video tutorial on using pagodo. It starts around 8 minutes in and you can find it here
12-
<https://www.youtube.com/watch?v=lESeJ3EViCo&t=481s>
11+
The core Google search library now uses the more flexible [yagooglesearch](https://github.com/opsdisk/yagooglesearch)
12+
instead of [googlesearch](https://github.com/MarioVilas/googlesearch). Check out the
13+
[yagooglesearch README](https://github.com/opsdisk/yagooglesearch/blob/master/README.md) for a more in-depth explanation
14+
of the library differences and capabilities.
15+
16+
This version of `pagodo` also has native HTTP(S) and SOCKS5 proxy support, so there is no need to wrap it in a tool
17+
like `proxychains4` if you need proxy support. You can specify multiple proxies to use in a round-robin fashion by
18+
providing a comma separated string of proxies using the `-p` switch.
1319

1420
## What are Google dorks?
1521

1622
Offensive Security maintains the Google Hacking Database (GHDB) found here:
1723
<https://www.exploit-db.com/google-hacking-database>. It is a collection of Google searches, called dorks, that can be
18-
used to find potentially vulnerable boxes or other juicy info that is picked up by Google's search bots.
24+
used to find potentially vulnerable boxes or other juicy info that is picked up by Google's search bots.
25+
26+
## Terms and Conditions
27+
28+
The terms and conditions for `pagodo` are the same terms and conditions found in
29+
[yagooglesearch](https://github.com/opsdisk/yagooglesearch#terms-and-conditions).
30+
31+
This code is supplied as-is and you are fully responsible for how it is used. Scraping Google Search results may
32+
violate their [Terms of Service](https://policies.google.com/terms). Another Python Google search library had some
33+
interesting information/discussion on it:
34+
35+
* [Original issue](https://github.com/aviaryan/python-gsearch/issues/1)
36+
* [A response](https://github.com/aviaryan/python-gsearch/issues/1#issuecomment-365581431)
37+
* Author created a separate [Terms and Conditions](https://github.com/aviaryan/python-gsearch/blob/master/T_AND_C.md)
38+
* ...that contained a link to this [blog](https://benbernardblog.com/web-scraping-and-crawling-are-perfectly-legal-right/)
39+
40+
Google's preferred method is to use their [API](https://developers.google.com/custom-search/v1/overview).
1941

2042
## Installation
2143

@@ -24,7 +46,7 @@ Scripts are written for Python 3.6+. Clone the git repository and install the r
2446
```bash
2547
git clone https://github.com/opsdisk/pagodo.git
2648
cd pagodo
27-
virtualenv -p python3 .venv # If using a virtual environment.
49+
virtualenv -p python3.7 .venv # If using a virtual environment.
2850
source .venv/bin/activate # If using a virtual environment.
2951
pip install -r requirements.txt
3052
```
@@ -35,12 +57,13 @@ To start off, `pagodo.py` needs a list of all the current Google dorks. The rep
3557
the current dorks when the `ghdb_scraper.py` was last run. It's advised to run `ghdb_scraper.py` to get the freshest
3658
data before running `pagodo.py`. The `dorks/` directory contains:
3759

38-
* the `all_google_dorks.txt` file which contains all the Google dorks
39-
* Individual dork category dorks
60+
* the `all_google_dorks.txt` file which contains all the Google dorks, one per line
61+
* the `all_google_dorks.json` file which is the JSON response from GHDB
62+
* Individual category dorks
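
Because `all_google_dorks.txt` holds one dork per line, loading it takes only a few lines of Python. The following is a minimal sketch, assuming UTF-8 encoding and that blank lines should be skipped; `load_dorks` is a hypothetical helper, not part of `pagodo`, and a temporary file stands in for the real dorks file.

```python
import tempfile
from pathlib import Path


def load_dorks(path):
    """Return the non-empty lines of a dork file, stripped of surrounding whitespace."""
    with open(path, encoding="utf-8") as fh:
        return [line.strip() for line in fh if line.strip()]


# Use a throwaway file here instead of the real dorks/all_google_dorks.txt.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "dorks.txt"
    sample.write_text('inurl:admin\n\nintitle:"index of"\n', encoding="utf-8")
    dorks = load_dorks(sample)

print(dorks)
```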
4063

4164
Dork categories:
4265

43-
```none
66+
```python
4467
categories = {
4568
1: "Footholds",
4669
2: "File Containing Usernames",
@@ -59,27 +82,18 @@ categories = {
5982
}
6083
```
6184

62-
Fortunately, the entire database can be pulled back with 1 HTTP GET request using `ghdb_scraper.py`. You can dump all
63-
dorks to a file, the individual dork categories to separate dork files, or the entire json blob if you want more
64-
contextual data about each dork.
65-
6685
### Using ghdb_scraper.py as a script
6786

68-
To retrieve all dorks:
69-
70-
```bash
71-
python ghdb_scraper.py -j -s
72-
```
73-
74-
To retrieve all dorks and write them to individual categories:
87+
Write all dorks to `all_google_dorks.txt`, `all_google_dorks.json`, and individual category files. The JSON file is
88+
useful if you want more contextual data about each dork.
7589

7690
```bash
77-
python ghdb_scraper.py -i
91+
python ghdb_scraper.py -s -j -i
7892
```
7993

8094
### Using ghdb_scraper as a module
8195

82-
The `ghdb_scraper.retrieve_google_dorks()` returns a dictionary with the following data structure:
96+
The `ghdb_scraper.retrieve_google_dorks()` function returns a dictionary with the following data structure:
8397

8498
```python
8599
ghdb_dict = {
@@ -105,75 +119,132 @@ dorks["category_dict"].keys()
105119
dorks["category_dict"][1]["category_name"]
106120
```
107121

108-
## pagodo.py
122+
## <span>pagodo.py</span>
109123

110-
Now that a file with the most recent Google dorks exists, it can be fed into `pagodo.py` using the `-g` switch to start
111-
collecting potentially vulnerable public applications. `pagodo.py` leverages the `google` python library to search
112-
Google for sites with the Google dork, such as:
124+
### Using <span>pagodo.py</span> as a script
113125

114-
```none
115-
intitle:"ListMail Login" admin -demo
126+
```bash
127+
python pagodo.py -d example.com -g dorks.txt
116128
```
117129

118-
The `-d` switch can be used to specify a domain and functions as the Google search operator:
130+
### Using pagodo as a module
119131

120-
```none
121-
site:example.com
122-
```
132+
The `pagodo.Pagodo.go()` function returns a dictionary with the data structure below (dorks used are made up examples):
123133

124-
Performing ~4600 search requests to Google as fast as possible will simply not work. Google will rightfully detect it
125-
as a bot and block your IP for a set period of time. In order to make the search queries appear more human, a couple of
126-
enhancements have been made. A pull request was made and accepted by the maintainer of the Python `google` module to
127-
allow for User-Agent randomization in the Google search queries. This feature is available in
128-
[1.9.3](https://pypi.python.org/pypi/google) and allows you to randomize the different user agents used for each search.
129-
This emulates the different browsers used in a large corporate environment.
134+
```python
135+
{
136+
"dorks": {
137+
"inurl:admin": {
138+
"urls_size": 3,
139+
"urls": [
140+
"https://github.com/marmelab/ng-admin",
141+
"https://github.com/settings/admin",
142+
"https://github.com/akveo/ngx-admin",
143+
],
144+
},
145+
"inurl:gist": {
146+
"urls_size": 3,
147+
"urls": [
148+
"https://gist.github.com/",
149+
"https://gist.github.com/index",
150+
"https://github.com/defunkt/gist",
151+
],
152+
},
153+
},
154+
"initiation_timestamp": "2021-08-27T11:35:30.638705",
155+
"completion_timestamp": "2021-08-27T11:36:42.349035",
156+
}
157+
```
130158

131-
The second enhancement focuses on randomizing the time between search queries. A minimum delay is specified using the
132-
`-e` option and a jitter factor is used to add time on to the minimum delay number. A list of 50 jitter times is created
133-
and one is randomly appended to the minimum delay time for each Google dork search.
159+
Using a Python shell (like `python` or `ipython`) to explore the data:
134160

135161
```python
136-
# Create an array of jitter values to add to delay, favoring longer search times.
137-
self.jitter = numpy.random.uniform(low=self.delay, high=jitter * self.delay, size=(50,))
162+
import pagodo
163+
164+
pg = pagodo.Pagodo(
165+
google_dorks_file="dorks.txt",
166+
domain="github.com",
167+
max_search_result_urls_to_return_per_dork=3,
168+
save_pagodo_results_to_json_file=True,
169+
save_urls_to_file=True,
170+
verbosity=5,
171+
)
172+
pagodo_results_dict = pg.go()
173+
174+
pagodo_results_dict.keys()
175+
176+
pagodo_results_dict["initiation_timestamp"]
177+
178+
pagodo_results_dict["completion_timestamp"]
179+
180+
for key, value in pagodo_results_dict["dorks"].items():
181+
    print(f"dork: {key}")
182+
    for url in value["urls"]:
183+
        print(url)
138184
```
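
Once the dictionary is in hand, it can be summarized without touching Google again. The sketch below is only an illustration: it hard-codes the made-up sample structure shown earlier rather than calling `pg.go()`, and `unique_urls` and `elapsed_seconds` are hypothetical names.

```python
from datetime import datetime

# Sample data copied from the example structure above (made-up dorks).
pagodo_results_dict = {
    "dorks": {
        "inurl:admin": {
            "urls_size": 3,
            "urls": [
                "https://github.com/marmelab/ng-admin",
                "https://github.com/settings/admin",
                "https://github.com/akveo/ngx-admin",
            ],
        },
        "inurl:gist": {
            "urls_size": 3,
            "urls": [
                "https://gist.github.com/",
                "https://gist.github.com/index",
                "https://github.com/defunkt/gist",
            ],
        },
    },
    "initiation_timestamp": "2021-08-27T11:35:30.638705",
    "completion_timestamp": "2021-08-27T11:36:42.349035",
}

# Collect every URL across all dorks, de-duplicated and sorted.
unique_urls = sorted(
    {url for result in pagodo_results_dict["dorks"].values() for url in result["urls"]}
)

# The timestamps are ISO 8601 strings, so the run time is a simple subtraction.
started = datetime.fromisoformat(pagodo_results_dict["initiation_timestamp"])
finished = datetime.fromisoformat(pagodo_results_dict["completion_timestamp"])
elapsed_seconds = (finished - started).total_seconds()

print(f"{len(unique_urls)} unique URLs found in {elapsed_seconds:.1f} seconds")
```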
139185

140-
Latter in the script, a random time is selected from the jitter array and added to the delay.
186+
## Tuning Results
141187

142-
```python
143-
pause_time = self.delay + random.choice(self.jitter)
188+
### Scope to a specific domain
189+
190+
The `-d` switch can be used to scope the results to a specific domain and functions as the Google search operator:
191+
192+
```none
193+
site:github.com
144194
```
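
In other words, scoping amounts to prefixing each dork with a `site:` operator. A minimal sketch of that idea follows; `scope_dork` is a hypothetical helper, and the exact way `pagodo` builds its query string is an assumption here.

```python
def scope_dork(dork, domain=None):
    """Prefix a dork with the site: operator when a domain is given."""
    if domain:
        return f"site:{domain} {dork}"
    return dork


query = scope_dork('intitle:"ListMail Login" admin -demo', "github.com")
print(query)
```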
145195

146-
Experiment with the values, but the defaults successfully worked without Google blocking my IP. Note that it could take
147-
a few days (3 on average) to run so be sure you have the time.
196+
### Wait time between Google dork searches
197+
198+
* `-i` - Specify the **minimum** delay between dork searches, in seconds. Don't make this too small, or your IP will
199+
get HTTP 429'd quickly.
200+
* `-x` - Specify the **maximum** delay between dork searches, in seconds. Don't make this too big or the searches will
201+
take a long time.
202+
203+
The values provided by `-i` and `-x` are used to generate a list of 20 random wait times. One of them is randomly
204+
selected before each Google dork search.
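
The delay scheme can be sketched as follows. This is an illustration under assumptions, not the actual implementation: the uniform distribution, the rounding, the example bounds of 37 and 60 seconds, and the name `build_delay_pool` are all hypothetical.

```python
import random


def build_delay_pool(minimum, maximum, size=20):
    """Build a pool of random delays (in seconds) between minimum and maximum."""
    if minimum > maximum:
        raise ValueError("minimum delay must not exceed maximum delay")
    return [round(random.uniform(minimum, maximum), 1) for _ in range(size)]


# Illustrative values standing in for the -i and -x switches.
delays = build_delay_pool(37.0, 60.0)

# Before each dork search, one delay is picked at random from the pool.
pause = random.choice(delays)
print(f"Sleeping {pause} seconds before the next dork search")
```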
205+
206+
### Number of results to return
207+
208+
`-m` - The total max search results to return per Google dork. Each Google search request can pull back at most 100
209+
results at a time, so if you pick `-m 500`, 5 separate search queries will have to be made for each Google dork search,
210+
which will increase the amount of time to complete.
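
The arithmetic is a ceiling division. A quick sketch, assuming the 100-results-per-request limit stated above; `requests_needed` is a hypothetical helper used only for illustration.

```python
import math

RESULTS_PER_REQUEST = 100  # Limit per Google search request, as stated above.


def requests_needed(max_results):
    """Number of search requests required to collect max_results for one dork."""
    return math.ceil(max_results / RESULTS_PER_REQUEST)


for m in (100, 250, 500):
    print(f"-m {m} -> {requests_needed(m)} request(s) per dork")
```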
211+
212+
## Google is blocking me!
213+
214+
Performing 6500+ search requests to Google as fast as possible will simply not work. Google will rightfully detect it
215+
as a bot and block your IP for a set period of time. One solution is to use a bank of HTTP(S)/SOCKS proxies and pass
216+
them to `pagodo`.
217+
218+
### Native proxy support
148219

149-
To run it:
220+
Pass a comma separated string of proxies to `pagodo` using the `-p` switch.
150221

151222
```bash
152-
python3 pagodo.py -d example.com -g dorks.txt -l 50 -s -e 35.0 -j 1.1
223+
python pagodo.py -g dorks.txt -p https://myproxy:8080,socks5h://127.0.0.1:9050,socks5h://127.0.0.1:9051
153224
```
154225

155-
## Google is blocking me!
226+
You could even decrease the `-i` and `-x` values because you will be leveraging different proxy IPs. The proxies passed
227+
to `pagodo` are selected by round robin.
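
Round-robin selection can be sketched with `itertools.cycle`. This is only an illustration under assumptions, not the code `pagodo` itself uses: the dork list is made up and `assignments` is a hypothetical name. Each dork search is paired with a single proxy, and the rotation wraps around once the list is exhausted.

```python
from itertools import cycle

# The same comma separated string that would be passed with -p.
proxy_string = "https://myproxy:8080,socks5h://127.0.0.1:9050,socks5h://127.0.0.1:9051"
proxies = [p.strip() for p in proxy_string.split(",") if p.strip()]

# cycle() yields the proxies in order and starts over after the last one.
proxy_rotation = cycle(proxies)

dorks = ["inurl:admin", "inurl:gist", "inurl:login", "inurl:backup"]  # made-up examples
assignments = [(dork, next(proxy_rotation)) for dork in dorks]

for dork, proxy in assignments:
    print(f"{dork} -> {proxy}")
```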
228+
229+
### proxychains4 support
156230

157-
If you start getting HTTP 429 errors, Google has rightfully detected you as a bot and will block your IP for a set
158-
period of time. The solution is to use proxychains and a bank of proxies to round robin the lookups.
231+
Another solution is to use `proxychains4` to round robin the lookups.
159232

160-
Install proxychains4
233+
Install `proxychains4`
161234

162235
```bash
163236
apt install proxychains4 -y
164237
```
165238

166239
Edit the `/etc/proxychains4.conf` configuration file to round robin the look ups through different proxy servers. In
167-
the example below, 2 different dynamic socks proxies have been set up with different local listening ports
168-
(9050 and 9051). Don't know how to utilize SSH and dynamic socks proxies? Do yourself a favor and pick up a copy of
169-
[Cyber Plumber's Handbook and interactive lab](https://gumroad.com/l/cph_book_and_lab) to learn all about Secure Shell
170-
(SSH) tunneling, port redirection, and bending traffic like a boss.
240+
the example below, 2 different dynamic socks proxies have been set up with different local listening ports (9050 and
241+
9051).
171242

172243
```bash
173244
vim /etc/proxychains4.conf
174245
```
175246

176-
```bash
247+
```ini
177248
round_robin
178249
chain_len = 1
179250
proxy_dns
@@ -185,14 +256,30 @@ socks4 127.0.0.1 9050
185256
socks4 127.0.0.1 9051
186257
```
187258

188-
Throw `proxychains4` in front of the Python script and each lookup will go through a different proxy (and thus source
189-
from a different IP). You could even tune down the `-e` delay time because you will be leveraging different proxy boxes.
259+
Throw `proxychains4` in front of the `pagodo.py` script and each *request* lookup will go through a different proxy (and
260+
thus source from a different IP).
190261

191262
```bash
192-
proxychains4 python3 pagodo.py -g ALL_dorks.txt -s -e 17.0 -l 700 -j 1.1
263+
proxychains4 python pagodo.py -g dorks/all_google_dorks.txt -o -s
193264
```
194265

195-
## Conclusion
266+
Note that this may not appear natural to Google if you:
267+
268+
1) Simulate "browsing" to `google.com` from IP #1
269+
2) Make the first search query from IP #2
270+
3) Simulate clicking "Next" to make the second search query from IP #3
271+
4) Simulate clicking "Next" to make the third search query from IP #1
272+
273+
For that reason, using the built-in `-p` proxy support is preferred because, as stated in the `yagooglesearch`
274+
documentation, the "provided proxy is used for the entire life cycle of the search to make it look more human, instead
275+
of rotating through various proxies for different portions of the search."
276+
277+
## License
278+
279+
Distributed under the GNU General Public License v3.0. See [LICENSE](./LICENSE) for more information.
280+
281+
## Contact
282+
283+
[@opsdisk](https://twitter.com/opsdisk)
196284

197-
Comments, suggestions, and improvements are always welcome. Be sure to follow [@opsdisk](https://twitter.com/opsdisk)
198-
on Twitter for the latest updates.
285+
Project Link: [https://github.com/opsdisk/pagodo](https://github.com/opsdisk/pagodo)

dorks/advisories_and_vulnerabilities.dorks

Lines changed: 2 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -1402,7 +1402,7 @@ inurl:"com_jcalpro"
14021402
Powered by Webiz
14031403
inurl:category.php?cate_id=
14041404
CaLogic Calendars V1.2.2
1405-
"Powered by Rock Band CMS 0.10"
1405+
"Powered by Rock Band CMS 0.10"
14061406
Copyright Acme 2008
14071407
"Creative Guestbook"
14081408
"DeeEmm CMS"
@@ -1881,7 +1881,7 @@ FhImage, powered by Flash-here.com
18811881
"is a product of Lussumo"
18821882
inurl:"index.php?name=PNphpBB2"
18831883
"Powered by Online Grades"
1884-
"Powered by ClanTiger"
1884+
"Powered by ClanTiger"
18851885
inurl:/modules/lykos_reviews/
18861886
"Powered By X7 Chat"
18871887
"powered by guestbook script"

dorks/all_google_dorks.json

Lines changed: 1 addition & 1 deletion
Large diffs are not rendered by default.
