11 <pre >
22
33
4- ████████╗ ██████╗ ██████╗ ██████╗ ██████╗ ████████╗
5- ╚══██╔══╝██╔═══██╗██╔══██╗ ██╔══██╗██╔═████╗╚══██╔══╝
6- ██║ ██║ ██║██████╔╝ ██████╔╝██║██╔██║ ██║
7- ██║ ██║ ██║██╔══██╗ ██╔══██╗████╔╝██║ ██║
8- ██║ ╚██████╔╝██║ ██║ ██████╔╝╚██████╔╝ ██║
9- ╚═╝ ╚═════╝ ╚═╝ ╚═╝ ╚═════╝ ╚═════╝ ╚═╝
4+ ████████╗ ██████╗ ██████╗ ██████╗ ██████╗ ████████╗
5+ ╚══██╔══╝██╔═══██╗██╔══██╗ ██╔══██╗██╔═████╗╚══██╔══╝
6+ ██║ ██║ ██║██████╔╝ ██████╔╝██║██╔██║ ██║
7+ ██║ ██║ ██║██╔══██╗ ██╔══██╗████╔╝██║ ██║
8+ ██║ ╚██████╔╝██║ ██║ ██████╔╝╚██████╔╝ ██║
9+ ╚═╝ ╚═════╝ ╚═╝ ╚═╝ ╚═════╝ ╚═════╝ ╚═╝
1010
11-
12-
13- `.` `
14- ``.:.--.`
15- .-+++/-`
16- `+sso:`
17- `` /yy+.
18- -+.oho.
19- o../+y
20- -s.-/:y:`
21- .:o+-`--::oo/-`
22- `/o+:.```---///oss+-
23- .+o:.``...`-::-+++++sys-
24- :y/```....``--::-yooooosh+
25- -h-``--.```..-:-::ssssssssd+
26- h:``:.``....`--:-++hsssyyyym.
27- .d.`/.``--.```:--//odyyyyyyym/
28- `d.`+``:.```.--/-+/smyyhhhhhm:
29- os`./`/````/`-/:+oydhhhhhhdh`
30- `so.-/-:``./`.//osmddddddmd.
31- /s/-/:/.`/..+/ydmdddddmo`
32- `:oosso/:+/syNmddmdy/.
33- `-/++oosyso+/.`
34-
35-
36- ██████╗ ███████╗██████╗ ███████╗██████╗ ██████╗ ██╗███╗ ██╗███████╗██╗██████╗ ███████╗
37- ██╔══██╗██╔════╝██╔══██╗██╔════╝╚════██╗██╔════╝ ██║████╗ ██║██╔════╝██║██╔══██╗██╔════╝
38- ██║ ██║█████╗ ██║ ██║███████╗ █████╔╝██║ ██║██╔██╗ ██║███████╗██║██║ ██║█████╗
39- ██║ ██║██╔══╝ ██║ ██║╚════██║ ╚═══██╗██║ ██║██║╚██╗██║╚════██║██║██║ ██║██╔══╝
40- ██████╔╝███████╗██████╔╝███████║██████╔╝╚██████╗ ██║██║ ╚████║███████║██║██████╔╝███████╗
41- ╚═════╝ ╚══════╝╚═════╝ ╚══════╝╚═════╝ ╚═════╝ ╚═╝╚═╝ ╚═══╝╚══════╝╚═╝╚═════╝ ╚══════╝
11+
12+
13+ `.` `
14+ ``.:.--.`
15+ .-+++/-`
16+ `+sso:`
17+ `` /yy+.
18+ -+.oho.
19+ o../+y
20+ -s.-/:y:`
21+ .:o+-`--::oo/-`
22+ `/o+:.```---///oss+-
23+ .+o:.``...`-::-+++++sys-
24+ :y/```....``--::-yooooosh+
25+ -h-``--.```..-:-::ssssssssd+
26+ h:``:.``....`--:-++hsssyyyym.
27+ .d.`/.``--.```:--//odyyyyyyym/
28+ `d.`+``:.```.--/-+/smyyhhhhhm:
29+ os`./`/````/`-/:+oydhhhhhhdh`
30+ `so.-/-:``./`.//osmddddddmd.
31+ /s/-/:/.`/..+/ydmdddddmo`
32+ `:oosso/:+/syNmddmdy/.
33+ `-/++oosyso+/.`
34+
35+
36+ ██████╗ ███████╗██████╗ ███████╗██████╗ ██████╗ ██╗███╗ ██╗███████╗██╗██████╗ ███████╗
37+ ██╔══██╗██╔════╝██╔══██╗██╔════╝╚════██╗██╔════╝ ██║████╗ ██║██╔════╝██║██╔══██╗██╔════╝
38+ ██║ ██║█████╗ ██║ ██║███████╗ █████╔╝██║ ██║██╔██╗ ██║███████╗██║██║ ██║█████╗
39+ ██║ ██║██╔══╝ ██║ ██║╚════██║ ╚═══██╗██║ ██║██║╚██╗██║╚════██║██║██║ ██║██╔══╝
40+ ██████╔╝███████╗██████╔╝███████║██████╔╝╚██████╗ ██║██║ ╚████║███████║██║██████╔╝███████╗
41+ ╚═════╝ ╚══════╝╚═════╝ ╚══════╝╚═════╝ ╚═════╝ ╚═╝╚═╝ ╚═══╝╚══════╝╚═╝╚═════╝ ╚══════╝
4242
4343
4444
4545</pre >
4646
4747## A Python web crawler for the Deep and Dark Web.
4848[ ![ Build Status] ( https://travis-ci.org/DedSecInside/TorBoT.svg?branch=master )] ( https://travis-ci.org/DedSecInside/TorBoT )
49- [ ![ ] ( https://img.shields.io/badge/Donate-Bitcoin-blue.svg?style=flat-square )] ( https://blockchain.info/address/14st7SzDbQZuu8fpQ74x477WoRJ7gpHFaj )
50- [ ![ forthebadge ] ( http ://forthebadge.com/images/badges/built-with-love .svg)] ( http://forthebadge.com )
51- [ ![ forthebadge ] ( http ://forthebadge.com/images/badges/made-with-python .svg)] ( http://forthebadge.com )
49+ [ ![ ] ( https://img.shields.io/badge/Donate-Bitcoin-blue.svg?style=flat )] ( https://blockchain.info/address/14st7SzDbQZuu8fpQ74x477WoRJ7gpHFaj )
50+ [ ![ ] ( https ://img.shields.io/badge/Built%20with-❤-orange .svg?style=flat )] ( )
51+ [ ![ ] ( https ://img.shields.io/badge/Made%20with-Python-red .svg?style=flat )] ( )
5252
5353
5454### Working Procedure/Basic Plan
@@ -65,62 +65,69 @@ the following steps:
65658 . After all URLs are processed, return the most relevant page.
6666
6767### Features
68- 1 . Crawls Tor links (.onion) only.
69- 2 . Returns Page title and address.
70- 3 . Cache links so that there won't be duplicate links.
68+ 1 . Crawl Tor links (.onion). (Completed)
69+ 2 . Return the page title and address with a short description of the site. (Not Started)
70+ 3 . Save links to a database. (Not Started)
71+ 4 . Get emails from the site. (Completed)
72+ 5 . Save crawl info to a JSON file. (Completed)
73+ 6 . Crawl custom domains. (Completed)
74+ 7 . Check if the link is live. (Not Started)
75+ 8 . Built-in updater. (Completed)
7176...(will be updated)
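
As a rough illustration of features 1, 4, 5 and 6, the sketch below pulls links and `mailto:` addresses out of a page's HTML, filters the links by extension, and writes the result to a JSON file. It is not TorBot's actual code: `LinkAndMailParser`, `collect`, and `save_json` are hypothetical names, and it uses only the Python standard library instead of the Beautiful Soup dependency.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkAndMailParser(HTMLParser):
    """Collects <a href> targets, split into web links and mailto addresses."""

    def __init__(self):
        super().__init__()
        self.links = set()
        self.emails = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.startswith("mailto:"):
            # Drop the scheme and any "?subject=..." suffix.
            self.emails.add(href[len("mailto:"):].split("?")[0])
        elif href.startswith(("http://", "https://")):
            self.links.add(href)


def collect(html, extensions=(".onion",)):
    """Return links whose host ends with one of `extensions`, plus emails."""
    parser = LinkAndMailParser()
    parser.feed(html)
    wanted = sorted(
        link for link in parser.links
        if (urlsplit(link).hostname or "").endswith(tuple(extensions))
    )
    return {"links": wanted, "emails": sorted(parser.emails)}


def save_json(result, path):
    """Write the crawl result to `path` as indented JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(result, fh, indent=2)
```

Passing `extensions=(".onion", ".com")` to `collect` mimics crawling custom domains in addition to Tor links.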
7277
7378## Contribute
7479Contributions to this project are always welcome.
75- To add a new feature fork this repository and give a pull request when your new feature is tested and complete.
80+ To add a new feature, fork the dev branch and open a pull request once the feature is tested and complete.
7681If it's a new module, put it inside the modules directory and import it in the main file.
7782The branch name should be your new feature name in the format <Feature_featurename_version(optional)>. For example, <i >Feature_FasterCrawl_1.0</i >.
7883Contributor names will be added to the list below. :D
7984
8085## Dependencies
81861 . Tor
82- 2 . Python 3.x (Make sure pip3 is there )
83- 3 . Python Stem Module
84- 4 . urllib
85- 5 . Beautiful Soup 4
86- 6 . Socket
87- 7 . Sock
88- 8 . Argparse
89- 9 . Stem module
90- 10 . Git
87+ 2 . Python 3.x (Make sure pip3 is installed )
88+ 3 . requests
89+ 4 . Beautiful Soup 4
90+ 5 . Socket
91+ 6 . Sock
92+ 7 . Argparse
93+ 8 . Git
94+ 9 . termcolor
95+ 10 . tldextract
9196
9297## Basic setup
9398Before you run TorBot, make sure the following steps are done:
9499
95100* Run tor service
96101` sudo service tor start `
97102
98- * Set a password for tor
99- ` tor --hash-password "my_password" `
100-
101- * Give the password inside torbot.py
102- `from stem.control import Controller
103- with Controller.from_port(port = 9051) as controller:
104- controller.authenticate("your_password_hash")
105- controller.signal(Signal.NEWNYM)`
103+ * Make sure that your torrc sets `SocksPort localhost:9050`
106104
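
A quick way to confirm the setup before crawling is to check that something is accepting connections on the SOCKS port. The sketch below is only a reachability check under the assumption that Tor listens on localhost:9050; `socks_port_open` is a hypothetical helper, not part of TorBot, and it does not verify that the listener is actually Tor.

```python
import socket


def socks_port_open(host="localhost", port=9050, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection raises OSError (refused, timeout, DNS failure).
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If `socks_port_open()` returns `False`, check that the tor service is running and that the port in torrc matches.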
107- ` python3 torBot.py `
108- `usage: torBot.py [ -h] [ -q] [ -u URL] [ -m] [ -e EXTENSION] [ -l]
105+ Run `python3 torBot.py`, or pass the `-h`/`--help` argument to see the usage:
106+ <pre >
107+ `usage: torBot.py [-h] [-v] [--update] [-q] [-u URL] [-s] [-m] [-e EXTENSION]
108+ [-l] [-i]
109109
110110optional arguments:
111- -h, --help show this help message and exit
112- -q, --quiet
113- -u URL, --url URL Specifiy a website link to crawl
111+ -h, --help Show this help message and exit
112+ -v, --version Show current version of TorBot.
113+ --update Update TorBot to the latest stable version
114+ -q, --quiet Prevent header from displaying
115+ -u URL, --url URL Specifiy a website link to crawl, currently returns links on that page
116+ -s, --save Save results to a file in json format
114117 -m, --mail Get e-mail addresses from the crawled sites
115118 -e EXTENSION, --extension EXTENSION
116119 Specifiy additional website extensions to the
117120 list(.com or .org etc)
118- -l, --live Check if websites are live or not (slow)`
121+ -l, --live Check if websites are live or not (slow)
122+ -i, --info Info displays basic info of the scanned site (very
123+ slow)` </pre >
124+
125+ * NOTE: All flags listed under `-u URL, --url URL` must be used together with the `-u` flag.
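
One way to enforce the rule in the note is sketched below with `argparse`. The flag names come from the usage text above, but the parser itself is an illustration, not TorBot's implementation. It assumes that "flags under `-u`" means `-s`, `-m`, `-e`, `-l` and `-i`, and that `-e` takes a single value; `build_parser` and `parse_args` are hypothetical names.

```python
import argparse


def build_parser():
    """Recreate the flags from the usage text (help strings omitted)."""
    parser = argparse.ArgumentParser(prog="torBot.py")
    parser.add_argument("-v", "--version", action="store_true")
    parser.add_argument("--update", action="store_true")
    parser.add_argument("-q", "--quiet", action="store_true")
    parser.add_argument("-u", "--url")
    parser.add_argument("-s", "--save", action="store_true")
    parser.add_argument("-m", "--mail", action="store_true")
    parser.add_argument("-e", "--extension")
    parser.add_argument("-l", "--live", action="store_true")
    parser.add_argument("-i", "--info", action="store_true")
    return parser


def parse_args(argv=None):
    """Parse argv and reject URL-dependent flags given without -u."""
    parser = build_parser()
    args = parser.parse_args(argv)
    needs_url = (args.save or args.mail or args.live or args.info
                 or args.extension is not None)
    if needs_url and not args.url:
        # parser.error prints the usage line and exits with status 2.
        parser.error("-s, -m, -e, -l and -i must be combined with -u URL")
    return args
```

With this sketch, `parse_args(["-u", "http://example.onion", "-m"])` succeeds, while `parse_args(["-m"])` exits with a usage error.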
119126
120127Read more about torrc here : [ Torrc] ( https://github.com/DedSecInside/TorBoT/blob/master/Tor.md )
121128
122129## TO-DO
123- A TO-DO list will be added here as soon as its complete.
130+ - [ ] Implement A\* search for the web crawler
124131
125132### Have ideas?
126133If you have new ideas worth implementing, open a new issue with the title [FEATURE_REQUEST].
@@ -133,7 +140,11 @@ GNU Public License
133140
134141- [X] [ P5N4PPZ] ( https://github.com/PSNAppz ) - Owner
135142- [X] [ agrepravin] ( https://github.com/agrepravin ) - Contributor,Reviewer
136- - [X] [ y-mehta] ( https://github.com/y-mehta ) - Contributer
143+ - [X] [ y-mehta] ( https://github.com/y-mehta ) - Contributor
144+ - [X] [ Manfredi Martorana] ( https://github.com/Agostinelli ) - Contributor
145+ - [X] [ KingAkeem] ( https://github.com/KingAkeem ) - Contributor
146+ - [X] [ Evan Sia Wai Suan] ( https://github.com/waisuan ) - New Contributor
147+
137148
138149![ ] ( https://upload.wikimedia.org/wikipedia/commons/thumb/4/42/Opensource.svg/200px-Opensource.svg.png )
139150