<!-- Describe the overall picture of your modifications to help maintainers understand the pull request. PRs are required to be associated with their related issue tickets or feature requests. -->

### Proof

<!-- How has this been tested? Please describe the tests that you ran to verify your changes. -->

## Checklist

<!-- Put an "x" in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code. -->

- [ ] Pull request is created against the [dev](https://github.com/projectdiscovery/katana/tree/dev) branch
- [ ] All checks passed (lint, unit/integration/regression tests, etc.) with my changes
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] I have added necessary documentation (if appropriate)
README.md: 55 additions, 1 deletion
@@ -40,7 +40,7 @@
## Installation
-katana requires Go 1.24+ to install successfully. If you encounter any installation issues, we recommend trying with the latest available version of Go, as the minimum required version may have changed. Run the command below or download a pre-compiled binary from the [release page](https://github.com/projectdiscovery/katana/releases).
+katana requires Go 1.25+ to install successfully. If you encounter any installation issues, we recommend trying with the latest available version of Go, as the minimum required version may have changed. Run the command below or download a pre-compiled binary from the [release page](https://github.com/projectdiscovery/katana/releases).
```console
CGO_ENABLED=1 go install github.com/projectdiscovery/katana/cmd/katana@latest
@@ -138,8 +138,11 @@ CONFIGURATION:
-flc, -field-config string path to custom field configuration file
Katana supports automatic captcha detection and solving during headless crawling. When a captcha page is encountered, katana identifies the captcha provider, solves it via an external service, and continues crawling.
The provider and key can also be set via environment variables:
+
+```console
+export CAPTCHA_SOLVER_PROVIDER=capsolver
+export CAPTCHA_SOLVER_KEY=YOUR_API_KEY
+katana -u https://example.com -headless
+```
+
## Scope Control
Crawling can be endless if not scoped, as such katana comes with multiple support to define the crawl scope.
@@ -509,6 +545,21 @@ Automatic form filling is experimental feature.
katana -u https://tesla.com -aff
```
+*`-filter-similar`*
+----
+
+Option to filter crawling of similar-looking URLs by normalizing variable path segments. It detects IDs, UUIDs, hashes, dates, and other dynamic values, and also learns repeating patterns at runtime. For example, `/users/123` and `/users/456` are treated as the same endpoint.
+
+```
+katana -u https://tesla.com -fsu
+```
+
+The promotion threshold (the number of distinct values seen at a path position before that position is treated as a parameter) can be tuned with `-fst`. Lower values are more aggressive (fewer URLs are crawled); higher values are more permissive. The default is `10`.
+
+```
+katana -u https://tesla.com -fsu -fst 5
+```
+
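The static part of the `-filter-similar` normalization can be sketched in Go as follows. This is a simplified illustration, not katana's implementation: the regular expressions, the `{id}`/`{uuid}`/`{hash}`/`{date}` placeholders, and the `normalizePath` and `seenFilter` names are all assumptions. It only covers detection of IDs, UUIDs, hashes, and dates; runtime pattern learning is not modeled here.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Hypothetical patterns for "dynamic" path segments; katana's real rules may differ.
var (
	uuidRe = regexp.MustCompile(`^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`)
	dateRe = regexp.MustCompile(`^\d{4}-\d{2}-\d{2}$`)
	numRe  = regexp.MustCompile(`^\d+$`)
	hashRe = regexp.MustCompile(`^[0-9a-fA-F]{32,64}$`)
)

// normalizeSegment replaces a segment that looks dynamic with a placeholder.
func normalizeSegment(seg string) string {
	switch {
	case uuidRe.MatchString(seg):
		return "{uuid}"
	case dateRe.MatchString(seg):
		return "{date}"
	case numRe.MatchString(seg): // checked before hashes so all-digit strings count as IDs
		return "{id}"
	case hashRe.MatchString(seg):
		return "{hash}"
	}
	return seg
}

// normalizePath normalizes every segment of a URL path.
func normalizePath(path string) string {
	segs := strings.Split(path, "/")
	for i, s := range segs {
		segs[i] = normalizeSegment(s)
	}
	return strings.Join(segs, "/")
}

// seenFilter is a hypothetical dedup set keyed by normalized path.
type seenFilter map[string]struct{}

// ShouldCrawl reports whether no path with the same normalized form was seen before.
func (f seenFilter) ShouldCrawl(path string) bool {
	key := normalizePath(path)
	if _, ok := f[key]; ok {
		return false
	}
	f[key] = struct{}{}
	return true
}

func main() {
	f := seenFilter{}
	for _, p := range []string{"/users/123", "/users/456", "/posts/2024-01-15"} {
		fmt.Printf("%s crawl=%v\n", p, f.ShouldCrawl(p))
	}
}
```

Here `/users/456` is skipped because it normalizes to the same `/users/{id}` key as `/users/123`.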
## Authenticated Crawling
Authenticated crawling involves including custom headers or cookies in HTTP requests to access protected resources. These headers provide authentication or authorization information, allowing you to crawl authenticated content / endpoint. You can specify headers directly in the command line or provide them as a file with katana to perform authenticated crawling.
@@ -564,6 +615,9 @@ CONFIGURATION:
-fc, -form-config string path to custom form configuration file
-flc, -field-config string path to custom field configuration file
flagSet.BoolVarP(&options.IgnoreQueryParams, "ignore-query-params", "iqp", false, "Ignore crawling same path with different query-param values"),
+flagSet.BoolVarP(&options.FilterSimilar, "filter-similar", "fsu", false, "filter crawling of similar-looking URLs (e.g., /users/123 and /users/456)"),
+flagSet.IntVarP(&options.FilterSimilarThreshold, "filter-similar-threshold", "fst", 10, "number of distinct values before a path position is treated as a parameter"),
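The runtime pattern learning tuned by `-fst` (`FilterSimilarThreshold`) can be sketched as a per-position counter of distinct values. In this hypothetical `patternLearner`, a position is keyed by its already-normalized prefix; once it has seen `threshold` distinct values it is promoted, and later segments there collapse to `{param}`. Whether katana promotes at exactly the threshold or one past it, and how it keys positions, are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// patternLearner is a hypothetical sketch of threshold-based promotion.
type patternLearner struct {
	threshold int
	values    map[string]map[string]struct{} // prefix -> distinct next segments
	promoted  map[string]bool                // prefix -> position treated as parameter
}

func newPatternLearner(threshold int) *patternLearner {
	return &patternLearner{
		threshold: threshold,
		values:    make(map[string]map[string]struct{}),
		promoted:  make(map[string]bool),
	}
}

// Normalize records the path's segments and returns a key in which
// promoted positions are replaced by "{param}".
func (l *patternLearner) Normalize(path string) string {
	segs := strings.Split(strings.Trim(path, "/"), "/")
	prefix := ""
	out := make([]string, 0, len(segs))
	for _, seg := range segs {
		if !l.promoted[prefix] {
			set, ok := l.values[prefix]
			if !ok {
				set = make(map[string]struct{})
				l.values[prefix] = set
			}
			set[seg] = struct{}{}
			if len(set) >= l.threshold {
				l.promoted[prefix] = true
				delete(l.values, prefix) // counts are no longer needed once promoted
			}
		}
		if l.promoted[prefix] {
			seg = "{param}"
		}
		out = append(out, seg)
		prefix += "/" + seg // children of a promoted position share one prefix
	}
	return "/" + strings.Join(out, "/")
}

func main() {
	l := newPatternLearner(3) // comparable to passing -fst 3
	for _, p := range []string{"/blog/intro", "/blog/setup", "/blog/faq", "/blog/news"} {
		fmt.Println(p, "->", l.Normalize(p))
	}
}
```

With a threshold of 3, `/blog/intro` and `/blog/setup` stay distinct, while `/blog/faq` and every later `/blog/...` path normalize to `/blog/{param}`. This naive version would also promote a position that legitimately has many static children (for example, many top-level sections), which a real implementation would need to guard against.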