fix yara scan logic. Closes #1711 #3570
Gagan144-blip wants to merge 1 commit into intelowlproject:develop
Pull request overview
Adds support for ingesting YARA rules directly from the Unprotect.it detection rules API, extending the existing YARA repository update/compile flow beyond Git/zip sources.
Changes:
- Add Unprotect API detection + ingestion flow that downloads rules, validates syntax, and persists valid `.yar` files.
- Improve YARA rule compilation logging/robustness (per-file validation, better error handling, safer lockfile deletion).
- Add a unit test covering Unprotect API rule download and `.yar` file creation.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| api_app/analyzers_manager/file_analyzers/yara_scan.py | Adds Unprotect API ingestion path and adjusts compile/update behavior for YARA repositories. |
| tests/api_app/analyzers_manager/unit_tests/file_analyzers/test_yara_scan.py | Adds a unit test for Unprotect API ingestion creating .yar files. |
      # We check the specific repo directory and any first-level subdirectories
      for directory in self.first_level_directories + [self.directory]:
          if directory != self.directory:
    -         # recursive
    -         rules = directory.rglob("*")
    +         rules = directory.rglob("*")  # recursive for subfolders
          else:
    -         # not recursive
    -         rules = directory.glob("*")
    +         rules = directory.glob("*")  # non-recursive for main folder
first_level_directories/compiled_paths are @cached_property values derived from the filesystem, but update() mutates the repo contents (git pull/clone, zip extract, Unprotect ingestion). If those cached values were computed before the update (e.g., _update_git() calls self.compiled_paths before pulling), compile() can use a stale directory list and miss newly added first-level subfolders (and therefore skip compiling rules in them). Consider removing caching here or explicitly invalidating these cached properties at the start/end of update() (e.g., self.__dict__.pop("first_level_directories", None) / ...pop("compiled_paths", None) / ...pop("head_branch", None) as appropriate).
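The invalidation suggested in this comment can be sketched as follows. This is a minimal illustration, not IntelOwl's code: `RepoSketch` and its `update(new_subdir)` signature are hypothetical stand-ins, and the `mkdir` call only simulates what a git pull, zip extraction, or API ingestion would do to the directory.

```python
from functools import cached_property
from pathlib import Path
import tempfile


class RepoSketch:
    """Hypothetical stand-in for the YARA repository class."""

    def __init__(self, directory: Path):
        self.directory = directory

    @cached_property
    def first_level_directories(self) -> list[Path]:
        # Computed once from the filesystem, then stored in self.__dict__.
        return sorted(p for p in self.directory.iterdir() if p.is_dir())

    def update(self, new_subdir: str) -> None:
        # Simulates an update that adds a first-level subfolder.
        (self.directory / new_subdir).mkdir()
        # Drop the cached value so the next access re-reads the filesystem.
        self.__dict__.pop("first_level_directories", None)


with tempfile.TemporaryDirectory() as tmp:
    repo = RepoSketch(Path(tmp))
    before = [p.name for p in repo.first_level_directories]  # caches the empty list
    repo.update("new_rules")
    after = [p.name for p in repo.first_level_directories]  # recomputed after the pop
```

Without the `pop` in `update()`, the second access would return the cached empty list and the new subfolder would be skipped.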
    # Normalize path to handle optional leading/trailing slash
    path = parsed.path.strip("/")
    return netloc == "unprotect.it" and path.startswith("api/detection_rules")
is_unprotect_api() currently matches any path starting with api/detection_rules, which can yield false positives (e.g., api/detection_rules_old). Since this branch changes update behavior significantly, it would be safer to match the endpoint exactly (after normalizing slashes) or check path segment boundaries (e.g., equality to api/detection_rules or prefix api/detection_rules/).
    - return netloc == "unprotect.it" and path.startswith("api/detection_rules")
    + return netloc == "unprotect.it" and (
    +     path == "api/detection_rules" or path.startswith("api/detection_rules/")
    + )
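As a self-contained sketch of the suggested check: the function below applies the segment-boundary match from the suggestion. How `netloc` is derived is not shown in the diff, so the `urlparse(...).netloc.lower()` step here is an assumption.

```python
from urllib.parse import urlparse


def is_unprotect_api(url: str) -> bool:
    parsed = urlparse(url)
    # Assumption: the host is compared case-insensitively.
    netloc = parsed.netloc.lower()
    # Normalize path to handle optional leading/trailing slash.
    path = parsed.path.strip("/")
    # Match the endpoint itself or anything below it, but not sibling paths.
    return netloc == "unprotect.it" and (
        path == "api/detection_rules" or path.startswith("api/detection_rules/")
    )
```

With this version, `https://unprotect.it/api/detection_rules/` matches, while `https://unprotect.it/api/detection_rules_old` does not.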
            "id": 1,
            "name": "Test Rule",
            "yara_rule": "rule test_rule { condition: true }",
        },
        {
The new validation behavior that discards syntactically invalid YARA rules (compile failure -> unlink) isn’t exercised here. Add a test case with an invalid yara_rule string and assert that no .yar file remains for it after update().
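A rough shape for the requested test, under stated assumptions: `persist_valid_rules`, the `unprotect_<id>.yar` file naming, and `fake_compile` are all hypothetical and only mirror the "compile failure -> unlink" behavior described above. In the real analyzer the compile step would be the YARA compiler rather than this stand-in.

```python
import tempfile
from pathlib import Path
from typing import Callable


def persist_valid_rules(
    rules: list[dict], directory: Path, compile_rule: Callable[[str], object]
) -> list[Path]:
    """Write each rule to a .yar file and delete files that fail to compile."""
    kept = []
    for rule in rules:
        path = directory / f"unprotect_{rule['id']}.yar"  # hypothetical naming
        path.write_text(rule["yara_rule"])
        try:
            compile_rule(str(path))
        except Exception:
            path.unlink()  # discard the syntactically invalid rule
            continue
        kept.append(path)
    return kept


def fake_compile(filepath: str) -> None:
    # Stand-in for a real YARA compile call: rejects text without a condition.
    if "condition:" not in Path(filepath).read_text():
        raise ValueError("syntax error")


rules = [
    {"id": 1, "name": "Test Rule", "yara_rule": "rule test_rule { condition: true }"},
    {"id": 2, "name": "Broken Rule", "yara_rule": "rule broken {"},
]

with tempfile.TemporaryDirectory() as tmp:
    kept = persist_valid_rules(rules, Path(tmp), fake_compile)
    remaining = sorted(p.name for p in Path(tmp).glob("*.yar"))
```

The assertion of interest is that only the file for the valid rule remains after ingestion.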
Hey, thanks for the screenshot. I would also need you to run the Yara analyzer and show me the output for a file that matches at least one of those rules. That's the way to be sure that this works properly.
Hi @mlodic, thanks for the suggestion! I’ll update the PR shortly with the results.
Hi @mlodic, I’m hitting a persistent Status 500 (ImportError) in my local IntelOwl environment and could use your guidance. What I’ve done so far: The uwsgi logs show the server starting on port 8001, but the frontend (port 80) isn't fetching the analyzer list.
This pull request has been marked as stale because it has had no activity for 10 days. If you are still working on this, please provide some updates or it will be closed in 5 days.
Description
This PR adds support for downloading YARA detection rules from the
Unprotect API: "https://unprotect.it/api/detection_rules/".
The implementation extends the existing YARA repository update logic to
support API-based rule ingestion in addition to Git repositories.
When the repository URL matches the Unprotect API endpoint, IntelOwl will download the rules from the API, validate their syntax, and save the valid ones as `.yar` files.
This improves rule coverage by allowing IntelOwl to automatically ingest
community-maintained detection rules from Unprotect.
Closes #1711
Type of change
Checklist
- `develop`
- `dumpplugin` command and added it in the project as a data migration. ("How to share a plugin with the community")
- `test_files.zip` and you added the default tests for that mimetype in test_classes.py.
- `FREE_TO_USE_ANALYZERS` playbook by following this guide.
- `url` that contains this information. This is required for Health Checks (HEAD HTTP requests).
- `get_mocker_response()` method of the unittest class. This serves us to provide a valid sample for testing.
- `DataModel` for the new analyzer following the documentation
- `# This file is a part of IntelOwl https://github.com/intelowlproject/IntelOwl # See the file 'LICENSE' for copying permission.`
- `Ruff`) gave 0 errors. If you have correctly installed pre-commit, it does these checks and adjustments on your behalf.
- `tests` folder). All the tests (new and old ones) gave 0 errors.
- `DeepSource`, `Django Doctors` or other third-party linters have triggered any alerts during the CI checks, I have solved those alerts.
Screenshots
1. Test execution showing all tests passing

2. `.yar` files from Unprotect API
