
debezium/dbz#1810 Add automated E2E tests for the auditlog example.#406

Open
KMohnishM wants to merge 1 commit into debezium:main from KMohnishM:dbz-1810

Conversation

KMohnishM (Contributor) commented Apr 21, 2026

Fixes debezium/dbz#1810

Description

This PR introduces automated end-to-end testing for the auditlog example and extends the existing test runner to support more general use cases.

This is the first step toward standardizing automated testing across debezium-examples, based on the approach used in postgres-failover-slots.

Fixes #1810

Changes

1. Auditlog E2E Test

  • Added auditlog/test.yaml defining the full test workflow:
    • Start services via Docker Compose
    • Register connector
    • Trigger API request with authentication
    • Verify enriched Kafka output
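
  For context, the workflow above could take roughly this shape in `test.yaml` (an illustrative sketch: the step types `http_post`, `http_wait`, and `kafka_consume` come from this PR, but field names such as `body_file` and `expected_contains`, the URLs, and the topic name are assumptions):

  ```yaml
  name: auditlog-e2e
  steps:
    - name: Start all services
      type: docker_compose_up
      detach: true

    - name: Register connector
      type: http_post
      url: http://localhost:8083/connectors
      body_file: register-mysql.json
      expected_status: 201

    - name: Trigger API request with authentication
      type: http_post
      url: http://localhost:8080/vegetables
      headers:
        Authorization: "Bearer ${jwt_token}"
      body: '{"name": "tomato", "description": "red"}'

    - name: Verify enriched Kafka output
      type: kafka_consume
      topic: dbserver1.inventory.vegetable
      expected_contains: "auditData"
  ```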

2. Test Runner Improvements (scripts/run-example-test.py)

  • Refactored HTTP handling into reusable helpers
  • Added support for:
    • http_post steps
    • Custom headers (for JWT-based flows)
    • expected_status in http_wait

These changes make the runner reusable across more complex examples.
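
The refactored HTTP handling might look roughly like this minimal sketch; `status_matches` and `http_request` are hypothetical names, not necessarily the helpers in `run-example-test.py`:

```python
import json
import urllib.request

def status_matches(code, expected_status=None):
    # With no explicit expectation, any 2xx status counts as success;
    # with expected_status set (as in http_wait), require an exact match.
    if expected_status is None:
        return 200 <= code < 300
    return code == expected_status

def http_request(url, method="GET", headers=None, body=None,
                 expected_status=None):
    # Shared helper behind http_post / http_wait style steps: serializes
    # an optional JSON body, applies custom headers (e.g. a JWT), and
    # checks the response status.
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method,
                                 headers=headers or {})
    with urllib.request.urlopen(req) as resp:
        return status_matches(resp.status, expected_status), resp.read()
```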

3. CI Integration

  • Updated the CI workflow to execute the auditlog test using the Python runner

Verification

  • Tested locally (WSL environment)
  • Confirmed that:
    • Services start correctly
    • Connector is registered successfully
    • Enriched events are produced as expected

Checklist

  • If the changes include a new example, I added it to the list of examples in the README.md file

Copilot AI left a comment

Pull request overview

Adds an automated end-to-end test for the auditlog example and extends the shared YAML-driven test runner so it can exercise more realistic HTTP flows (POST + headers + status assertions), then wires the new test into CI.

Changes:

  • Introduces auditlog/test.yaml defining an end-to-end workflow (compose up, connector registration, authenticated API call, Kafka assertion).
  • Refactors scripts/run-example-test.py HTTP logic into a generic request helper and adds http_post, custom headers, and expected_status support for http_wait.
  • Updates the auditlog GitHub Actions workflow to build the example and execute the YAML test via the Python runner.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| scripts/run-example-test.py | Generalizes HTTP step handling and adds http_post + header/status support to enable JWT-based flows in example tests. |
| auditlog/test.yaml | Defines the auditlog E2E scenario executed by the shared runner. |
| .github/workflows/auditlog-workflow.yml | Runs the auditlog E2E test in CI (Java build + Python runner execution). |


Comment thread on scripts/run-example-test.py (outdated):

```python
is_success = 200 <= resp.status_code < 300

if is_success:
    if json_path and expected_value:
```
Copilot AI commented Apr 23, 2026:
step_http_wait only validates expected_json_path/expected_value when both are truthy (if json_path and expected_value). This breaks waits for legitimate falsy values (e.g., 0, false, or empty string), causing the step to return early on HTTP success without checking the JSON field. Consider checking expected_value is not None (and json_path is not None) instead of truthiness so all values can be asserted reliably.

Suggested change:

```diff
- if json_path and expected_value:
+ if json_path is not None and expected_value is not None:
```
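
The pitfall is easy to demonstrate in isolation (a sketch; `check_truthy` and `check_not_none` are illustrative names mirroring the two guard styles):

```python
def check_truthy(json_path, expected_value):
    # Original guard: skips validation for falsy-but-legitimate values.
    return bool(json_path and expected_value)

def check_not_none(json_path, expected_value):
    # Suggested guard: only skips when a field is genuinely absent.
    return json_path is not None and expected_value is not None

# A wait on an expected value of 0 is silently skipped by the truthy
# check, but correctly asserted by the is-not-None check:
assert check_truthy("$.tasks.length", 0) is False
assert check_not_none("$.tasks.length", 0) is True
```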

vsantonastaso (Contributor) left a comment

Hi @KMohnishM,
first of all, thank you for your contribution!
I just have a couple of notes:
• It would be great to also have automated tests covering the “Administrate missing events” section mentioned in the README. That scenario is a bit more specific than the basic one currently tested, so having dedicated automation for it would be really valuable. What do you think?
• I noticed that the postgres-failover-slots test is failing. Have you had a chance to check whether the changes made in run-example-test.py might have affected it?

KMohnishM force-pushed the dbz-1810 branch 2 times, most recently from 24c7123 to 51ec9e7 on Apr 25, 2026 at 14:38
KMohnishM (Contributor, Author) commented Apr 25, 2026

Hi @vsantonastaso! Thanks for reviewing the PR. I have just pushed a new commit addressing both of your points:

  1. "Administrate missing events" automation: I completely agree, this is the most critical workflow of the example. I have fully instrumented this E2E scenario in the test.yaml! It now automatically simulates anomalous records, waits for the correlations, dynamically captures the spawned task IDs using a new {FIRST_KEY} JSON parser I added to the runner, and submits the missing auditData metadata natively.
    As a bonus, while writing this test, I noticed the admin-service was fragile to the UnrecognizedPropertyException when parsing newer Debezium message metadata. I went ahead and added @JsonIgnoreProperties(ignoreUnknown = true) across the models (like AuditData, VegetableEvent, etc.) to ensure long-term resilience for the demo.

  2. postgres-failover-slots Test Failures: You were right to call this out! It turns out postgres-failover-slots occasionally flakes locally due to latency when pgbouncer establishes failover connections. However, out of an abundance of caution, I re-reviewed my runner logic and completely restored the native bash -c implementation for the .yaml command execution strings. This guarantees 100% backward shell compatibility (for features like >, &&, |) with existing example tests like postgres-failover-slots. Additionally, I improved the script to explicitly allow modular docker_compose_build commands without blowing out image caching targets.
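
The shell-compatibility point can be sketched with a minimal example (using portable `sh -c` here; the runner itself restores `bash -c` as described above):

```python
import subprocess

# Without a shell, operators like "&&", "|", ">" would be passed as
# literal arguments; wrapping the command string in a shell -c
# invocation restores them.
result = subprocess.run(["sh", "-c", "echo one && echo two | tr a-z A-Z"],
                        capture_output=True, text=True)
print(result.stdout)  # prints "one" then "TWO"
```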

Let me know what you think of the latest update.

Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.



Comment thread on scripts/run-example-test.py (outdated):

```python
import os
import subprocess
import re
import shlex
```
Copilot AI commented Apr 28, 2026:

shlex is imported but never used, which adds dead code and can confuse future maintenance. Please remove the unused import (or use it if it was intended for command parsing).

Suggested change:

```diff
- import shlex
```

Comment thread on scripts/run-example-test.py (outdated):

```python
# find and replace placeholders like ${my_var}
def replacer(match):
    var_name = match.group(1)
    return str(VARIABLES.get(var_name, match.group(0)))
```
Copilot AI commented Apr 28, 2026:

substitute_vars() will replace placeholders with the string "None" when a variable exists in VARIABLES but its captured value is None (e.g., when extract_json_value() can’t find the path). This can silently produce invalid URLs/headers. Consider not setting VARIABLES[var_name] when the extracted value is None, and/or leaving the placeholder unchanged when the variable value is None.

Suggested change:

```diff
- return str(VARIABLES.get(var_name, match.group(0)))
+ if var_name not in VARIABLES or VARIABLES[var_name] is None:
+     return match.group(0)
+ return str(VARIABLES[var_name])
```
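
Put together, a None-safe `substitute_vars` could look like this sketch (`VARIABLES` and the `${name}` placeholder syntax follow the snippet above; the guard is the proposed fix, not necessarily the merged code):

```python
import re

VARIABLES = {"token": "abc123", "task_id": None}

def substitute_vars(text):
    # Replace ${name} placeholders from VARIABLES; leave unknown or
    # None-valued variables untouched rather than emitting the string
    # "None" into URLs or headers.
    def replacer(match):
        name = match.group(1)
        if name not in VARIABLES or VARIABLES[name] is None:
            return match.group(0)
        return str(VARIABLES[name])
    return re.sub(r"\$\{(\w+)\}", replacer, text)
```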

Comment on lines +136 to +139:

```python
# Use bash -c to ensure shell operators like >, |, && work perfectly
cmd = compose_cmd(config, "exec", "-T", service, "bash", "-c", command)
```
Copilot AI commented Apr 28, 2026:

docker_compose_exec now hard-requires bash inside the target container (... exec ... bash -c ...). Some images (notably Alpine-based) don’t ship with bash, which would break tests even when the command itself is valid. Consider using sh -c for portability or allowing the shell to be configurable per-step (defaulting to sh).

Suggested change:

```diff
- # Use bash -c to ensure shell operators like >, |, && work perfectly
- cmd = compose_cmd(config, "exec", "-T", service, "bash", "-c", command)
+ shell = substitute_vars(step.get("shell", "sh"))
+ # Use a configurable shell so shell operators like >, |, && still work,
+ # while defaulting to portable `sh` for images that do not include bash.
+ cmd = compose_cmd(config, "exec", "-T", service, shell, "-c", command)
```

Comment thread on scripts/run-example-test.py (outdated), lines +359 to +372:

```python
print(f"\n[FAIL] Step '{name}' failed: {e}")
service = step.get("service")
if service:
    print(f"\n[LOGS] Last 100 lines for service '{service}':")
```
Copilot AI commented Apr 28, 2026:

The except block assumes name and step are always defined, but an exception can occur before the loop assigns them (e.g., steps: null in YAML causing enumerate(steps, 1) to throw). That would raise an UnboundLocalError and hide the original failure. Initialize name/step to safe defaults before the try, and guard access in the except block.
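
A minimal sketch of the suggested guard (`run_steps` and the default name are illustrative, not the runner's actual structure):

```python
def run_steps(steps, execute):
    # Initialize before the try so the except block can never hit an
    # UnboundLocalError, even if iteration itself fails (e.g. `steps`
    # is None because the YAML document had an empty `steps:` key).
    name, step = "<before first step>", None
    try:
        for i, step in enumerate(steps, 1):
            name = step.get("name", f"step {i}")
            execute(step)
        return True, None
    except Exception as e:
        # Guard access: `step` may still be None here.
        service = step.get("service") if isinstance(step, dict) else None
        return False, f"Step '{name}' failed (service={service}): {e}"
```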

Comment thread on auditlog/test.yaml (outdated), lines +11 to +19:

```yaml
- name: Build admin-service image
  type: docker_compose_build
  services:
    - admin-service

- name: Start all services
  type: docker_compose_up
  detach: true
```

Copilot AI commented Apr 28, 2026:

Only admin-service is explicitly built before docker_compose_up, but auditlog/docker-compose.yaml defines build contexts for vegetables-service and log-enricher too. On a developer machine with existing images, docker compose up may reuse stale images and make the E2E test non-deterministic. Consider building all services with build contexts (or using a docker_compose_up option that forces --build).

Comment thread on auditlog/test.yaml:

```yaml
- name: Verify raw anomalous event appears in Kafka
  type: kafka_consume
  service: kafka
```
Copilot AI commented Apr 28, 2026:

This kafka_consume step sets service: kafka, but the runner currently hard-codes the Kafka service name to kafka and ignores any service field. Either remove this field from the YAML (to avoid suggesting it has an effect) or extend step_kafka_consume to honor an optional service override.

Suggested change:

```diff
- service: kafka
```
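
If the runner were instead extended to honor the field, the lookup could be as simple as this (hypothetical helper name):

```python
def resolve_kafka_service(step):
    # Honor an optional per-step service override instead of
    # hard-coding "kafka" (illustrative; the actual runner currently
    # ignores this field).
    return step.get("service", "kafka")
```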

KMohnishM force-pushed the dbz-1810 branch 6 times, most recently from 0642e47 to c49240c on Apr 28, 2026 at 17:28
Signed-off-by: Mohnish <kmohnishm@gmail.com>
KMohnishM (Contributor, Author) commented:

Thanks for the feedback @vsantonastaso! I have updated the PR addressing all of Copilot's suggestions.

vsantonastaso (Contributor) commented Apr 30, 2026

Thanks for your effort. LGTM 👍

KMohnishM (Contributor, Author) commented:

Thanks for the review and support @vsantonastaso.

@Naros, could you also take a look at this PR when you get a chance?

Also, from a broader perspective - with this PR we now have a reusable test runner and CI wired for the auditlog example. For extending this across the repository, do you have a preference between:

  • adding CI workflows per example incrementally, or
  • moving toward a centralized workflow that can execute different example test manifests?

I see this PR as establishing the baseline, but scaling it across all examples will need a consistent CI strategy, so I wanted to align with your ideas on it.

Development

Successfully merging this pull request may close these issues.

Implement automated testing framework for debezium-examples