
[+] implement parallel source discovery#1378

Merged
pashagolub merged 4 commits into master from parallel-source-discovery
Apr 29, 2026

Conversation


@pashagolub pashagolub commented Apr 24, 2026

Improves dead-source handling with parallel resolution and instance_up=0 on discovery failure.

Sources.ResolveDatabases() previously resolved each source sequentially. A single slow or unresponsive source (e.g. a continuous-discovery endpoint behind a firewall) would block discovery of all subsequent sources for the full connection timeout duration.

Sources are now resolved concurrently using sync.WaitGroup.Go(). Results are collected into a pre-allocated indexed slice to preserve deterministic ordering. Per-source error logging with source name is included in the resolver itself.

When a SourcePostgresContinuous or SourcePatroni source fails to resolve any databases, LoadSources() now emits instance_up=0 to the configured sinks. This makes the failure visible in dashboards and alerting, consistent with how unreachable directly-monitored sources are handled.
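
A minimal sketch of that failure path, using stand-in types (the real code uses `metrics.MeasurementEnvelope` and the reaper's measurement channel; everything below is illustrative):

```go
package main

import (
	"fmt"
	"time"
)

// Stand-ins for pgwatch's source kinds and measurement envelope.
type Kind string

const (
	SourcePostgresContinuous Kind = "postgres-continuous-discovery"
	SourcePatroni            Kind = "patroni"
)

type Source struct {
	Name string
	Kind Kind
}

type Envelope struct {
	DBName string
	Metric string
	Data   map[string]any
}

// writeInstanceDown mimics the new behaviour: emit instance_up = 0 for a
// discovery source that resolved zero databases.
func writeInstanceDown(ch chan<- Envelope, s Source) {
	ch <- Envelope{
		DBName: s.Name,
		Metric: "instance_up",
		Data: map[string]any{
			"epoch_ns":    time.Now().UnixNano(),
			"instance_up": 0,
		},
	}
}

func main() {
	ch := make(chan Envelope, 1)
	src := Source{Name: "prod-cluster", Kind: SourcePatroni}
	resolved := 0 // pretend discovery found no databases
	if (src.Kind == SourcePostgresContinuous || src.Kind == SourcePatroni) && resolved == 0 {
		writeInstanceDown(ch, src)
	}
	env := <-ch
	fmt.Println(env.DBName, env.Metric, env.Data["instance_up"])
}
```

The key point is that the envelope's DBName is the source name itself, since no real database name is known yet; the review thread below discusses a consequence of exactly this.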

@pashagolub pashagolub self-assigned this Apr 24, 2026
@pashagolub pashagolub requested a review from 0xgouda April 24, 2026 12:27
@pashagolub pashagolub added the `sources` (What sources and in what way to monitor) label Apr 24, 2026

coveralls commented Apr 24, 2026

Coverage Report for CI Build 25120163997

Coverage decreased (-0.3%) to 83.065%

Details

  • Coverage decreased (-0.3%) from the base build.
  • Patch coverage: 23 uncovered changes across 2 files (6 of 29 lines covered, 20.69%).
  • 6 coverage regressions across 1 file.

Uncovered Changes

File | Changed | Covered | %
internal/sources/resolver.go | 21 | 0 | 0.0%
internal/reaper/reaper.go | 6 | 4 | 66.67%

Coverage Regressions

6 previously-covered lines in 1 file lost coverage.

File | Lines Losing Coverage | Coverage
internal/reaper/reaper.go | 6 | 35.53%

Coverage Stats

Coverage Status
Relevant Lines: 5338
Covered Lines: 4434
Line Coverage: 83.06%
Coverage Strength: 0.95 hits per line

💛 - Coveralls

@pashagolub pashagolub force-pushed the parallel-source-discovery branch from a3814f5 to a715672 Compare April 24, 2026 18:41

0xgouda commented Apr 27, 2026

Is this related to #1377?

pashagolub (author) replied:

> Is this related to #1377?

In a way. I couldn't reproduce that issue, but I found that misconfigured discovery sources could cause huge delays.

@0xgouda 0xgouda force-pushed the parallel-source-discovery branch from a715672 to 75ea9bd Compare April 28, 2026 00:40
Comment on lines +48 to +50
```go
if onError != nil {
	onError(srcs[i].Name)
}
```

0xgouda commented Apr 28, 2026


I have a concern here that can be reproduced with the following steps:

  1. Define a target that happens to be unreachable and is of the kind postgres-continuous-discovery.
  2. pgwatch writes instance_up=0 for the target for a while, with dbname = sourceName.
  3. The target becomes alive.
  4. pgwatch runs for a while and now writes the updated instance_up=1, but with a new dbname = sourceName + "_" + realDbname.
  5. The target goes down again, and its instance_up=0 is written with dbname = sourceName + "_" + realDbname.

So the full instance uptime history becomes a bit disconnected, split across different dbnames.

But generally, I think that's the best we can do, just wanted to note this behaviour.

pashagolub (author) replied:
Good catch! We could use `source` instead of `dbname`. That way we would know for sure at which point that happened.

0xgouda replied:
Can you explain more?

pashagolub commented Apr 28, 2026

```go
// WriteInstanceDown writes an instance_up = 0 metric to sinks for the given source
func (r *Reaper) WriteInstanceDown(md *sources.SourceConn) {
	r.measurementCh <- metrics.MeasurementEnvelope{
		DBName:     md.Name,
		MetricName: specialMetricInstanceUp,
		Data: metrics.Measurements{metrics.Measurement{
			metrics.EpochColumnName: time.Now().UnixNano(),
			"kind":                  string(md.Kind), // <-- proposed addition
			specialMetricInstanceUp: 0},
		},
	}
}
```

pashagolub (author) added:
This way Grafana could distinguish regular databases from all the others.

0xgouda replied:
I don't think adding the kind to the measurements will help.

What if we have two or more postgres-continuous-discovery/patroni sources? Grouping by kind would then be useless.

0xgouda commented Apr 29, 2026

And again, I guess what we have already is the best we can do, since we won't be able to add any instance identifiers (i.e., sys_id) to the measurements before connecting at least once.

But then, in Grafana, one might have two panels:

  • instance uptime history grouped by sys_id
  • instance uptime history for instances with an unknown sys_id

and they should complement each other to give the full uptime-history view.

@pashagolub pashagolub force-pushed the parallel-source-discovery branch from 75ea9bd to a47b7fb Compare April 29, 2026 15:39
@pashagolub pashagolub merged commit 1677cdb into master Apr 29, 2026
9 checks passed
@pashagolub pashagolub deleted the parallel-source-discovery branch April 29, 2026 16:20

3 participants