Commit d6b001e

Improve CI reliability and developer productivity through test scheduling optimizations, mirror stability fixes, and a new artifact reuse feature. (#1379)
* Document disk-intensive test placement in greenplum_schedule

  Add comment explaining why autovacuum-template0-segment and profile tests are positioned early in the test schedule. These tests consume significant disk space through WAL generation, XID consumption, and autovacuum operations. Running them early when ~20GB disk space is available (vs ~10GB later) helps avoid disk exhaustion issues during test execution.

* Fix Rocky Linux mirror instability in CI

  Add repository metadata refresh and retry logic to handle transient mirror failures during RPM installation. This addresses frequent 404 errors from Rocky Linux mirrors that cause CI failures.

  Changes:
  - Run 'dnf clean all' and 'dnf makecache --refresh' before installation
  - Add '--setopt=retries=10' to dnf install command
  - Apply fix to both rpm-install-test and test jobs

  This improves CI reliability without changing functionality.

* Add artifact reuse feature for faster test iteration

  Enable reusing build artifacts from previous workflow runs to speed up test iteration by ~50-70 minutes. This is useful for debugging test failures without rebuilding.

  Changes:
  - Add 'reuse_artifacts_from_run_id' workflow input parameter
  - Skip build job when reusing artifacts from specified run
  - Skip rpm-install-test job when reusing artifacts
  - Update artifact download steps to support cross-run downloads
  - Add proper job conditionals to handle skipped build job

  Usage: Manually trigger workflow and specify a previous run ID in the 'reuse_artifacts_from_run_id' input field. Leave empty to build fresh.

  This maintains backward compatibility - default behavior unchanged.

* Add GitHub Actions workflow documentation for developers

  Create comprehensive documentation for GitHub Actions workflows, focusing on features that help developers iterate faster when debugging CI issues.

  Key sections:
  - Manual workflow triggers and input parameters
  - Artifact reuse feature with step-by-step guide
  - Running workflows in forked repositories
  - Troubleshooting common issues

  This documentation enables developers to:
  - Reuse build artifacts to save ~50-70 minutes per test iteration
  - Run CI validation in their forks before submitting PRs
  - Understand available workflow options and test selections
  - Debug test failures more efficiently

* Pin Rocky Linux repos to stable 9.x release

  Use --releasever=9 to pin dnf to stable Rocky Linux 9.x repos instead of bleeding-edge point releases (e.g., 9.6) that may not be fully synced across all mirrors.

  Rocky Linux maintains binary compatibility within major versions, so pinning to 9 ensures we get stable, widely-mirrored packages while remaining compatible with the 9.6 container OS.

  This complements the earlier retry/refresh logic by addressing the root cause: new point releases have metadata sync lag across mirror network.

* Move all autovacuum tests to early execution

  Move autovacuum and autovacuum-segment tests alongside autovacuum-template0-segment to run early in the schedule when more disk space is available. All three autovacuum tests are disk-intensive and benefit from running when ~20GB is available rather than later when space may be constrained.

  This grouping also improves test organization by keeping related tests together.

* Clarify secrets configuration in workflow documentation

  Update README to clarify that no manual secret configuration is required for normal development workflows:
  - GITHUB_TOKEN is automatically provided by GitHub
  - Only used for artifact reuse feature (downloading previous run artifacts)
  - DockerHub secrets only needed for custom container image builds (advanced/maintainer use case)

  This removes confusion about required setup steps for fork users.
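
The mirror fixes described in the commit message amount to a short command sequence: drop cached metadata, rebuild it, then install with download retries and a pinned release version. A minimal sketch of that sequence as a shell function follows. The function name `install_rpm_with_refresh` and the `DNF` override variable are illustrative assumptions and do not appear in the workflow, which runs the commands inline. The sketch defaults to a dry run that only prints the commands.

```bash
#!/bin/sh
# Hypothetical wrapper around the dnf steps named in the commit message.
# DNF defaults to a dry run that only prints each command; set DNF=dnf
# in the environment to execute the commands for real.
DNF=${DNF:-"echo dnf"}

install_rpm_with_refresh() {
    rpm_file=$1
    # Drop cached repository metadata, then rebuild it. If the forced
    # refresh fails, fall back to a plain makecache.
    $DNF clean all
    $DNF makecache --refresh || $DNF makecache
    # Retry downloads up to 10 times and resolve repos for release "9".
    $DNF install -y --setopt=retries=10 --releasever=9 "$rpm_file"
}

# Dry run: prints the three dnf commands without touching the system.
install_rpm_with_refresh ./example.rpm
```

Keeping the command name in a variable makes the sequence easy to inspect locally before running it in a container with real package installation.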
1 parent 9d86458 commit d6b001e

3 files changed

Lines changed: 303 additions & 10 deletions

.github/workflows/README.md

Lines changed: 258 additions & 0 deletions
@@ -0,0 +1,258 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# GitHub Actions Workflows

This directory contains GitHub Actions workflows for Apache Cloudberry CI/CD.

## Table of Contents

- [Available Workflows](#available-workflows)
- [Manual Workflow Triggers](#manual-workflow-triggers)
- [Artifact Reuse for Faster Testing](#artifact-reuse-for-faster-testing)
- [Running Workflows in Forked Repositories](#running-workflows-in-forked-repositories)

## Available Workflows

| Workflow | Purpose | Trigger |
|----------|---------|---------|
| `build-cloudberry.yml` | Main CI: build, test, create RPMs | Push, PR, Manual |
| `build-dbg-cloudberry.yml` | Debug build with assertions enabled | Push, PR, Manual |
| `apache-rat-audit.yml` | License header compliance check | Push, PR |
| `coverity.yml` | Static code analysis with Coverity | Weekly, Manual |
| `sonarqube.yml` | Code quality analysis with SonarQube | Push to main |
| `docker-cbdb-build-containers.yml` | Build Docker images for CI | Manual |
| `docker-cbdb-test-containers.yml` | Build test Docker images | Manual |

## Manual Workflow Triggers

Many workflows support manual triggering via `workflow_dispatch`, allowing developers to run CI jobs on demand.

### How to Manually Trigger a Workflow

1. Navigate to the **Actions** tab in GitHub
2. Select the workflow from the left sidebar (e.g., "Build and Test Cloudberry")
3. Click the **Run workflow** button (top right)
4. Select your branch
5. Configure input parameters (if available)
6. Click **Run workflow**

### Workflow Input Parameters

#### `build-cloudberry.yml` - Main CI

| Parameter | Description | Default | Example |
|-----------|-------------|---------|---------|
| `test_selection` | Comma-separated list of tests to run, or "all" | `all` | `ic-good-opt-off,ic-contrib` |
| `reuse_artifacts_from_run_id` | Run ID to reuse build artifacts from (see below) | _(empty)_ | `12345678901` |

**Available test selections:**
- `all` - Run all test suites
- `ic-good-opt-off` - Installcheck with optimizer off
- `ic-good-opt-on` - Installcheck with optimizer on
- `ic-contrib` - Contrib extension tests
- `ic-resgroup` - Resource group tests
- `ic-resgroup-v2` - Resource group v2 tests
- `ic-resgroup-v2-memory-accounting` - Resource group memory tests
- `ic-singlenode` - Single-node mode tests
- `make-installcheck-world` - Full test suite
- And more... (see workflow for complete list)
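
Because `test_selection` is a free-form comma-separated string, a typo only surfaces after the workflow has started. A minimal sketch of a local pre-check follows. The helper name `validate_selection` is hypothetical, and the `KNOWN_TESTS` list contains only the names listed above, which is an assumption: the workflow itself defines more.

```bash
#!/bin/sh
# Hypothetical pre-check for a test_selection value. KNOWN_TESTS holds only
# the names listed in this README; the workflow defines additional ones.
KNOWN_TESTS="all ic-good-opt-off ic-good-opt-on ic-contrib ic-resgroup ic-resgroup-v2 ic-resgroup-v2-memory-accounting ic-singlenode make-installcheck-world"

validate_selection() {
    # Split the argument on commas; globbing is disabled while splitting.
    old_ifs=$IFS
    set -f
    IFS=,
    set -- $1
    IFS=$old_ifs
    set +f
    if [ "$#" -eq 0 ]; then
        echo "empty selection"
        return 1
    fi
    # Each entry must match a known name exactly.
    for name in "$@"; do
        case " $KNOWN_TESTS " in
            *" $name "*) ;;
            *) echo "unknown test: $name"; return 1 ;;
        esac
    done
    echo "ok"
}

validate_selection "ic-good-opt-off,ic-contrib"
```

The check rejects the first unrecognized entry and reports it by name, so it can run before `gh workflow run`.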

## Artifact Reuse for Faster Testing

When debugging test failures, rebuilding Cloudberry (~50-70 minutes) on every iteration is inefficient. The artifact reuse feature lets you reuse build artifacts from a previous run whose build job succeeded.

### How It Works

1. Build artifacts (RPMs, source tarballs) from a previous workflow run are downloaded
2. The build job is skipped (saves ~45-60 minutes)
3. The RPM installation test is skipped (saves ~5-10 minutes)
4. Test jobs run with the reused artifacts
5. You can iterate on test configurations without rebuilding
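
The skip behavior in the steps above can be sketched as a small decision function. This is an illustration of the described behavior only: `plan_jobs` is a hypothetical name, and the real decisions are made by the `if:` conditions on the jobs in `build-cloudberry.yml`.

```bash
#!/bin/sh
# Hypothetical illustration: which jobs run for a given
# reuse_artifacts_from_run_id value (empty means "build fresh").
plan_jobs() {
    reuse_run_id=$1
    if [ -z "$reuse_run_id" ]; then
        echo "build: run"
        echo "rpm-install-test: run"
        echo "test: run (artifacts from current run)"
    else
        echo "build: skip"
        echo "rpm-install-test: skip"
        echo "test: run (artifacts from run $reuse_run_id)"
    fi
}

plan_jobs ""
plan_jobs 12345678901
```

In both cases the test jobs run; only the source of the artifacts changes.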

### Step-by-Step Guide

#### 1. Find the Run ID

After a successful build (even if tests failed), get the run ID:

**Option A: From GitHub Actions UI**
- Go to **Actions** tab → Click on a completed workflow run
- The URL will be: `https://github.com/apache/cloudberry/actions/runs/12345678901`
- The run ID is `12345678901`
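
If you copy the URL from the browser, the run ID is the numeric path segment after `/actions/runs/`. A minimal sketch of extracting it in the shell follows; `run_id_from_url` is a hypothetical helper name, and the URL is the example value from above.

```bash
#!/bin/sh
# Hypothetical helper: print the numeric run ID contained in a run URL.
run_id_from_url() {
    url=${1%%\?*}    # drop any query string
    url=${url%/}     # drop a trailing slash
    case $url in
        */actions/runs/*) ;;
        *) echo "not a run URL" >&2; return 1 ;;
    esac
    id=${url##*/actions/runs/}
    id=${id%%/*}     # drop a suffix such as /job/987
    case $id in
        ''|*[!0-9]*) echo "no numeric run ID" >&2; return 1 ;;
    esac
    echo "$id"
}

run_id_from_url "https://github.com/apache/cloudberry/actions/runs/12345678901"
```

The helper also accepts URLs that point at a job inside a run and still returns the run's ID.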

**Option B: From the GitHub CLI**
```bash
# List recent workflow runs
gh run list --workflow=build-cloudberry.yml --limit 5

# Get run ID from specific branch
gh run list --workflow=build-cloudberry.yml --branch=my-feature --limit 1
```

#### 2. Trigger New Run with Artifact Reuse

**Via GitHub UI:**
1. Go to **Actions** → **Build and Test Cloudberry**
2. Click **Run workflow**
3. Enter the run ID in **"Reuse build artifacts from a previous run ID"**
4. Optionally customize **test_selection**
5. Click **Run workflow**

**Via GitHub CLI:**
```bash
# Reuse artifacts from run 12345678901, run only specific tests
gh workflow run build-cloudberry.yml \
  --field reuse_artifacts_from_run_id=12345678901 \
  --field test_selection=ic-good-opt-off
```

#### 3. Monitor Test Execution

- Build job will be skipped (shows as "Skipped" in Actions UI)
- RPM Install Test will be skipped
- Test jobs will run with artifacts from the specified run ID
- Total time: ~15-30 minutes (vs ~65-100 minutes for full build+test)
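
The artifact download steps pick the source run with a fallback: the workflow passes `github.event.inputs.reuse_artifacts_from_run_id || github.run_id` as the `run-id`, so an empty input means "use the current run". The same fallback expressed in shell, as a rough analogy (the function name `effective_run_id` is hypothetical):

```bash
#!/bin/sh
# Hypothetical analogy for the workflow's run-id expression:
# use the supplied run ID if it is non-empty, else the current run's ID.
effective_run_id() {
    reuse_run_id=$1
    current_run_id=$2
    echo "${reuse_run_id:-$current_run_id}"
}

effective_run_id "" 555              # no input given: current run
effective_run_id 12345678901 555     # input given: reused run
```

This is why leaving the input empty keeps the default behavior unchanged.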

### Use Cases

**Debugging a specific test failure:**
```bash
# Run 1: Full build + all tests (finds test failure in ic-good-opt-off)
gh workflow run build-cloudberry.yml

# Look up the ID of the most recent run
# (a newly triggered run can take a few seconds to appear in the list)
RUN_ID=$(gh run list --workflow=build-cloudberry.yml --limit 1 --json databaseId --jq '.[0].databaseId')

# Run 2: Reuse artifacts, run only failing test
gh workflow run build-cloudberry.yml \
  --field reuse_artifacts_from_run_id=$RUN_ID \
  --field test_selection=ic-good-opt-off
```

**Testing different configurations:**
```bash
# Test with optimizer off, then on, using same build
gh workflow run build-cloudberry.yml \
  --field reuse_artifacts_from_run_id=$RUN_ID \
  --field test_selection=ic-good-opt-off

gh workflow run build-cloudberry.yml \
  --field reuse_artifacts_from_run_id=$RUN_ID \
  --field test_selection=ic-good-opt-on
```

### Limitations

- Artifacts expire after 90 days (GitHub default retention)
- Run ID must be from the same repository; reuse across repositories, including across forks, is not supported
- The source run must have produced both the RPM and the source build artifacts
- Cannot reuse artifacts across different OS/architecture combinations
- Changes to source code require a fresh build
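
The retention limit is simple date arithmetic. A minimal sketch follows, assuming the 90-day default stated above (a repository may configure a different value) and assuming you already have the run's creation time as epoch seconds. The function name `artifact_status` and the `RETENTION_DAYS` variable are hypothetical.

```bash
#!/bin/sh
# Hypothetical check: is a run old enough that its artifacts have
# probably expired? Assumes the default 90-day retention period.
RETENTION_DAYS=90

artifact_status() {
    created_epoch=$1   # run creation time, seconds since the epoch
    now_epoch=$2       # current time, seconds since the epoch
    age_days=$(( (now_epoch - created_epoch) / 86400 ))
    if [ "$age_days" -ge "$RETENTION_DAYS" ]; then
        echo "expired"
    else
        echo "available"
    fi
}

# A run created 10 days before "now":
artifact_status 0 864000
# A run created 100 days before "now":
artifact_status 0 8640000
```

In practice `now_epoch` would come from `date +%s`.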

## Running Workflows in Forked Repositories

GitHub Actions workflows can run in forks, allowing you to validate changes before submitting a Pull Request.

### Initial Setup (One-Time)

1. **Fork the repository** to your GitHub account

2. **Enable GitHub Actions** in your fork:
   - Go to your fork's **Actions** tab
   - Click **"I understand my workflows, go ahead and enable them"**

**Secrets Configuration:**

No manual secret configuration is required for the main build and test workflows.

- `GITHUB_TOKEN` is automatically provided by GitHub and used when downloading artifacts from previous runs (artifact reuse feature)
- DockerHub secrets (`DOCKERHUB_USER`, `DOCKERHUB_TOKEN`) are only required for building custom container images (advanced/maintainer use case, not needed for typical development)

### Workflow Behavior in Forks

- ✅ **Automated triggers work**: Push and PR events trigger workflows
- ✅ **Manual triggers work**: `workflow_dispatch` is fully functional
- ✅ **Artifact reuse works**: Can reuse artifacts from previous runs in your fork
- ⚠️ **Cross-fork artifact reuse**: Not supported (security restriction)
- ⚠️ **Some features may be limited**: Certain features requiring organization-level secrets may not work

### Best Practices for Fork Development

1. **Test locally first** when possible (faster iteration)
2. **Use manual triggers** to avoid burning GitHub Actions minutes unnecessarily
3. **Use artifact reuse** to iterate on test failures efficiently
4. **Push to feature branches** to trigger automated CI
5. **Review Actions tab** to ensure workflows completed successfully before opening PR

### Example Fork Workflow

```bash
# 1. Create feature branch in fork
git checkout -b fix-test-failure

# 2. Make changes and push to fork
git commit -am "Fix test failure"
git push origin fix-test-failure

# 3. CI runs automatically on push

# 4. If tests fail, iterate using artifact reuse
# Get run ID from your fork's Actions tab
gh workflow run build-cloudberry.yml \
  --field reuse_artifacts_from_run_id=12345678901 \
  --field test_selection=ic-good-opt-off

# 5. Once tests pass, open PR to upstream
gh pr create --web
```

## Troubleshooting

### "Build job was skipped but tests failed to start"

**Cause:** Artifacts from the specified run ID were not found or have expired

**Solution:**
- Verify the run ID is correct
- Check that the build job of that run completed successfully and uploaded artifacts (its tests may have failed)
- Run a fresh build if artifacts expired (>90 days)

### "Workflow not found in fork"

**Cause:** GitHub Actions not enabled in fork

**Solution:**
- Go to fork's **Actions** tab
- Click to enable workflows

### "Resource not accessible by integration"

**Cause:** The workflow is trying to access artifacts from a different repository

**Solution:**
- Can only reuse artifacts from same repository
- Run a fresh build in your fork first, then reuse those artifacts
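
Before reusing a run ID copied from a URL, it can help to confirm that the run belongs to the repository you are triggering the workflow in. A minimal sketch of that comparison follows. The helper names `repo_from_run_url` and `can_reuse` are hypothetical, and `your-user/cloudberry` is a placeholder fork name.

```bash
#!/bin/sh
# Hypothetical helpers: compare the repository in a run URL with the
# repository where the workflow will be triggered.
repo_from_run_url() {
    path=${1#https://github.com/}   # strip the host prefix
    owner=${path%%/*}               # first path segment
    rest=${path#*/}
    echo "$owner/${rest%%/*}"       # owner/repo
}

can_reuse() {
    run_url=$1
    target_repo=$2
    if [ "$(repo_from_run_url "$run_url")" = "$target_repo" ]; then
        echo "same repository: reuse possible"
    else
        echo "different repository: run a fresh build first"
    fi
}

can_reuse "https://github.com/apache/cloudberry/actions/runs/12345678901" "your-user/cloudberry"
```

A mismatch here corresponds to the "Resource not accessible by integration" case described above.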

## Additional Resources

- [GitHub Actions Documentation](https://docs.github.com/en/actions)
- [Cloudberry Contributing Guide](../../CONTRIBUTING.md)
- [Cloudberry Build Guide](../../deploy/build/README.md)
- [DevOps Scripts](../../devops/README.md)

.github/workflows/build-cloudberry.yml

Lines changed: 35 additions & 4 deletions
@@ -113,6 +113,11 @@ on:
 required: false
 default: 'all'
 type: string
+reuse_artifacts_from_run_id:
+description: 'Reuse build artifacts from a previous run ID (leave empty to build fresh)'
+required: false
+default: ''
+type: string

 concurrency:
 group: ${{ github.workflow }}-${{ github.ref }}
@@ -412,6 +417,7 @@ jobs:
 needs: [check-skip]
 runs-on: ubuntu-22.04
 timeout-minutes: 120
+if: github.event.inputs.reuse_artifacts_from_run_id == ''
 outputs:
 build_timestamp: ${{ steps.set_timestamp.outputs.timestamp }}

@@ -687,6 +693,10 @@ jobs:
 rpm-install-test:
 name: RPM Install Test Apache Cloudberry
 needs: [check-skip, build]
+if: |
+!cancelled() &&
+(needs.build.result == 'success' || needs.build.result == 'skipped') &&
+github.event.inputs.reuse_artifacts_from_run_id == ''
 runs-on: ubuntu-22.04
 timeout-minutes: 120

@@ -710,6 +720,8 @@ jobs:
 name: apache-cloudberry-db-incubating-rpm-build-artifacts
 path: ${{ github.workspace }}/rpm_build_artifacts
 merge-multiple: false
+run-id: ${{ github.event.inputs.reuse_artifacts_from_run_id || github.run_id }}
+github-token: ${{ secrets.GITHUB_TOKEN }}

 - name: Cloudberry Environment Initialization
 if: needs.check-skip.outputs.should_skip != 'true'
@@ -814,12 +826,18 @@ jobs:
 echo "Version: ${RPM_VERSION}"
 echo "Release: ${RPM_RELEASE}"

+# Refresh repository metadata to avoid mirror issues
+echo "Refreshing repository metadata..."
+dnf clean all
+dnf makecache --refresh || dnf makecache
+
 # Clean install location
 rm -rf /usr/local/cloudberry-db

-# Install RPM
+# Install RPM with retry logic for mirror issues
+# Use --releasever=9 to pin to stable Rocky Linux 9 repos (not bleeding-edge 9.6)
 echo "Starting installation..."
-if ! time dnf install -y "${RPM_FILE}"; then
+if ! time dnf install -y --setopt=retries=10 --releasever=9 "${RPM_FILE}"; then
 echo "::error::RPM installation failed"
 exit 1
 fi
@@ -858,6 +876,9 @@ jobs:
 test:
 name: ${{ matrix.test }}
 needs: [check-skip, build, prepare-test-matrix]
+if: |
+!cancelled() &&
+(needs.build.result == 'success' || needs.build.result == 'skipped')
 runs-on: ubuntu-22.04
 timeout-minutes: 120
 # actionlint-allow matrix[*].pg_settings
@@ -1087,6 +1108,8 @@ jobs:
 name: apache-cloudberry-db-incubating-rpm-build-artifacts
 path: ${{ github.workspace }}/rpm_build_artifacts
 merge-multiple: false
+run-id: ${{ github.event.inputs.reuse_artifacts_from_run_id || github.run_id }}
+github-token: ${{ secrets.GITHUB_TOKEN }}

 - name: Download Cloudberry Source build artifacts
 if: needs.check-skip.outputs.should_skip != 'true'
@@ -1095,6 +1118,8 @@ jobs:
 name: apache-cloudberry-db-incubating-source-build-artifacts
 path: ${{ github.workspace }}/source_build_artifacts
 merge-multiple: false
+run-id: ${{ github.event.inputs.reuse_artifacts_from_run_id || github.run_id }}
+github-token: ${{ secrets.GITHUB_TOKEN }}

 - name: Verify downloaded artifacts
 if: needs.check-skip.outputs.should_skip != 'true'
@@ -1186,12 +1211,18 @@ jobs:
 echo "Version: ${RPM_VERSION}"
 echo "Release: ${RPM_RELEASE}"

+# Refresh repository metadata to avoid mirror issues
+echo "Refreshing repository metadata..."
+dnf clean all
+dnf makecache --refresh || dnf makecache
+
 # Clean install location
 rm -rf /usr/local/cloudberry-db

-# Install RPM
+# Install RPM with retry logic for mirror issues
+# Use --releasever=9 to pin to stable Rocky Linux 9 repos (not bleeding-edge 9.6)
 echo "Starting installation..."
-if ! time dnf install -y "${RPM_FILE}"; then
+if ! time dnf install -y --setopt=retries=10 --releasever=9 "${RPM_FILE}"; then
 echo "::error::RPM installation failed"
 exit 1
 fi
