dev-infrastructure: rewrite PKO cleanup script in Go (AROSLSRE-789)#5149

Draft
raelga wants to merge 5 commits into main from raelga/cleanup-pko-go

raelga commented May 6, 2026

AROSLSRE-789

What

Rewrite cleanup-pko-resources.sh as a Go program with proper error handling and control flow.

Why

The shell version had issues with error propagation that caused EV2 rollout failures (#5145 reverted the original script). Shell's error handling model (set -o errexit + || true patterns) makes it hard to reason about control flow in complex cleanup logic.

Per Steve's suggestion: "consider writing this in Go — we've surpassed what we should be doing in Shell."

Changes

| Shell | Go |
|---|---|
| kubectl CLI calls | k8s client-go dynamic client + apiextensions client |
| set -o errexit / ` | |
| mapfile + grep for CRD discovery | List CRDs + filter by API group |
| jsonpath output parsing | Typed Go structs (unstructured.Unstructured) |
| kubectl patch for finalizers | MergePatch via dynamic client |
| kubectl delete with --timeout | DeleteCollection with propagation policy |
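
For reference, the "MergePatch via dynamic client" row can be illustrated with the patch body alone. This is a minimal sketch, not the code in this PR: it assumes the finalizers are removed with a JSON merge patch (RFC 7386) that sets `metadata.finalizers` to null, and the helper name `finalizerStripPatch` is hypothetical. With client-go, a body like this would typically be passed to the dynamic client's `Patch` call with `types.MergePatchType`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// finalizerStripPatch (hypothetical helper) returns a JSON merge patch body
// that sets metadata.finalizers to null. Under JSON merge patch semantics,
// a null value removes the field from the target object.
func finalizerStripPatch() ([]byte, error) {
	patch := map[string]any{
		"metadata": map[string]any{
			"finalizers": nil,
		},
	}
	return json.Marshal(patch)
}

func main() {
	body, err := finalizerStripPatch()
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(string(body)) // prints {"metadata":{"finalizers":null}}
}
```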

Behavior

Best-effort — logs errors but always exits 0 so it never blocks EV2 rollouts. Error count tracked and reported in summary line. Identical cleanup logic:

  1. Discover all CRDs in the package-operator.run API group
  2. Delete all CRs for each CRD
  3. Poll for cascading deletion (180s max)
  4. Strip finalizers on stuck resources
  5. Delete the CRDs themselves
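
Step 3 above (poll for cascading deletion, 180s max) can be sketched with the standard library only. This is an illustration under assumptions, not the PR's implementation: `waitUntilGone` and the `remaining` callback are hypothetical names, and in the real program the callback would presumably list the remaining CRs through the dynamic client.

```go
package main

import (
	"fmt"
	"time"
)

// waitUntilGone (hypothetical) calls remaining every interval until it
// reports zero resources or the timeout would be exceeded by another sleep.
// It returns true if everything was deleted in time.
func waitUntilGone(remaining func() int, interval, timeout time.Duration) bool {
	deadline := time.Now().Add(timeout)
	for {
		if remaining() == 0 {
			return true
		}
		// Give up if sleeping one more interval would pass the deadline.
		if time.Now().Add(interval).After(deadline) {
			return false
		}
		time.Sleep(interval)
	}
}

func main() {
	// Simulated cluster state: three resources that disappear one per poll.
	left := 3
	remaining := func() int {
		if left > 0 {
			left--
		}
		return left
	}

	gone := waitUntilGone(remaining, 10*time.Millisecond, 180*time.Second)
	fmt.Println("all resources gone:", gone)
	if !gone {
		fmt.Println("falling back to finalizer stripping")
	}
}
```

A false return is the signal to move on to step 4 rather than an error in itself.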

Idempotent — safe on clusters that never had PKO.
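
The best-effort behavior described above (log each error, keep a count, still exit 0) could look roughly like the following. It is a sketch under assumptions: the `step` type, `runBestEffort`, and the simulated failure are hypothetical and only show the control flow, not the actual structure of the program in this PR.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// errSimulated is a made-up error used only to exercise the failure path.
var errSimulated = errors.New("simulated failure")

// step is a hypothetical unit of cleanup work.
type step struct {
	name string
	run  func() error
}

// runBestEffort runs every step even if earlier ones fail, logs each
// failure, and returns the number of failed steps.
func runBestEffort(steps []step) int {
	failed := 0
	for _, s := range steps {
		if err := s.run(); err != nil {
			log.Printf("step %q failed (continuing): %v", s.name, err)
			failed++
		}
	}
	return failed
}

func main() {
	steps := []step{
		{name: "delete CRs", run: func() error { return nil }},
		{name: "strip finalizers", run: func() error { return errSimulated }},
		{name: "delete CRDs", run: func() error { return nil }},
	}

	failed := runBestEffort(steps)

	// Summary line with the error count. Returning from main normally
	// yields exit code 0, so the caller is never blocked by a failure.
	fmt.Printf("cleanup finished: %d step(s), %d error(s)\n", len(steps), failed)
}
```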

Testing

  • Deploy to personal dev and run against a management cluster
  • Verify idempotent behavior on clean cluster (no PKO CRDs)

Supersedes #5147

openshift-ci Bot commented May 6, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

raelga commented May 6, 2026

/test all

sclarkso commented May 6, 2026

@raelga

The Namespace("").DeleteCollection(...) call for namespaced resources like Package and ObjectSet likely doesn't do what we want — cross-namespace List and Watch work at the cluster-scoped URL, but mutations like deleteCollection appear to require a specific namespace. The original shell script handled this with kubectl delete --all-namespaces --all, which lists first then deletes per namespace, but that didn't get translated to the Go equivalent.

Without this, the namespaced PKO CRs never get deleted. The finalizer stripping step still works correctly (it uses cross-namespace List then Patch per namespace), so future HCP deletions won't get blocked. The main impact is that orphaned PKO CRs linger in HCP namespaces until those namespaces are deleted, and the PKO CRDs can't be removed because CRs still reference them.

The fix lists across all namespaces first, collects the distinct namespaces containing CRs, then calls DeleteCollection per namespace — matching what kubectl does internally.
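
The list-then-delete-per-namespace approach can be sketched without a cluster. The code below is illustrative only: `item`, `distinctNamespaces`, and `deleteAcrossNamespaces` are hypothetical names, and the `deleteCollection` callback stands in for a call such as `Namespace(ns).DeleteCollection(...)` on the dynamic client.

```go
package main

import (
	"fmt"
	"sort"
)

// item is a hypothetical stand-in for a custom resource returned by a
// cross-namespace List.
type item struct {
	Namespace string
	Name      string
}

// distinctNamespaces returns the sorted set of namespaces that contain items.
func distinctNamespaces(items []item) []string {
	seen := map[string]struct{}{}
	for _, it := range items {
		seen[it.Namespace] = struct{}{}
	}
	out := make([]string, 0, len(seen))
	for ns := range seen {
		out = append(out, ns)
	}
	sort.Strings(out)
	return out
}

// deleteAcrossNamespaces calls deleteCollection once per distinct namespace
// and returns how many of those calls failed.
func deleteAcrossNamespaces(items []item, deleteCollection func(ns string) error) int {
	failed := 0
	for _, ns := range distinctNamespaces(items) {
		if err := deleteCollection(ns); err != nil {
			fmt.Printf("delete in namespace %q failed: %v\n", ns, err)
			failed++
		}
	}
	return failed
}

func main() {
	// Made-up sample data: three CRs spread over two namespaces.
	items := []item{
		{Namespace: "hcp-a", Name: "pkg-1"},
		{Namespace: "hcp-b", Name: "pkg-2"},
		{Namespace: "hcp-a", Name: "objset-1"},
	}
	failed := deleteAcrossNamespaces(items, func(ns string) error {
		fmt.Println("DeleteCollection in namespace", ns)
		return nil
	})
	fmt.Println("failed calls:", failed)
}
```

Deduplicating first means one delete call per namespace rather than one per resource.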

Also added an IsNotFound guard on CRD deletion to avoid spurious error logs if a CRD disappears between discovery and the delete call.
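
The shape of that guard, as a standard-library sketch: `errNotFound` is a hypothetical sentinel standing in for a Kubernetes "not found" API error (real code would check with `apierrors.IsNotFound` from `k8s.io/apimachinery/pkg/api/errors`), and `deleteIgnoringNotFound` and the CRD name in `main` are illustrative, not taken from this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical sentinel standing in for a Kubernetes
// "not found" API error.
var errNotFound = errors.New("not found")

// errOther is a made-up error representing any other failure.
var errOther = errors.New("connection refused")

// deleteIgnoringNotFound treats "already gone" as success, so a CRD that
// disappears between discovery and deletion does not produce an error.
func deleteIgnoringNotFound(name string, del func(string) error) error {
	err := del(name)
	if err == nil || errors.Is(err, errNotFound) {
		return nil
	}
	return fmt.Errorf("delete CRD %s: %w", name, err)
}

func main() {
	// Simulated delete call that reports the object is already gone.
	alreadyGone := func(string) error { return errNotFound }
	// Simulated delete call that fails for an unrelated reason.
	broken := func(string) error { return errOther }

	fmt.Println(deleteIgnoringNotFound("example-crd", alreadyGone))
	fmt.Println(deleteIgnoringNotFound("example-crd", broken))
}
```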

openshift-ci Bot commented May 6, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: raelga

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

sclarkso commented May 6, 2026

/test all

@sclarkso sclarkso force-pushed the raelga/cleanup-pko-go branch from 06cfe9f to 4457e19 Compare May 6, 2026 22:18
raelga and others added 5 commits May 7, 2026 08:22
Replace the shell script with a Go program for proper error handling
and control flow. The shell version had issues with error propagation
that caused EV2 rollout failures.
Log errors but always exit 0 so the cleanup never blocks EV2
rollouts. Error count is tracked and reported in the summary line.
@sclarkso sclarkso force-pushed the raelga/cleanup-pko-go branch from 4457e19 to 25b3441 Compare May 6, 2026 22:23
sclarkso commented May 6, 2026

/test all

openshift-ci Bot commented May 6, 2026

@raelga: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/verify | 25b3441 | link | true | /test verify |
| ci/prow/e2e-parallel | 25b3441 | link | true | /test e2e-parallel |
| ci/prow/images | 25b3441 | link | true | /test images |


@raelga raelga changed the title dev-infrastructure: rewrite PKO cleanup script in Go dev-infrastructure: rewrite PKO cleanup script in Go (AROSLSRE-789) May 8, 2026