
test: group node pool verifiers into a single one#5178

Open
bennerv wants to merge 2 commits into main from verify-node-viability

Conversation

@bennerv
Member

@bennerv bennerv commented May 7, 2026

What

Groups the various node pool verifiers we have into a single one

Why

We have duplicated node pool verification. Additionally, the node pool verification should be consistent across any node pool created.

Now this verifies:

  • node is ready
  • node is schedulable
  • can fetch logs from a pod running on the node (the router pod)
  • node replica count matches the nodes in the cluster

Testing

green e2e

Special notes for your reviewer

This is fallout from the konnectivity pods failing during our swift rollout. The kas (kube-apiserver) failed to dial the kubelet because konnectivity-agent had the wrong TLS cert bundle; we were setting domains when we shouldn't have been.

This will test that we're always able to get node logs whenever a node is running.

bennerv and others added 2 commits May 5, 2026 16:46
Introduces a reusable VerifyNodeViability verifier that polls for node
readiness and fetches router-default pod logs to prove cluster viability.
Updates the private keyvault e2e test to use the new verifier.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…overy

Replace the monolithic VerifyNodeViability with individual composable
verifiers (VerifyNodesReady, VerifyNodesSchedulable, VerifyNodePoolNodeCount)
grouped into NodePoolVerifiers, passed to VerifyHCPCluster for parallel
execution. Each verifier has its own polling loop with delta-only logging.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 7, 2026 17:14
@openshift-ci openshift-ci Bot requested review from deads2k and venkateshsredhat May 7, 2026 17:14
@openshift-ci

openshift-ci Bot commented May 7, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bennerv

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Contributor

Copilot AI left a comment


Pull request overview

This PR consolidates node pool validation in the E2E suite by introducing a standard “node pool viability” verifier set (readiness, schedulability, per-pool node count vs. ARM, and pod log accessibility) and updating multiple E2E tests to use it via VerifyHCPCluster.

Changes:

  • Added new node-related verifiers (VerifyNodesSchedulable, VerifyNodePoolNodeCount) and a NodePoolVerifiers(...) helper that bundles the standard node pool checks.
  • Removed the specialized VerifyNodePoolReadyAndSchedulableNodeCount verifier and migrated E2E tests to the unified verifier set.
  • Updated several E2E scenarios (OIDC workload identity, nodepool updates/upgrades, Cilium cluster create, private KV create) to rely on the unified node pool verification.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

Summary per file:

  • test/util/verifiers/nodes.go: Adds schedulability + ARM-backed per-nodepool count verifiers and a NodePoolVerifiers bundle; removes a prior nodepool-specific count verifier.
  • test/e2e/oidc_issuer_workload_identity.go: Uses the unified node pool verifier bundle after creating the node pool.
  • test/e2e/nodepool_version_upgrade.go: Replaces nodepool-specific "ready+not draining" count check with the unified node pool verifier bundle.
  • test/e2e/nodepool_update_nodes.go: Replaces explicit node count + readiness checks with the unified node pool verifier bundle at multiple points, including autoscaling.
  • test/e2e/nodepool_labels_taints.go: Replaces an Eventually wait-for-ready with a single VerifyHCPCluster(...NodePoolVerifiers...) call.
  • test/e2e/nodepool_ephemeral_osdisk.go: Replaces explicit node count + readiness checks with the unified node pool verifier bundle.
  • test/e2e/cluster_create_private_kv.go: Replaces a bespoke "pod logs fetchable" eventual loop with the unified node pool verifier bundle.
  • test/e2e/cluster_create_complex_cilium_kv.go: Uses unified node pool verifier bundle alongside the Cilium operational verifier.
  • test/e2e/cluster_create_cni_cilium.go: Uses unified node pool verifier bundle alongside the Cilium operational verifier.


Comment on lines +484 to +492
// NodePoolVerifiers returns the standard set of node pool verifiers:
// node readiness, schedulability, per-pool count matching, and deployment
// log accessibility.
func NodePoolVerifiers(nodePoolsClient *hcpsdk.NodePoolsClient, resourceGroup, clusterName string) []HostedClusterVerifier {
	return []HostedClusterVerifier{
		VerifyNodesReady(),
		VerifyNodesSchedulable(),
		VerifyNodePoolNodeCount(nodePoolsClient, resourceGroup, clusterName),
		VerifyGetDeploymentLogs("openshift-ingress", "router-default", "router"),
	}
}
Comment on lines +407 to +418
for _, np := range page.Value {
	if np.Name == nil || np.Properties == nil {
		continue
	}
	if np.Properties.ProvisioningState == nil || *np.Properties.ProvisioningState != hcpsdk.ProvisioningStateSucceeded {
		continue
	}
	if np.Properties.Replicas == nil {
		continue
	}
	expectedPools[*np.Name] = int(*np.Properties.Replicas)
}
Comment on lines +345 to +359
var unschedulable []string
for i := range nodes.Items {
	if nodes.Items[i].Spec.Unschedulable {
		unschedulable = append(unschedulable, nodes.Items[i].Name)
	}
}
if len(unschedulable) > 0 {
	lastErr = fmt.Errorf("%d of %d nodes unschedulable (cordoned): %v", len(unschedulable), len(nodes.Items), unschedulable)
	currentError := lastErr.Error()
	if currentError != previousError {
		logger.Info("Verifier check", "name", v.Name(), "status", "failed", "error", currentError)
		previousError = currentError
	}
	return false, nil
}
@openshift-ci

openshift-ci Bot commented May 7, 2026

@bennerv: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/lint 376ba5c link true /test lint
ci/prow/e2e-parallel 376ba5c link true /test e2e-parallel


Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

