test: group node pool verifiers into a single one #5178
Conversation
Introduces a reusable VerifyNodeViability verifier that polls for node readiness and fetches router-default pod logs to prove cluster viability. Updates the private keyvault e2e test to use the new verifier. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the monolithic VerifyNodeViability with individual composable verifiers (VerifyNodesReady, VerifyNodesSchedulable, VerifyNodePoolNodeCount) grouped into NodePoolVerifiers, passed to VerifyHCPCluster for parallel execution. Each verifier has its own polling loop with delta-only logging. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: bennerv. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.
Pull request overview
This PR consolidates node pool validation in the E2E suite by introducing a standard “node pool viability” verifier set (readiness, schedulability, per-pool node count vs. ARM, and pod log accessibility) and updating multiple E2E tests to use it via VerifyHCPCluster.
Changes:
- Added new node-related verifiers (`VerifyNodesSchedulable`, `VerifyNodePoolNodeCount`) and a `NodePoolVerifiers(...)` helper that bundles the standard node pool checks (see the usage sketch after this list).
- Removed the specialized `VerifyNodePoolReadyAndSchedulableNodeCount` verifier and migrated E2E tests to the unified verifier set.
- Updated several E2E scenarios (OIDC workload identity, nodepool updates/upgrades, Cilium cluster create, private KV create) to rely on the unified node pool verification.
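For orientation, here is a minimal sketch of how a test might consume the bundle. The exact `VerifyHCPCluster` signature, the `adminClient` parameter, the Gomega-style assertion, and `VerifyCiliumOperational` are assumptions inferred from the review summary, not confirmed signatures from the diff:

```go
// Hypothetical e2e usage, assuming VerifyHCPCluster takes a variadic list
// of HostedClusterVerifier and runs them in parallel (per the commit message).
vs := verifiers.NodePoolVerifiers(nodePoolsClient, resourceGroup, clusterName)
// Scenario-specific checks can be appended, e.g. for the Cilium tests
// (VerifyCiliumOperational is a placeholder name for that verifier).
vs = append(vs, verifiers.VerifyCiliumOperational())
Expect(verifiers.VerifyHCPCluster(ctx, adminClient, vs...)).To(Succeed())
```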
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.
Summary per file:
| File | Description |
|---|---|
| test/util/verifiers/nodes.go | Adds schedulability + ARM-backed per-nodepool count verifiers and a NodePoolVerifiers bundle; removes a prior nodepool-specific count verifier. |
| test/e2e/oidc_issuer_workload_identity.go | Uses the unified node pool verifier bundle after creating the node pool. |
| test/e2e/nodepool_version_upgrade.go | Replaces nodepool-specific “ready+not draining” count check with the unified node pool verifier bundle. |
| test/e2e/nodepool_update_nodes.go | Replaces explicit node count + readiness checks with the unified node pool verifier bundle at multiple points, including autoscaling. |
| test/e2e/nodepool_labels_taints.go | Replaces an Eventually wait-for-ready with a single VerifyHCPCluster(...NodePoolVerifiers...) call. |
| test/e2e/nodepool_ephemeral_osdisk.go | Replaces explicit node count + readiness checks with the unified node pool verifier bundle. |
| test/e2e/cluster_create_private_kv.go | Replaces a bespoke “pod logs fetchable” eventual loop with the unified node pool verifier bundle. |
| test/e2e/cluster_create_complex_cilium_kv.go | Uses unified node pool verifier bundle alongside the Cilium operational verifier. |
| test/e2e/cluster_create_cni_cilium.go | Uses unified node pool verifier bundle alongside the Cilium operational verifier. |
```go
// NodePoolVerifiers returns the standard set of node pool verifiers:
// node readiness, schedulability, per-pool count matching, and deployment
// log accessibility.
func NodePoolVerifiers(nodePoolsClient *hcpsdk.NodePoolsClient, resourceGroup, clusterName string) []HostedClusterVerifier {
	return []HostedClusterVerifier{
		VerifyNodesReady(),
		VerifyNodesSchedulable(),
		VerifyNodePoolNodeCount(nodePoolsClient, resourceGroup, clusterName),
		VerifyGetDeploymentLogs("openshift-ingress", "router-default", "router"),
	}
}
```
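The diff excerpt doesn't include the `HostedClusterVerifier` type itself. Inferring from usage elsewhere in this PR (the `v.Name()` call in the logging snippet further down), the contract is plausibly something like the following sketch; the method set and the client type are assumptions, not the actual definition:

```go
// Inferred sketch of the verifier contract; the real definition in
// test/util/verifiers may differ (the client type here is assumed).
type HostedClusterVerifier interface {
	// Name identifies the verifier in log output (used by delta-only logging).
	Name() string
	// Verify blocks until the condition holds, polling internally,
	// or returns an error once the context deadline expires.
	Verify(ctx context.Context, client kubernetes.Interface) error
}
```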
```go
for _, np := range page.Value {
	if np.Name == nil || np.Properties == nil {
		continue
	}
	if np.Properties.ProvisioningState == nil || *np.Properties.ProvisioningState != hcpsdk.ProvisioningStateSucceeded {
		continue
	}
	if np.Properties.Replicas == nil {
		continue
	}
	expectedPools[*np.Name] = int(*np.Properties.Replicas)
}
```
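For context, this loop presumably sits inside an Azure-SDK-style pager over the ARM node pools API. A sketch assuming standard azcore pager conventions; the `NewListByParentPager` method name and response shape are guesses, not taken from the hcpsdk diff:

```go
// Sketch of the enclosing list call, assuming an azcore-generated pager.
expectedPools := map[string]int{}
pager := nodePoolsClient.NewListByParentPager(resourceGroup, clusterName, nil)
for pager.More() {
	page, err := pager.NextPage(ctx)
	if err != nil {
		lastErr = fmt.Errorf("listing node pools: %w", err)
		return false, nil // transient ARM errors: keep polling
	}
	for _, np := range page.Value {
		// per-pool filtering as shown above, populating expectedPools
	}
}
```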
```go
var unschedulable []string
for i := range nodes.Items {
	if nodes.Items[i].Spec.Unschedulable {
		unschedulable = append(unschedulable, nodes.Items[i].Name)
	}
}
if len(unschedulable) > 0 {
	lastErr = fmt.Errorf("%d of %d nodes unschedulable (cordoned): %v", len(unschedulable), len(nodes.Items), unschedulable)
	currentError := lastErr.Error()
	if currentError != previousError {
		logger.Info("Verifier check", "name", v.Name(), "status", "failed", "error", currentError)
		previousError = currentError
	}
	return false, nil
}
```
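The `return false, nil` and the `previousError` bookkeeping imply each verifier wraps its check in a poll loop, matching the commit message's "polling loop with delta-only logging". A sketch of that wrapper using the apimachinery `wait` package; the interval and timeout values are illustrative, not from the PR:

```go
// Sketch of the per-verifier polling loop with delta-only logging.
previousError := ""
var lastErr error
err := wait.PollUntilContextTimeout(ctx, 10*time.Second, 20*time.Minute, true,
	func(ctx context.Context) (bool, error) {
		nodes, err := clientset.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
		if err != nil {
			lastErr = err
			return false, nil // retry on transient API errors
		}
		// ...the schedulability check from the snippet above runs here...
		return true, nil
	})
if err != nil {
	return fmt.Errorf("nodes never became schedulable: %w (last: %v)", err, lastErr)
}
```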
@bennerv: The following tests failed, say `/retest` to rerun all failed tests or `/retest-required` to rerun all mandatory failed tests:
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
What
Groups the various node pool verifiers we have into a single one
Why
We have duplicated node pool verification logic across tests. Additionally, node pool verification should be consistent for any node pool we create.
Now this verifies:
- nodes are Ready
- nodes are schedulable (not cordoned)
- each node pool's node count matches its ARM replica count
- pod logs are fetchable (router-default deployment logs)
Testing
green e2e
Special notes for your reviewer
This is fallout from the konnectivity pods failing during our swift rollout. The kube-apiserver failed to dial the kubelet because konnectivity-agent had the wrong TLS cert bundle: we were setting domains when we shouldn't have been.
This will verify that we're always able to fetch pod logs, which exercises that dial path, whenever a node is running.
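A minimal sketch of that log-fetch check using client-go; the `podName` lookup and the option values are illustrative, not the PR's implementation. Reading container logs forces the API server to dial the kubelet through the konnectivity tunnel, which is exactly the path that broke:

```go
// Sketch: a successful log read proves KAS -> konnectivity -> kubelet works.
// podName is assumed to come from listing the router-default deployment's pods.
req := clientset.CoreV1().Pods("openshift-ingress").GetLogs(podName, &corev1.PodLogOptions{
	Container: "router",
	TailLines: ptr.To(int64(10)),
})
if _, err := req.DoRaw(ctx); err != nil {
	return false, nil // kubelet not reachable yet; keep polling
}
return true, nil
```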