WIP: tooling: Update `verify-feature-promotion` logic for installer-focused features by everettraven · Pull Request #2903 · openshift/api · GitHub
Skip to content

WIP: tooling: Update verify-feature-promotion logic for installer-focused features#2903

Draft
everettraven wants to merge 6 commits into
openshift:masterfrom
everettraven:tooling/installer-feature-gate-promotion-updates
Draft

WIP: tooling: Update verify-feature-promotion logic for installer-focused features#2903
everettraven wants to merge 6 commits into
openshift:masterfrom
everettraven:tooling/installer-feature-gate-promotion-updates

Conversation

@everettraven

Copy link
Copy Markdown
Contributor

No description provided.

Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
@openshift-merge-bot

Copy link
Copy Markdown
Contributor

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 25, 2026
@openshift-ci

openshift-ci Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai

coderabbitai Bot commented Jun 25, 2026

Copy link
Copy Markdown
📝 Walkthrough

Walkthrough

The PR marks AWSDualStackInstall as enabled in the feature matrix, the feature-gate definition, and release manifests. It also changes featuregate-test-analyzer so Install feature gates use a job-based verification path that fetches installer jobs, recent job runs, and triage data from Sippy before computing results. Minor formatting and file-permission changes are included in the analyzer file.

Suggested reviewers

  • JoelSpeed

Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Ote Binary Stdout Contract ❌ Error tools/codegen/cmd/featuregate-test-analyzer.go adds fmt.Printf/fmt.Println in a package main execution path (Run/verifyJobPassRate), writing to stdout from the binary. Replace those prints with o.Out/o.ErrOut or stderr/logger output, and keep process-level code silent on stdout except the intended machine-readable stream.
Docstring Coverage ⚠️ Warning Docstring coverage is 10.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ❓ Inconclusive No description was provided by the author, making it impossible to assess relatedness to the changeset. Add a pull request description explaining the changes made and their purpose for better context.
✅ Passed checks (12 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly describes the main change: updating tooling for installer-focused feature gate promotion verification.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed No Go test files or Ginkgo titles were changed in this commit; the diff only touches feature gates, manifests, and tooling.
Test Structure And Quality ✅ Passed The PR only changes docs, feature-gate config, and codegen helpers; no Ginkgo test files or test blocks are modified.
Microshift Test Compatibility ✅ Passed PR only changes feature docs, manifests, and codegen; no modified file adds Ginkgo e2e tests or MicroShift-unsupported API usage.
Single Node Openshift (Sno) Test Compatibility ✅ Passed Touched files are feature docs/manifests and codegen tooling; no new Ginkgo e2e tests or SNO-sensitive specs were added.
Topology-Aware Scheduling Compatibility ✅ Passed No modified deployment/operator/controller code or pod specs; touched files only update feature-gate metadata and Sippy/test-analyzer tooling, with no nodeSelector/affinity/PDB changes.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed No new Ginkgo e2e tests were added; the touched files are feature-gate/config and tooling, and the modified Go files contain no It/Describe/Context/When blocks.
No-Weak-Crypto ✅ Passed No MD5/SHA1/DES/RC4/3DES/Blowfish/ECB, custom crypto, or secret/token comparisons were introduced in the changed files.
Container-Privileges ✅ Passed The PR only changes feature-gate docs/codegen and feature-gate YAMLs; no privileged, hostPID/hostNetwork/hostIPC, SYS_ADMIN, or allowPrivilegeEscalation settings appear.
No-Sensitive-Data-In-Logs ✅ Passed New logs only print run counts, feature-gate/job variants, test-grid URL, and failed test names; no secrets/PII/token-like strings appear.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.12.2)

Error: build linters: unable to load custom analyzer "kubeapilinter": tools/_output/bin/kube-api-linter.so, plugin: not implemented
The command is terminated due to an error: build linters: unable to load custom analyzer "kubeapilinter": tools/_output/bin/kube-api-linter.so, plugin: not implemented


Comment @coderabbitai help to get the list of available commands.

@everettraven

Copy link
Copy Markdown
Contributor Author

/test verify-feature-promotion

@openshift-ci

openshift-ci Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Hello @everettraven! Some important instructions when contributing to openshift/api:
API design plays an important part in the user experience of OpenShift and as such API PRs are subject to a high level of scrutiny to ensure they follow our best practices. If you haven't already done so, please review the OpenShift API Conventions and ensure that your proposed changes are compliant. Following these conventions will help expedite the api review process for your PR.

@openshift-ci openshift-ci Bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 25, 2026
@openshift-ci

openshift-ci Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign everettraven for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci

openshift-ci Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

@everettraven: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/verify-feature-promotion c07aa2f link true /test verify-feature-promotion

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tools/codegen/cmd/featuregate-test-analyzer.go`:
- Around line 1092-1104: The hardcoded stub in getJobsForFeatureGateFromSippy is
bypassing the real Sippy lookup and ignoring the release, featureGate, and
variant inputs. Remove the temporary out slice and early return so the function
proceeds to the actual query logic, and keep the existing
getJobsForFeatureGateFromSippy path using the provided parameters to fetch the
correct jobs.
- Around line 1046-1078: The triage handling in verifyJobPassRate is too coarse
and is using all job runs in the pass-rate calculation, which causes triaged
failures to suppress unrelated variants and still count against the denominator.
Update the flow around getTriagedTestFailuresFromSippy and verifyJobPassRate so
triage is keyed by the full variant identity returned by Sippy, not just test
name, and filter out triaged failures before setting
TestResults.TotalRuns/SuccessfulRuns/FailedRuns. Make sure the loop over jobRuns
only increments failures for non-triaged failing runs and that pass-rate math
reflects the filtered set consistently in both affected call sites.
- Around line 1051-1052: The triage lookup result from
getTriagedTestFailuresFromSippy in featuregate-test-analyzer.go is using the
returned err without handling it, which can make failures look untriaged. Update
the logic around triagedTestFailures, err to check and handle the error
immediately: propagate it or stop processing rather than continuing with an
empty triaged set. Keep the fix localized to the code path that consumes
getTriagedTestFailuresFromSippy so the triage counting logic only runs when the
lookup succeeds.
- Around line 999-1008: The defaultTransport setup in featuregate-test-analyzer
should not disable TLS verification for Sippy requests. Remove the
InsecureSkipVerify setting from the tls.Config used by defaultTransport, and
rely on normal certificate validation so the HTTP client still enforces HTTPS
trust checks while preserving the existing transport configuration.
- Around line 993-1016: `verifyInstallFeatureGatePromotion` and the Sippy helper
chain are not propagating cancellation, so requests can outlive `Run(ctx)`.
Update `verifyInstallFeatureGatePromotion` to pass `ctx` into
`getJobsForFeatureGateFromSippy`, then thread it through `getJobRunsFromSippy`
and `getTriagedTestFailuresFromSippy`, and use it when building the HTTP
requests instead of calling `client.Get(...)` directly. Keep the existing
symbols in place, only add context plumbing so request lifetimes follow the
caller’s deadline/cancelation.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 382c96e4-2e33-4b88-b0fe-749c605219d5

📥 Commits

Reviewing files that changed from the base of the PR and between fcee122 and c07aa2f.

📒 Files selected for processing (7)
  • features.md
  • features/features.go
  • payload-manifests/featuregates/featureGate-4-10-Hypershift-Default.yaml
  • payload-manifests/featuregates/featureGate-4-10-Hypershift-OKD.yaml
  • payload-manifests/featuregates/featureGate-4-10-SelfManagedHA-Default.yaml
  • payload-manifests/featuregates/featureGate-4-10-SelfManagedHA-OKD.yaml
  • tools/codegen/cmd/featuregate-test-analyzer.go

Comment thread tools/codegen/cmd/featuregate-test-analyzer.go Outdated
Comment on lines +999 to +1008
defaultTransport := &http.Transport{
Proxy: http.ProxyFromEnvironment,
ForceAttemptHTTP2: true,
MaxIdleConns: 100,
IdleConnTimeout: 90 * time.Second,
TLSHandshakeTimeout: 10 * time.Second,
ExpectContinueTimeout: 1 * time.Second,
TLSClientConfig: &tls.Config{
InsecureSkipVerify: true,
},

@coderabbitai coderabbitai Bot Jun 25, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔒 Security & Privacy | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

file="tools/codegen/cmd/featuregate-test-analyzer.go"

echo "== line count =="
wc -l "$file"

echo "== around reported lines =="
sed -n '960,1035p' "$file"

echo "== later helper context =="
sed -n '1035,1125p' "$file"

Repository: openshift/api

Length of output: 5480


🏁 Script executed:

#!/bin/bash
set -euo pipefail

file="tools/codegen/cmd/featuregate-test-analyzer.go"

echo "== line count =="
wc -l "$file"

echo "== around reported lines =="
sed -n '960,1035p' "$file"

echo "== later helper context =="
sed -n '1035,1125p' "$file"

Repository: openshift/api

Length of output: 5480


🏁 Script executed:

#!/bin/bash
set -euo pipefail

file="tools/codegen/cmd/featuregate-test-analyzer.go"

echo "== line count =="
wc -l "$file"

echo "== around reported lines =="
sed -n '960,1035p' "$file"

echo "== later helper context =="
sed -n '1035,1125p' "$file"

Repository: openshift/api

Length of output: 5480


Don't disable TLS verification for Sippy requests.

InsecureSkipVerify: true allows a MITM to tamper with the job/triage data used here, which can change promotion decisions.

Suggested fix
 	defaultTransport := &http.Transport{
 		Proxy:                 http.ProxyFromEnvironment,
 		ForceAttemptHTTP2:     true,
 		MaxIdleConns:          100,
 		IdleConnTimeout:       90 * time.Second,
 		TLSHandshakeTimeout:   10 * time.Second,
 		ExpectContinueTimeout: 1 * time.Second,
-		TLSClientConfig: &tls.Config{
-			InsecureSkipVerify: true,
-		},
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
defaultTransport := &http.Transport{
Proxy: http.ProxyFromEnvironment,
ForceAttemptHTTP2: true,
MaxIdleConns: 100,
IdleConnTimeout: 90 * time.Second,
TLSHandshakeTimeout: 10 * time.Second,
ExpectContinueTimeout: 1 * time.Second,
TLSClientConfig: &tls.Config{
InsecureSkipVerify: true,
},
defaultTransport := &http.Transport{
Proxy: http.ProxyFromEnvironment,
ForceAttemptHTTP2: true,
MaxIdleConns: 100,
IdleConnTimeout: 90 * time.Second,
TLSHandshakeTimeout: 10 * time.Second,
ExpectContinueTimeout: 1 * time.Second,
}
🧰 Tools
🪛 ast-grep (0.44.0)

[warning] 1005-1007: MinVersionis missing from this TLS configuration. By default, TLS 1.2 is currently used as the minimum when acting as a client, and TLS 1.0 when acting as a server. General purpose web applications should default to TLS 1.3 with all other protocols disabled. Only where it is known that a web server must support legacy clients with unsupported an insecure browsers (such as Internet Explorer 10), it may be necessary to enable TLS 1.0 to provide support. AddMinVersion: tls.VersionTLS13' to the TLS configuration to bump the minimum version to TLS 1.3.
Context: tls.Config{
InsecureSkipVerify: true,
}
Note: [CWE-327]: Use of a Broken or Risky Cryptographic Algorithm [OWASP A03:2017]: Sensitive Data Exposure [OWASP A02:2021]: Cryptographic Failures

(missing-ssl-minversion-go)

🪛 OpenGrep (1.23.0)

[ERROR] 1006-1008: TLS certificate verification is disabled via InsecureSkipVerify. This allows man-in-the-middle attacks. Remove InsecureSkipVerify or set it to false.

(coderabbit.tls.go-insecure-skip-verify)


[ERROR] 1006-1008: TLS certificate verification is disabled via InsecureSkipVerify. This allows man-in-the-middle attacks. Remove InsecureSkipVerify or set it to false.

(coderabbit.tls.go-insecure-skip-verify)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/codegen/cmd/featuregate-test-analyzer.go` around lines 999 - 1008, The
defaultTransport setup in featuregate-test-analyzer should not disable TLS
verification for Sippy requests. Remove the InsecureSkipVerify setting from the
tls.Config used by defaultTransport, and rely on normal certificate validation
so the HTTP client still enforces HTTPS trust checks while preserving the
existing transport configuration.

Source: Linters/SAST tools

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already do this everywhere else and this hasn't been a problem.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@everettraven Fair point on consistency. I agree this isn’t unique to this PR, but I still think the concern is valid: InsecureSkipVerify: true disables certificate validation for data that feeds promotion decisions, so “it hasn’t been a problem yet” is more luck than guarantee.

That said, if you’d rather keep this PR scoped to the installer-logic change, I’m okay deferring this as broader cleanup instead of blocking here. If you want, I can help turn it into a follow-up issue to remove the insecure TLS config across the related Sippy clients.

Comment thread tools/codegen/cmd/featuregate-test-analyzer.go
Comment thread tools/codegen/cmd/featuregate-test-analyzer.go Outdated
Comment on lines +1092 to +1104
func getJobsForFeatureGateFromSippy(client *http.Client, release, featureGate string, variant JobVariant) ([]sippyJob, error) {
// TODO: Remove this once done testing the other logic.

out := []sippyJob{
{
Name: "periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-ovn-installer-dualstack-ipv4-primary-techpreview",
},
{
Name: "periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-ovn-installer-dualstack-ipv6-primary-techpreview",
},
}

return out, nil

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | 🏗️ Heavy lift

Remove the hardcoded job stub before merging.

This early return makes the real Sippy query below unreachable and ignores release, featureGate, and variant. Every install gate now gets evaluated against the same two AWS dualstack techpreview jobs, which will misreport coverage for other platforms and topologies.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/codegen/cmd/featuregate-test-analyzer.go` around lines 1092 - 1104, The
hardcoded stub in getJobsForFeatureGateFromSippy is bypassing the real Sippy
lookup and ignoring the release, featureGate, and variant inputs. Remove the
temporary out slice and early return so the function proceeds to the actual
query logic, and keep the existing getJobsForFeatureGateFromSippy path using the
provided parameters to fetch the correct jobs.

return false
}

func verifyInstallFeatureGatePromotion(featureGate string, jobVariant JobVariant) (*TestingResults, error) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest going generic, there may be other components that have to prove their fg is ready via a job overall result, not tests within the job.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! So in a couple days we could make this a query for variant Capability:FGName (assuming we name this correctly to match the fg)

Signed-off-by: Bryce Palmer <bpalmer@redhat.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tools/codegen/cmd/featuregate-test-analyzer.go (1)

1105-1129: 🎯 Functional Correctness | 🟠 Major

Count empty failed-run cases as failures

Failed job runs with no reported FailedTestNames still fall through to SuccessfulRuns++, so installer or early infra failures get counted as passes. Treat those runs as failures unless they’re explicitly known/triaged.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/codegen/cmd/featuregate-test-analyzer.go` around lines 1105 - 1129, The
job-run counting logic in the loop over jobRuns is treating failed runs with no
FailedTestNames as successes because only untriagedTestFailures increments
FailedRuns. Update the featuregate test analyzer to count any jobRun with
OverallResult == "F" and not KnownFailure as a failed run, even when
FailedTestNames is empty, while still preserving the existing triage filtering
for reporting untriaged failures in the jobRun/triagedTestFailures path.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@tools/codegen/cmd/featuregate-test-analyzer.go`:
- Around line 1105-1129: The job-run counting logic in the loop over jobRuns is
treating failed runs with no FailedTestNames as successes because only
untriagedTestFailures increments FailedRuns. Update the featuregate test
analyzer to count any jobRun with OverallResult == "F" and not KnownFailure as a
failed run, even when FailedTestNames is empty, while still preserving the
existing triage filtering for reporting untriaged failures in the
jobRun/triagedTestFailures path.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 39488995-d772-40a5-9c03-1230580f496e

📥 Commits

Reviewing files that changed from the base of the PR and between c07aa2f and dbd3ddd.

📒 Files selected for processing (2)
  • tools/codegen/cmd/featuregate-test-analyzer.go
  • tools/codegen/pkg/sippy/json_types.go

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants