fix(ui): wait for process events before timeline tests by davdhacs · Pull Request #21422 · stackrox/stackrox · GitHub

fix(ui): wait for process events before timeline tests #21422

Draft
davdhacs wants to merge 7 commits into davdhacs/ui-e2e-gke-only from davdhacs/fix-timeline-real-data

@davdhacs

Contributor

Description

On a freshly deployed GKE cluster, process events may not be available immediately after deployment. The deployment and pod timeline Cypress tests that use real API data (no fixtures) fail because the timeline renders empty until collector has gathered process events.

Two-layer fix:

  • Workflow level: poll /v1/processcount in the *risk* shard wait step until process events are detected (up to 5 minutes, 60 iterations at 5s each)
  • Cypress level: add before() hooks in deploymentTimeline.test.js and podTimeline.test.js that poll /v1/processcount via cy.request (up to 2 minutes), ensuring data readiness even if the workflow-level wait was insufficient

The waitForProcessEvents() helper is added to Risk.helpers.js and uses recursive cy.request polling with Cypress.env('ROX_AUTH_TOKEN') for authentication.
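
The recursive polling described above might look roughly like the sketch below. It is not the code from this PR: the name `pollProcessCount`, its options, and the assumption that the response body carries a numeric `count` field are all illustrative. The request and wait steps are injected, so the same recursion can run on Cypress chainables (`cy.request`, `cy.wait`) or on plain promises.

```javascript
// Hypothetical sketch of recursive polling; not the PR's actual helper.
// `request` must return a thenable that yields the current process count.
// `wait` must return a thenable that resolves after roughly `intervalMs`.
function pollProcessCount({ request, wait, minCount = 1, intervalMs = 5000, retries = 24 }) {
  return request().then((count) => {
    if (count >= minCount) {
      return count; // data is ready, stop recursing
    }
    if (retries <= 0) {
      throw new Error(`process count stayed below ${minCount}`);
    }
    // Not ready yet: pause, then try again with one fewer retry.
    return wait(intervalMs).then(() =>
      pollProcessCount({ request, wait, minCount, intervalMs, retries: retries - 1 })
    );
  });
}

// Possible Cypress wiring (assumed response shape: { count: <number> }):
//
// before(() => {
//   pollProcessCount({
//     request: () =>
//       cy
//         .request({
//           url: '/v1/processcount',
//           headers: { Authorization: `Bearer ${Cypress.env('ROX_AUTH_TOKEN')}` },
//         })
//         .then((res) => Number(res.body.count)),
//     wait: (ms) => cy.wait(ms),
//   });
// });
```

With the defaults shown, 24 retries at 5 s each cover about 2 minutes of waiting, which matches the Cypress-level limit stated above.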

User-facing documentation

Testing and quality

  • the change is production ready: the change is GA, or otherwise the functionality is gated by a feature flag
  • CI results are inspected

Automated testing

  • modified existing tests

How I validated my change

Dispatched UI E2E workflow run to verify timeline tests pass with real data on a fresh GKE cluster.

On a freshly deployed GKE cluster, process events may not be available
immediately. The deployment/pod timeline tests that use real API data
(no fixtures) fail because the timeline renders empty when no process
events exist yet.

Two-layer fix:
1. Workflow: poll /v1/processcount in the risk shard wait step until
   process events are detected (up to 5 minutes).
2. Cypress: add before() hooks in deploymentTimeline and podTimeline
   test files that poll /v1/processcount, ensuring data readiness
   even if the workflow-level wait was insufficient.

Partially generated by AI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@codecov

codecov Bot commented Jun 25, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 50.07%. Comparing base (d198431) to head (bf53cfa).
⚠️ Report is 2 commits behind head on davdhacs/ui-e2e-gke-only.

Additional details and impacted files
@@                    Coverage Diff                    @@
##           davdhacs/ui-e2e-gke-only   #21422   +/-   ##
=========================================================
  Coverage                     50.07%   50.07%           
=========================================================
  Files                          2835     2835           
  Lines                        217546   217546           
=========================================================
+ Hits                         108932   108939    +7     
+ Misses                       100708   100702    -6     
+ Partials                       7906     7905    -1     
Flag Coverage Δ
go-unit-tests 50.07% <ø> (+<0.01%) ⬆️

@github-actions

github-actions Bot commented Jun 25, 2026

Contributor

🚀 Build Images Ready

Images are ready for commit bf53cfa. To use with deploy scripts:

export MAIN_IMAGE_TAG=4.12.x-320-gbf53cfa89d

The Cypress before() hook already waits for process events —
the workflow-level wait is redundant. Keeps this PR as a pure
test fix with no workflow file overlap with PR #21421.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
davdhacs added a commit that referenced this pull request Jun 25, 2026
Remove test changes from the workflow PR — test fixes belong in
separate PRs. The network flow data wait in PR #21422 should
make this test work without modifications.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
davdhacs and others added 3 commits June 25, 2026 12:47
All vulnmanagement tests use real API data (not fixtures) and need image
CVE data from a completed scan to produce meaningful results. Add a
waitForImageCVEs() helper that triggers an nginx:1.12 scan and polls
GraphQL { imageCVECount } until > 0, matching the pattern established by
waitForProcessEvents() for timeline tests.
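
As an illustration of the readiness check only, the sketch below builds a `{ imageCVECount }` query and decides whether a response counts as ready. The endpoint path `/api/graphql` and the names `imageCveCountRequest` and `hasImageCves` are assumptions, not taken from this PR. The `data` and `errors` keys follow the standard GraphQL response format.

```javascript
// Hypothetical request options for the readiness query.
// The URL is an assumed path; adjust it to the real GraphQL endpoint.
const imageCveCountRequest = {
  method: 'POST',
  url: '/api/graphql',
  body: { query: '{ imageCVECount }' },
};

// Returns true only when the response reports at least one image CVE.
function hasImageCves(responseBody) {
  if (!responseBody || (responseBody.errors && responseBody.errors.length > 0)) {
    return false; // treat GraphQL errors as "not ready yet"
  }
  const count = responseBody.data ? responseBody.data.imageCVECount : undefined;
  return Number.isInteger(count) && count > 0;
}
```

A polling loop would repeat the request until `hasImageCves` returns true or the timeout is reached.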

Added before() hooks to the 9 vulnmanagement test files that query image
CVE data: dashboard, dashboardToEntityPage, clusters, deployments,
entitypages, imageComponents, imageCves, images, namespaces.

Not added to: clusterCves (platform CVEs only), nodeCves/nodeComponents/
nodes (node-level data only).

Analysis of other shards:
- vulnerabilities: all fixture-based (visitWorkloadCveOverview stubs
  getImageCVEList), no wait needed
- networkGraph: tests check stackrox namespace deployments which are
  always present, no wait needed

AI-assisted.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Test changes belong in separate PRs. Network graph tests don't
need data waits — they check for platform deployments which are
always present.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Network graph sidebar test failed on fresh GKE clusters because collector
hadn't sent network flow data yet. Added waitForNetworkFlows() that polls
the network graph API until collector has outgoing edges.

Timeline tests failed because waitForProcessEvents() checked global
process count (/v1/processcount) which passes immediately, but the
specific deployment shown in the risk list had no timeline events yet.
Replaced with deployment-specific polling via /v1/deploymentswithprocessinfo
that verifies the first risk-sorted stackrox deployment has baseline
statuses. Also increased timeout from 2m to 5m and interval from 5s to
10s to reduce API load during warm-up.

AI-assisted.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@davdhacs davdhacs added the e2e-ui-gke Run UI E2E tests on shared GKE cluster label Jun 25, 2026
davdhacs and others added 2 commits June 25, 2026 16:40
- Process events: use /v1/processcount (simpler, proven API) with
  10-minute timeout and count > 10 threshold. The previous
  /v1/deploymentswithprocessinfo endpoint wasn't returning expected data.
- Network flows: increase timeout to 10 minutes.
- Both: increase polling interval to 15s to reduce log noise.
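
To make the timeout and interval changes across these commits concrete, the short sketch below computes how many polls each setting allows. The helper name `maxAttempts` is illustrative and not part of the PR.

```javascript
const SECOND = 1000;
const MINUTE = 60 * SECOND;

// Number of polling intervals that fit into a timeout (rounded up).
function maxAttempts(timeoutMs, intervalMs) {
  if (intervalMs <= 0) {
    throw new RangeError('intervalMs must be positive');
  }
  return Math.ceil(timeoutMs / intervalMs);
}

console.log(maxAttempts(2 * MINUTE, 5 * SECOND)); // 24 (original Cypress-level wait)
console.log(maxAttempts(5 * MINUTE, 10 * SECOND)); // 30 (after the deployment-specific change)
console.log(maxAttempts(10 * MINUTE, 15 * SECOND)); // 40 (this commit)
```

Lengthening the interval along with the timeout keeps the number of requests, and therefore the log noise, from growing in proportion to the wait.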

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The graph topology takes longer to render on GKE after selecting
a deployment. Increase from default 8s to 30s.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>