CI: tidy nightly test-matrix + bump torch to 2.12.1 by leofang · Pull Request #2272 · NVIDIA/cuda-python · GitHub

CI: tidy nightly test-matrix + bump torch to 2.12.1 #2272

Open
leofang wants to merge 5 commits into NVIDIA:main from leofang:leofang/ci-test-matrix-env-refactor

Conversation

@leofang

@leofang leofang commented Jun 27, 2026

Member

Summary

  • Collapse per-row MODE / TORCH_VER / TORCH_CUDA of nightly entries into the ENV: map so they ride the existing matrix-env injection step in test-wheel-{linux,windows}.yml. Workflow selectors (ci-nightly.yml) and job-name strings updated accordingly.
  • Bump latest-PyTorch rows from 2.11.0 to 2.12.1; 2.9.1 rows unchanged.
  • Job names now also show the torch CUDA suffix, e.g. ", 2.12.1+cu126".
  • Align nightly section columns with the pull-request rows for readability.
  • Add a nightly-standard arm64 gh200 row, but comment it out for now: the gh200 runner currently hangs on stream-ordered memory allocator (cudaMallocAsync) calls. The row is left in place (with a TODO) so it can be re-enabled once the runner-side issue is resolved.
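The restructuring in the first bullet can be sketched in Python as follows. This is only an illustration: the `ARCH` and `PY_VER` values, the helper names `collapse_into_env` and `select_by_mode`, and the dict representation are assumptions, not the contents of ci/test-matrix.yml, and the real selection is done by the workflow's matrix filter rather than by Python.

```python
# Hypothetical nightly row in the old layout: per-row keys sit at the top level.
# Field values are illustrative assumptions, not copied from ci/test-matrix.yml.
old_row = {
    "ARCH": "amd64",
    "PY_VER": "3.12",
    "MODE": "nightly-pytorch",
    "TORCH_VER": "2.12.1",
    "TORCH_CUDA": "cu126",
}

# Keys that this PR moves under the ENV map.
ENV_KEYS = ("MODE", "TORCH_VER", "TORCH_CUDA")


def collapse_into_env(row):
    """Return a copy of row with the per-row keys moved under an ENV map."""
    new_row = {k: v for k, v in row.items() if k not in ENV_KEYS}
    env = dict(row.get("ENV", {}))  # keep any ENV entries already present
    env.update({k: row[k] for k in ENV_KEYS if k in row})
    new_row["ENV"] = env
    return new_row


def select_by_mode(rows, mode):
    """Mimic a selector keyed on .ENV.MODE instead of a top-level MODE."""
    return [r for r in rows if r.get("ENV", {}).get("MODE") == mode]


new_row = collapse_into_env(old_row)
print(new_row)
print(select_by_mode([new_row], "nightly-pytorch"))
```

Once the keys live under `ENV`, anything that previously read a top-level `MODE` has to look one level deeper, which is why the selectors and job-name strings change in the same PR.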

Test plan

  • Verify nightly matrix expansion across modes (nightly-pytorch, nightly-numba-cuda, nightly-standard) via a workflow run.
  • PyTorch 2.12.1 wheels (cu126 / cu130) install cleanly.
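One way to make the second check concrete is to compare the version string an installed wheel reports against the expected version and CUDA suffix. The sketch below assumes the wheel reports a string of the form `2.12.1+cu126` (as `torch.__version__` typically does for wheels from the PyTorch package index); the helper names are hypothetical, and the loop uses a stand-in string instead of importing torch.

```python
def split_torch_version(version_string):
    """Split '2.12.1+cu126' into ('2.12.1', 'cu126'); the suffix is None if absent."""
    base, sep, local = version_string.partition("+")
    return base, (local if sep else None)


def is_expected_build(version_string, torch_ver, torch_cuda):
    """True if the reported version matches the expected TORCH_VER and TORCH_CUDA."""
    return split_torch_version(version_string) == (torch_ver, torch_cuda)


# The two builds named in the test plan.
EXPECTED = [("2.12.1", "cu126"), ("2.12.1", "cu130")]

for ver, cuda in EXPECTED:
    reported = f"{ver}+{cuda}"  # stand-in for torch.__version__ in a real run
    print(reported, is_expected_build(reported, ver, cuda))
```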

- ci/test-matrix.yml: move per-row MODE/TORCH_VER/TORCH_CUDA into the
  ENV map (rides the existing matrix-env injection step). Add a
  nightly-standard arm64 gh200 row. Bump latest-PyTorch rows from
  2.11.0 to 2.12.1; 2.9.1 rows untouched.
- .github/workflows/ci-nightly.yml: matrix_filter selectors now key on
  .ENV.MODE.
- .github/workflows/test-wheel-{linux,windows}.yml: job-name format
  strings read TORCH_VER/MODE from matrix.ENV; TORCH_CUDA also rendered
  in the name (e.g. ", 2.12.1+cu126"). Drop the now-redundant
  TORCH_VER/TORCH_CUDA lines from the pytorch step's env block.
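The job-name change can be pictured with a small Python sketch. The helper `job_name_suffix` is hypothetical, and the behavior for rows without a torch version or CUDA tag is an assumption; the real names are built by format strings in the workflow files.

```python
def job_name_suffix(env):
    """Build the torch part of a job name from a row's ENV map (hypothetical helper)."""
    torch_ver = env.get("TORCH_VER")
    if not torch_ver:
        return ""  # assumption: rows without a torch version get no suffix
    torch_cuda = env.get("TORCH_CUDA")
    # Append the CUDA tag after '+' when one is set, e.g. 2.12.1+cu126.
    version = f"{torch_ver}+{torch_cuda}" if torch_cuda else torch_ver
    return f", {version}"


env = {"MODE": "nightly-pytorch", "TORCH_VER": "2.12.1", "TORCH_CUDA": "cu126"}
print(repr(job_name_suffix(env)))  # ', 2.12.1+cu126'
```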
@copy-pr-bot

copy-pr-bot Bot commented Jun 27, 2026

Contributor

@github-actions github-actions Bot added the CI/CD (CI/CD infrastructure) label Jun 27, 2026
leofang added 2 commits June 27, 2026 04:18
Pad PY_VER and GPU columns in the nightly section to match the widths
used by the pull-request rows above (17-char PY_VER, 19-char GPU).
Purely cosmetic; YAML parse and matrix expansion unchanged.
@leofang

leofang commented Jun 27, 2026

Member Author

/ok to test d29bc34

Comment thread ci/test-matrix.yml Outdated
@leofang

leofang commented Jun 27, 2026

Member Author

leofang added 2 commits June 28, 2026 02:55
The gh200 runner currently hangs on stream-ordered memory allocator
calls (cudaMallocAsync). Disabling until the runner-side issue is
resolved.
@leofang leofang changed the title from "CI: tidy nightly test-matrix + add arm64 gh200 + bump torch 2.12.1" to "CI: tidy nightly test-matrix + bump torch to 2.12.1" Jun 28, 2026
@leofang leofang marked this pull request as ready for review June 28, 2026 03:02
@leofang

leofang commented Jun 28, 2026

Member Author

/ok to test 7e002a6

@leofang leofang self-assigned this Jun 28, 2026
@leofang leofang added this to the cuda.core next milestone Jun 28, 2026
@leofang leofang added the enhancement (Any code-related improvements) and P1 (Medium priority - Should do) labels Jun 28, 2026
@leofang leofang requested a review from mdboom June 30, 2026 04:11
@leofang

leofang commented Jun 30, 2026

Member Author

@leofang leofang linked an issue Jun 30, 2026 that may be closed by this pull request
@leofang leofang requested a review from rwgk June 30, 2026 19:51

Labels

CI/CD (CI/CD infrastructure), enhancement (Any code-related improvements), P1 (Medium priority - Should do)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bump tensor bridge version cap for PyTorch 2.12

1 participant