Comparing main...cd/add-gpu-test-ci · NVIDIA/cutile-python · GitHub

Comparing changes

base repository: NVIDIA/cutile-python
base: main
...
head repository: NVIDIA/cutile-python
compare: cd/add-gpu-test-ci
  • 11 commits
  • 1 file changed
  • 2 contributors

Commits on Mar 18, 2026

  1. ci: add GPU test job using self-hosted runners

    Add a test matrix job that runs on self-hosted GPU runners (AWS EC2
    Ampere instances). Tests run inside Docker containers with --gpus all
    using the pre-built test images from GHCR. Also update all image tags
    to 2026-03-18 builds which include tileiras 13.2 (adds sm_86 support).
    camille-004 committed Mar 18, 2026
    e782feb
  2. ci: fix workspace permissions after docker test runs

    Docker containers run as root, so files created during tests (e.g.
    .pytest_cache) are root-owned. Subsequent jobs on the same runner
    fail when actions/checkout tries to clean the workspace. Fix by
    restoring ownership after each test run.
    camille-004 committed Mar 18, 2026
    2434e12
  3. 48aef3a
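
The first two commits above describe running tests through a manual `docker run --gpus all` on a self-hosted runner, then restoring workspace ownership because the container writes root-owned files. A minimal sketch of what such a job could look like follows. The runner labels, matrix values, image path and tag scheme, mount point, and the use of `sudo` are all assumptions for illustration and are not taken from the repository.

```yaml
# Sketch only. Anything marked "assumed" or "placeholder" is hypothetical.
jobs:
  test:
    runs-on: [self-hosted, gpu]            # assumed runner labels
    strategy:
      matrix:
        python-version: ["3.10", "3.11", "3.12", "3.13"]   # assumed matrix axis
    steps:
      - uses: actions/checkout@v4          # assumed action version

      - name: Run tests inside the pre-built test image
        run: |
          # Placeholder image reference; replace OWNER/IMAGE and the tag.
          docker run --rm --gpus all \
            -v "$GITHUB_WORKSPACE:/workspace" -w /workspace \
            "ghcr.io/OWNER/IMAGE:py${{ matrix.python-version }}-2026-03-18" \
            pytest

      - name: Restore workspace ownership
        # The container runs as root, so files such as .pytest_cache end up
        # root-owned; hand them back to the runner user so that a later
        # actions/checkout on the same runner can clean the workspace.
        if: always()
        run: sudo chown -R "$(id -u):$(id -g)" "$GITHUB_WORKSPACE"   # assumes passwordless sudo
```

If the runner user has no `sudo`, the same `chown` could instead be run from a second short-lived container that mounts the workspace.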

Commits on Mar 25, 2026

  1. ci: use native container directive for GPU test jobs

    Replace manual docker run/mount/permission-fix workflow with GitHub
    Actions' built-in container: directive and --gpus all via options.
    camille-004 committed Mar 25, 2026
    4ebcc25
  2. ci: bump actions to Node.js 24 versions

    - actions/upload-artifact v4 -> v6
    - actions/download-artifact v4 -> v7
    - dorny/test-reporter v2 -> v3
    camille-004 committed Mar 25, 2026
    0febd8e
  3. ci: harden workflow with least-privilege permissions and best practices

    - Add workflow-level permissions (contents: read, packages: read)
    - Add checks: write to test job for dorny/test-reporter
    - Add fail-fast: false to build and test matrices
    - Replace if: always() with if: !cancelled() on test result steps
    camille-004 committed Mar 25, 2026
    6ea74dd
  4. ci: update test images with git, simplify pytest command

    - Update test image tags to include git (needed by dorny/test-reporter)
    - Remove --ignore internal (no internal folder in OSS)
    - Remove -m 'not benchmark and not use_mlir' (run all tests, use_mlir
      auto-skips)
    camille-004 committed Mar 25, 2026
    cf3c138
  5. ci: separate benchmark tests into dedicated job

    - Exclude benchmarks from the regular test job with -m "not benchmark"
      to prevent GPU OOM from large tensor allocations competing with
      parallel test jobs
    - Add a dedicated benchmark job (Python 3.10 only, continue-on-error)
      mirroring the GitLab CI pattern
    - Add git safe.directory config to the test job to fix dorny/test-reporter
      failing with git exit code 128 in Docker containers
    
    Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
    camille-004 and claude committed Mar 25, 2026
    f20f535
  6. ci: run benchmark job after test jobs to avoid GPU contention

    Benchmark was running in parallel with all 4 test jobs, putting 5
    concurrent GPU workloads on the same runner and causing OOM for
    everything. Sequencing benchmark after test ensures it gets the
    GPU to itself.
    
    Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
    camille-004 and claude committed Mar 25, 2026
    88f0073
  7. ci: set fail-on-error=false on test reporters

    The dorny/test-reporter steps were failing the job a second time when
    tests failed. The pytest step is the authoritative failure signal;
    the reporter is for display only.
    
    Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
    camille-004 and claude committed Mar 25, 2026
    88951f1
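
Taken together, the Mar 25 commits describe a workflow that uses the `container:` directive, least-privilege permissions, non-fail-fast matrices, `!cancelled()` guards, a benchmark job sequenced after the tests, and display-only test reporters. The sketch below shows one way those pieces could fit together. It is not the repository's actual workflow: the runner labels, matrix values, image reference, credentials block, report file name, and the benchmark pytest invocation are assumptions.

```yaml
# Sketch only. Anything marked "assumed" or "placeholder" is hypothetical.
permissions:
  contents: read
  packages: read

jobs:
  test:
    runs-on: [self-hosted, gpu]            # assumed runner labels
    permissions:
      # Job-level permissions replace the workflow-level set, so the
      # read scopes are repeated alongside checks: write.
      contents: read
      packages: read
      checks: write                        # used by dorny/test-reporter
    strategy:
      fail-fast: false
      matrix:
        python-version: ["3.10", "3.11", "3.12", "3.13"]   # assumed matrix axis
    container:
      image: ghcr.io/OWNER/IMAGE:TAG       # placeholder image reference
      options: --gpus all
      credentials:                         # assumed; only needed for a private image
        username: ${{ github.actor }}
        password: ${{ secrets.GITHUB_TOKEN }}
    steps:
      - uses: actions/checkout@v4          # assumed action version
      - name: Mark the workspace as a safe git directory
        run: git config --global --add safe.directory "$GITHUB_WORKSPACE"
      - name: Run tests (benchmarks excluded)
        run: pytest -m "not benchmark" --junitxml=results.xml   # report name assumed
      - name: Publish test report
        uses: dorny/test-reporter@v3
        # A bare `!cancelled()` would be parsed as a YAML tag, hence the
        # expression wrapper.
        if: ${{ !cancelled() }}
        with:
          name: tests (${{ matrix.python-version }})
          path: results.xml
          reporter: java-junit
          fail-on-error: false             # the pytest step is the failure signal

  benchmark:
    needs: test                            # runs only after all test jobs finish
    runs-on: [self-hosted, gpu]            # assumed runner labels
    continue-on-error: true
    container:
      image: ghcr.io/OWNER/IMAGE:TAG       # placeholder; a Python 3.10 image
      options: --gpus all
    steps:
      - uses: actions/checkout@v4          # assumed action version
      - name: Run benchmarks
        run: pytest -m benchmark           # assumed invocation
```

With a plain `needs: test`, GitHub Actions skips the benchmark job when any test job fails; whether the actual workflow overrides that with an `if:` condition is not stated in the commit messages.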

Commits on Mar 30, 2026

  1. c3ee664