Add performance regression tests for cuda.core by devin-ai-integration[bot] · Pull Request #2 · Custom-Devin-Demos/cuda-python · GitHub
Add performance regression tests for cuda.core #2

Open: devin-ai-integration[bot] wants to merge 1 commit into main from devin/1764978902-performance-regression-tests
devin-ai-integration[bot] commented Dec 5, 2025

Description

Adds automated performance regression tests for the cuda.core API to detect when changes inadvertently slow down critical CUDA operations.

Changes:

  • New test_performance_regression.py with two tests:
    • Kernel launch overhead benchmark (1000 empty kernel launches)
    • Memory transfer bandwidth benchmark (1KB, 1MB, 64MB transfers)
  • Register performance pytest marker in conftest.py
  • CI workflow updated to run performance tests on A100 hardware only
  • PERFORMANCE_THRESHOLDS.md documenting threshold values and update procedures
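Registering the performance marker in conftest.py could look roughly like the sketch below. It uses pytest's standard `pytest_configure` hook and `config.addinivalue_line`. The marker description string is an assumption, not text from this PR.

```python
# conftest.py (sketch; the description string is assumed, not taken from the PR)

def pytest_configure(config):
    # Register the custom marker so pytest does not warn about an unknown mark
    # when a test is decorated with @pytest.mark.performance.
    config.addinivalue_line(
        "markers",
        "performance: performance regression tests that need dedicated GPU hardware",
    )
```

With the marker registered, `pytest -m performance` selects only these tests and `pytest -m "not performance"` excludes them, which is one way a CI job could restrict them to the A100 runner.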

Thresholds:

  • Kernel launch: < 50μs average
  • Memory bandwidth: at least 0.1, 5, and 10 GB/s for 1KB, 1MB, and 64MB transfers respectively
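To show how these thresholds translate into a pass/fail check, here is a minimal sketch. The names `MIN_BANDWIDTH_GBPS`, `MAX_LAUNCH_US`, `bandwidth_gbps`, and `meets_bandwidth_threshold` are hypothetical and are not taken from test_performance_regression.py. It also assumes binary transfer sizes (1KB = 1024 bytes) and decimal gigabytes (1 GB = 1e9 bytes), neither of which the description specifies.

```python
# Threshold values copied from the list above; all names here are illustrative.
MAX_LAUNCH_US = 50.0  # kernel launch: average must stay below this

# Assumption: sizes are binary (1KB = 1024 bytes); the PR does not say.
MIN_BANDWIDTH_GBPS = {
    1024: 0.1,            # 1KB
    1024**2: 5.0,         # 1MB
    64 * 1024**2: 10.0,   # 64MB
}


def bandwidth_gbps(num_bytes, elapsed_s):
    """Convert a transfer of num_bytes taking elapsed_s seconds to GB/s (1 GB = 1e9 bytes)."""
    return num_bytes / elapsed_s / 1e9


def meets_bandwidth_threshold(num_bytes, elapsed_s):
    """Return True if the measured bandwidth reaches the minimum for this size."""
    return bandwidth_gbps(num_bytes, elapsed_s) >= MIN_BANDWIDTH_GBPS[num_bytes]
```

Under these assumptions, a 64MB transfer that takes 5 ms works out to about 13.4 GB/s and passes, while the same transfer taking 10 ms works out to about 6.7 GB/s and fails.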

Human Review Checklist

  • Verify device.allocate() API: Confirmed - Device.allocate(size, stream) exists at line 1330 in _device.pyx and delegates to self.memory_resource.allocate(size, stream).
  • Threshold values: The thresholds are estimates based on A100 specs. May need adjustment after running on actual CI hardware.
  • Kernel launch test lacks warm-up: Consider whether a warm-up iteration is needed before timing (though averaging over 1000 iterations should dilute cold-start effects). Note: the memory transfer test does include a warm-up.
  • CI validation: Tests have not been validated in CI yet (forked repo may not have workflows enabled). Verify tests pass on actual A100 hardware.
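For the warm-up question in the checklist above, one possible shape for a timing helper is sketched below. It is a generic illustration rather than the code in this PR: `average_call_us` and its `sync` parameter are hypothetical, and `fn` stands in for whatever launches the empty kernel.

```python
import time


def average_call_us(fn, iterations=1000, warmup=10, sync=None):
    """Return the average wall-clock time per call to fn, in microseconds.

    fn   -- zero-argument callable to time (for example, one kernel launch)
    sync -- optional zero-argument callable that blocks until queued work is
            done; kernel launches are asynchronous, so without it the timer
            would only measure how long it takes to enqueue the work
    """
    # Untimed warm-up calls absorb one-time costs such as lazy initialization.
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()

    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    if sync is not None:
        sync()  # wait for outstanding work before stopping the clock
    elapsed_s = time.perf_counter() - start

    return elapsed_s / iterations * 1e6
```

The result could then be compared against the 50μs kernel launch threshold. Keeping warm-up outside the timed region matters most when the iteration count is small. At 1000 iterations a single slow first call shifts the average only slightly, which is the trade-off the checklist item describes.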

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Link to Devin run: https://app.devin.ai/sessions/2db856c666e84ea98b2800312a6f0ce3
Requested by: Shawn Azman (shawn@cognition.ai) / @ShawnAzman

- Create test_performance_regression.py with kernel launch and memory
  transfer benchmarks
- Add performance marker registration in conftest.py
- Update CI workflow to run performance tests on A100 hardware
- Add PERFORMANCE_THRESHOLDS.md documentation

Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>