██████╗ ███████╗████████╗███████╗██████╗ ██████╗██████╗
██╔══██╗██╔════╝╚══██╔══╝██╔════╝██╔══██╗██╔════╝╚════██╗
██████╔╝█████╗ ██║ █████╗ ██████╔╝██║ █████╔╝
██╔═══╝ ██╔══╝ ██║ ██╔══╝ ██╔══██╗██║ ╚═══██╗
██║ ███████╗ ██║ ███████╗██║ ██║╚██████╗██████╔╝
╚═╝ ╚══════╝ ╚═╝ ╚══════╝╚═╝ ╚═╝ ╚═════╝╚═════╝
Self-taught systems programmer working at the GPU / driver / ML-runtime boundary. I build on-device NPU inference from scratch on two vendors' silicon: AMD (Radeon 890M iGPU + XDNA 2 NPU) and MediaTek (MDLA / APU 650). I build PyTorch backends, patch kernel drivers, drive vendor NPU compilers directly, and write the upstream bug reports for hardware the software stack hasn't caught up to yet, and I publish all of it.
Async / written-first collaborator. Comfortable in Rust and C++ down to the dispatcher, allocator, and SPIR-V level.
Filed reproducible upstream issues against PyTorch, ROCm, and AMD driver projects documenting where the software stack breaks on this silicon. Several triaged by maintainers; one closed after direct collaboration with an AMD engineer. (These are reported & triaged issues, not merged fixes.)
- PyTorch #178934, #178839 — MIOpen Gemm solvers return `workspace_size=0` on gfx1150 (triaged, `has-workaround`)
- ROCm/rocm-libraries #6045, #6048 — gfx1150 missing from CK whitelist; CK VGPR mismatch (in triage)
- ROCm/composable_kernel #3724 — WMMA kernels fail on gfx1150
- amd/xdna-driver #1257 — `aie2_smu_init` cold-boot precheck failure (closed after collaboration with AMD)
- amd/Triton-XDNA #33 — `detect_npu_version()` doesn't recognize RyzenAI-npu4
"Two Negative Results for Vector Symbolic Architectures" — single-author 12-page preprint showing VSAs fail at FFN replacement (a rank bottleneck: VSA retrieval is rank ≤ top-k while FFN effective rank exceeds 2048) and at compositional image generation, with cross-scale validation on Qwen3-4B / 8B / 27B. Preprint, targeting the NeurIPS Negative Results track — not peer-reviewed. Read it: github.com/Peterc3-dev/cube-memory/tree/master/paper
- torch-vulkan — expanding op coverage on the Vulkan/SPIR-V PyTorch backend
- amdxdna NPU — driver debugging and bring-up on XDNA 2 (Strix Point)
- Cross-arch ML enablement on RDNA 3.5 + XDNA 2 — building and reporting upstream as gaps surface
Rust · C++17 · Python · GLSL/SPIR-V · Vulkan (Kompute) · HIP/ROCm · Linux kernel (driver debugging) · CachyOS/Arch · Kotlin / Android SDK · GraphQL · Tailscale

