
healeycodes/kernel-lowering


🏎️ kernel-lowering

My blog post: A Tiny Compiler for Data-Parallel Kernels


This is an experiment: a tiny compiler (under 180 lines of code) that lowers simple kernel loops into explicit data-parallel code. It takes a small hand-written AST and prints a lowered form with lanes, masks, masked loads, and gathers.
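To make the AST-to-lowered-form step concrete, here is a minimal sketch of such a lowering pass. The node classes (`Var`, `Load`, `BinOp`, `Let`, `Store`) and the functions `lower_expr` and `lower_kernel` are hypothetical names chosen for illustration, not necessarily what compiler.py uses. The rule for choosing between `masked_load` and `gather` (index is exactly the loop variable `i` versus anything else) is one plausible rule that reproduces all three examples in this README; it is an assumption, not a description of the repo's code.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical AST node types for this sketch; compiler.py's real classes may differ.
@dataclass
class Var:
    name: str

@dataclass
class Load:
    array: str
    index: "Expr"

@dataclass
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"

Expr = Union[Var, Load, BinOp]

@dataclass
class Let:
    name: str
    value: Expr

@dataclass
class Store:
    array: str
    index: Expr
    value: Expr


def lower_expr(e: Expr, loop_var: str = "i") -> str:
    """Render an expression in lowered, per-lane form."""
    if isinstance(e, Var):
        return e.name
    if isinstance(e, BinOp):
        return f"({lower_expr(e.left, loop_var)} {e.op} {lower_expr(e.right, loop_var)})"
    if isinstance(e, Load):
        idx = lower_expr(e.index, loop_var)
        if isinstance(e.index, Var) and e.index.name == loop_var:
            # a[i]: consecutive lanes read consecutive elements.
            return f"masked_load({e.array}, {idx}, active)"
        # a[anything else]: each lane may read an unrelated element.
        return f"gather({e.array}, {idx}, active)"
    raise TypeError(f"unknown expression: {e!r}")


def lower_kernel(name: str, params: List[str], body: List[Union[Let, Store]]) -> str:
    """Wrap the lowered body in a LANES-wide vector loop with a tail mask."""
    lines = [
        f"kernel {name}({', '.join(params)}):",
        "  vector_for base in range(0, n, LANES):",
        "    let i = (base + lane_id)",
        "    let active = (i < n)",
    ]
    for stmt in body:
        if isinstance(stmt, Let):
            lines.append(f"    let {stmt.name} = {lower_expr(stmt.value)}")
        else:
            # Only out[i] = ... stores are sketched here (no scatters).
            lines.append(
                f"    masked_store({stmt.array}, {lower_expr(stmt.index)}, "
                f"{lower_expr(stmt.value)}, active)"
            )
    return "\n".join(lines)


if __name__ == "__main__":
    # The color_by_number example below, written as a hand-built AST.
    body = [
        Let("number", Load("color_number", Var("i"))),
        Store("out", Var("i"), Load("colors", Var("number"))),
    ]
    print(lower_kernel("color_by_number", ["color_number", "colors", "out", "n"], body))
```

Run as a script, this prints text identical to the LOWERED block for color_by_number. The sketch hardcodes `n` as the trip count and only handles stores to `out[i]`; a store to any other index would need a scatter, which is left out.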

More info in the blog post above!


Run the examples with:

python compiler.py

Each example below shows its SOURCE kernel followed by the LOWERED form.

EXAMPLE: Scale audio volume
SOURCE
kernel scale_audio(samples, out, n, volume):
    for i in range(n):
        # Each sample can be adjusted without looking at its neighbors.
        # That independence lets several samples run side by side.
        out[i] = samples[i] * volume

LOWERED
kernel scale_audio(samples, out, n, volume):
  vector_for base in range(0, n, LANES):
    let i = (base + lane_id)
    let active = (i < n)
    masked_store(out, i, (masked_load(samples, i, active) * volume), active)
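One way to read the lowered scale_audio kernel is to execute it by hand. The pure-Python sketch below simulates it with lists standing in for vector registers. `LANES = 4` is an assumed width (the README leaves `LANES` symbolic), and filling inactive lanes with 0 in `masked_load` is an arbitrary choice for this sketch; only the semantics "inactive lanes neither read nor write" matter.

```python
LANES = 4  # assumed vector width; the README leaves LANES symbolic


def masked_load(array, idx, active):
    # Inactive lanes never touch memory; this sketch fills them with 0.
    return [array[j] if a else 0 for j, a in zip(idx, active)]


def masked_store(array, idx, values, active):
    # Only active lanes write, so the tail never runs past the end of `array`.
    for j, v, a in zip(idx, values, active):
        if a:
            array[j] = v


def scale_audio(samples, out, n, volume):
    # Mirrors the LOWERED scale_audio kernel one statement at a time.
    for base in range(0, n, LANES):
        i = [base + lane_id for lane_id in range(LANES)]
        active = [j < n for j in i]
        scaled = [x * volume for x in masked_load(samples, i, active)]
        masked_store(out, i, scaled, active)


samples = [10, 20, 30, 40, 50, 60]  # n = 6: one full chunk plus a 2-lane tail
out = [0] * len(samples)
scale_audio(samples, out, len(samples), 2)
# out is now [20, 40, 60, 80, 100, 120]
```

With n = 6 the second chunk has i = [4, 5, 6, 7] and active = [True, True, False, False], so the mask is what keeps lanes 6 and 7 from reading or writing out of bounds.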

EXAMPLE: Move particles
SOURCE
kernel move_particles(position, velocity, out, n, dt):
    for i in range(n):
        # Particle i moves using its own position and velocity.
        # It does not depend on particle i - 1 or particle i + 1.
        out[i] = position[i] + velocity[i] * dt

LOWERED
kernel move_particles(position, velocity, out, n, dt):
  vector_for base in range(0, n, LANES):
    let i = (base + lane_id)
    let active = (i < n)
    masked_store(out, i, (masked_load(position, i, active) + (masked_load(velocity, i, active) * dt)), active)
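The comments in the first two examples stress that iteration i touches only its own elements. A minimal pure-Python demonstration of why that matters, assuming a hypothetical executor (`run_in_lanes`) in which each LANES-wide chunk reads all of its inputs before writing any output, as a single vector instruction would. `LANES = 4`, `run_sequential`, `running_sum`, and the sample data are illustrative assumptions; the README does not say whether compiler.py checks for dependences like this.

```python
LANES = 4  # assumed vector width for this sketch


def run_sequential(step, n, out):
    # The SOURCE semantics: iteration i sees every write from iterations < i.
    for i in range(n):
        out[i] = step(i, out)


def run_in_lanes(step, n, out):
    # A LANES-wide chunk behaves like one vector instruction:
    # all active lanes read first, then all of them write.
    for base in range(0, n, LANES):
        lanes = [i for i in range(base, base + LANES) if i < n]
        results = [step(i, out) for i in lanes]
        for i, r in zip(lanes, results):
            out[i] = r


position, velocity, dt = [0, 1, 2, 3, 4, 5], [1, 1, 1, 1, 1, 1], 2
x = [1, 1, 1, 1, 1, 1]


def move(i, out):
    # Independent: reads only position[i] and velocity[i].
    return position[i] + velocity[i] * dt


def running_sum(i, out):
    # Dependent: reads out[i - 1], which another lane in the same chunk writes.
    return (out[i - 1] if i > 0 else 0) + x[i]


for step in (move, running_sum):
    seq, vec = [0] * 6, [0] * 6
    run_sequential(step, 6, seq)
    run_in_lanes(step, 6, vec)
    print(step.__name__, seq, vec)
# move [2, 3, 4, 5, 6, 7] [2, 3, 4, 5, 6, 7]
# running_sum [1, 2, 3, 4, 5, 6] [1, 1, 1, 1, 2, 1]
```

move_particles gives the same answer either way, while the running sum breaks as soon as several iterations share a chunk: each lane reads `out[i - 1]` before its neighbor has written it.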

EXAMPLE: Color by number
SOURCE
kernel color_by_number(color_number, colors, out, n):
    for i in range(n):
        # Each pixel stores a small color number, like a color-by-number page.
        number = color_number[i]

        # Neighboring pixels can name completely different colors.
        # The lowered code must let each lane read its own color entry.
        out[i] = colors[number]

LOWERED
kernel color_by_number(color_number, colors, out, n):
  vector_for base in range(0, n, LANES):
    let i = (base + lane_id)
    let active = (i < n)
    let number = masked_load(color_number, i, active)
    masked_store(out, i, gather(colors, number, active), active)
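The only new primitive in this example is `gather`. A minimal pure-Python sketch of what it does, simulating the lowered color_by_number kernel with lists as vector registers. `LANES = 4` and the sample palette are assumptions for illustration; `masked_load` and `masked_store` are inlined as masked list operations.

```python
LANES = 4  # assumed vector width for this sketch


def gather(table, indices, active):
    # Each active lane reads table[indices[lane]]; the indices can be in any order.
    # Inactive lanes skip the read, since in general their index may be garbage.
    return [table[j] if a else None for j, a in zip(indices, active)]


def color_by_number(color_number, colors, out, n):
    # Mirrors the LOWERED color_by_number kernel.
    for base in range(0, n, LANES):
        i = [base + lane_id for lane_id in range(LANES)]
        active = [j < n for j in i]
        # masked_load: contiguous read of color_number[i] for active lanes (0 elsewhere).
        number = [color_number[j] if a else 0 for j, a in zip(i, active)]
        picked = gather(colors, number, active)
        # masked_store: only active lanes write.
        for j, v, a in zip(i, picked, active):
            if a:
                out[j] = v


colors = ["red", "green", "blue"]
color_number = [2, 0, 0, 1, 2]
out = [None] * len(color_number)
color_by_number(color_number, colors, out, len(color_number))
# out == ["blue", "red", "red", "green", "blue"]
```

In the first chunk, lanes 0 to 3 read colors[2], colors[0], colors[0], and colors[1]. Those addresses are not consecutive, which is why a contiguous masked_load is not enough for `colors[number]`, while `color_number[i]` can still use one.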

About

🏎️ A tiny compiler that lowers kernel loops into explicit data-parallel code.
