GitHub - jamtho/xlduck: Excel-DuckDB addin for fast, interactive 10bn+ row analysis · GitHub
Skip to content

jamtho/xlduck

Folders and files

Repository files navigation

XLDuck logo

XLDuck

Excel add-in wrapping DuckDB for in-cell SQL queries.

For a visual tour of the handle model, RTD lifecycle, preview pane, and configuration flow, see the architecture infographic. For the full written design notes, see ARCHITECTURE.md.

Prerequisites

Build

cd XlDuck
dotnet build

Run

Open the add-in directly to launch Excel with it loaded:

XlDuck\bin\Debug\net8.0-windows\XlDuck-AddIn64.xll

Excel Functions

Function Description
=DuckQuery(sql, ...) Execute SQL, return a table handle (duck://table/1|10x4 = 10 rows, 4 cols)
=DuckQueryAfterConfig(sql, ...) Same as DuckQuery, but waits for DuckConfig first
=DuckFrag(sql, ...) Create SQL fragment for lazy evaluation (duck://frag/...)
=DuckFragAfterConfig(sql, ...) Same as DuckFrag, but waits for DuckConfig first
=DuckCapture(range) Capture a sheet range as a DuckDB table (first row = headers)
=DuckDate(cell) Convert Excel date to SQL date string (2023-01-01)
=DuckDateTime(cell) Convert Excel date/time to SQL datetime string (2023-01-01 14:30:00)
=DuckOut(handle) Output a handle as a spilled array
=DuckQueryOut(sql, ...) Execute SQL and output directly as array
=DuckQueryOutScalar(sql, ...) Execute SQL and return a single value (first column, first row)
=DuckPlot(data, template, ...) Create a chart from data (duck://plot/...)
=DuckConfig(range) Configure DuckDB from a range of SQL statements (recommended)
=DuckExecute(sql) Execute DDL/DML statements (intended for VBA startup)
=DuckConfigReady() Signal that configuration is complete (legacy, for VBA)
=DuckVersion() XLDuck add-in version (0.1)
=DuckLibraryVersion() DuckDB library version

The architecture infographic shows how these functions compose into spreadsheet-native dataflows.

Examples

Basic Usage

A1: =DuckQueryOut("SELECT * FROM range(5)")
→ | range |
  | 0     |
  | 1     |
  | 2     |
  | 3     |
  | 4     |

Using Handles for Chaining

A1: =DuckQuery("SELECT * FROM range(10)")
→ duck://table/1|10x1

B1: =DuckQuery("SELECT * FROM ? WHERE range > 5", A1)
→ duck://table/2|4x1

C1: =DuckQuery("SELECT SUM(range) AS total FROM ?", B1)
→ duck://table/3|1x1

D1: =DuckOut(C1)
→ | total |
  | 30    |

Parameter Binding

Use ? placeholders for positional arguments (up to 8):

=DuckQuery("SELECT * FROM ? JOIN ? ON t1.id = t2.id", A1, B1)

Lazy Evaluation with Fragments

Fragments (duck://frag/...) defer SQL execution - the SQL is inlined as a subquery when used:

A1: =DuckFrag("SELECT * FROM range(10)")
→ duck://frag/1

B1: =DuckFrag("SELECT * FROM ? WHERE range >= 5", A1)
→ duck://frag/2

C1: =DuckOut(B1)
→ | range |
  | 5     |
  | 6     |
  | 7     |
  | 8     |
  | 9     |

When DuckOut(B1) executes, it builds and runs:

SELECT * FROM (SELECT * FROM (SELECT * FROM range(10)) WHERE range >= 5)

Fragments are validated at creation time (EXPLAIN), so SQL errors appear early.

Use fragments for:

  • Building query pipelines without materializing intermediate results
  • Allowing DuckDB to optimize the entire composed query
  • Reducing memory usage for complex transformations

Reading Files

DuckDB can read CSV, Parquet, JSON, and other file formats directly:

=DuckQueryOut("SELECT * FROM read_csv_auto('C:/data/sales.csv')")

=DuckQueryOut("SELECT * FROM read_parquet('C:/data/events.parquet') WHERE date > '2024-01-01'")

=DuckQueryOut("SELECT * FROM read_json_auto('C:/data/config.json')")

Combine with fragments for reusable data sources:

A1: =DuckFrag("SELECT * FROM read_csv_auto('C:/data/sales.csv')")
B1: =DuckQueryOut("SELECT region, SUM(amount) FROM ? GROUP BY region", A1)

DuckDB can also read from URLs and S3 - see DuckDB documentation for details.

Capturing Sheet Data

DuckCapture brings Excel range data into DuckDB for querying. The first row is treated as headers, the rest as data:

A1:C4 contains:
  | name    | age | city    |
  | alice   | 30  | NYC     |
  | bob     | 25  | LA      |
  | charlie | 35  | Chicago |

D1: =DuckCapture(A1:C4)
→ duck://table/1|3x3

E1: =DuckQueryOut("SELECT * FROM ? WHERE age > 28", D1)
→ | name    | age | city    |
  | alice   | 30  | NYC     |
  | charlie | 35  | Chicago |

Column types are inferred automatically: all-numeric columns become DOUBLE, all-boolean become BOOLEAN, everything else becomes VARCHAR. Empty cells are treated as NULL.

Combine with other functions for analysis:

A1: =DuckCapture(Sheet2!A1:D100)
B1: =DuckQueryOut("SELECT department, AVG(salary) FROM ? GROUP BY department", A1)

Date Parameters

Excel stores dates as serial numbers, so passing a date cell directly as a ? parameter won't work — DuckDB receives a number like 44927 instead of a date. Use DuckDate or DuckDateTime to convert:

C1: 1/1/2023                    (Excel date)
D1: 12/31/2023                  (Excel date)

E1: =DuckQuery("SELECT * FROM ? WHERE date BETWEEN ? AND ?", A1, DuckDate(C1), DuckDate(D1))

F1: =DuckQuery("SELECT * FROM ? WHERE timestamp > ?", A1, DuckDateTime(C1))

Pivot Tables

DuckDB has built-in PIVOT support for reshaping data:

A1: =DuckFrag("SELECT * FROM (VALUES ('Q1','North',100), ('Q1','South',150), ('Q2','North',200), ('Q2','South',250)) AS sales(quarter, region, amount)")

B1: =DuckQueryOut("PIVOT ? ON region USING SUM(amount)", A1)
→ | quarter | North | South |
  | Q1      | 100   | 150   |
  | Q2      | 200   | 250   |

See DuckDB PIVOT documentation for more examples.

Plotting

Create interactive charts with DuckPlot. Select a plot handle cell and open the Preview Pane to view.

A1: =DuckQuery("SELECT region, SUM(sales) as total FROM (VALUES ('North', 100), ('South', 150), ('East', 80), ('West', 120)) AS t(region, sales) GROUP BY region")

B1: =DuckPlot(A1, "bar", "x", "region", "y", "total", "title", "Sales by Region")
→ duck://plot/1

Templates:

Template Use Case
bar Aggregated values per category
line Time series, trends
point Scatter plots, correlations
area Cumulative/stacked time series
histogram Distribution of values (only needs x)
heatmap Two categories with color intensity (needs x, y, value)
boxplot Distribution comparison across categories

Overrides:

  • x - field for x-axis (required)
  • y - field for y-axis (required, except histogram)
  • color - field for color/series (optional)
  • value - field for color intensity (heatmap only)
  • title - chart title (optional)

Examples:

Bar chart:

A1: =DuckQuery("SELECT region, SUM(sales) as total FROM (VALUES ('North', 100), ('South', 150), ('East', 80), ('West', 120)) AS t(region, sales) GROUP BY region")
B1: =DuckPlot(A1, "bar", "x", "region", "y", "total", "title", "Sales by Region")

Line chart with multiple series:

A1: =DuckQuery("SELECT x as day, 'A' as product, x*10 as sales FROM range(20) UNION ALL SELECT x, 'B', x*7+20 FROM range(20)")
B1: =DuckPlot(A1, "line", "x", "day", "y", "sales", "color", "product")

Scatter plot:

A1: =DuckQuery("SELECT random()*100 as x, random()*100 as y FROM range(200)")
B1: =DuckPlot(A1, "point", "x", "x", "y", "y")

Area chart with stacked series:

A1: =DuckQuery("SELECT x as month, 'Product A' as product, x*5+10 as revenue FROM range(12) UNION ALL SELECT x, 'Product B', x*3+20 FROM range(12)")
B1: =DuckPlot(A1, "area", "x", "month", "y", "revenue", "color", "product")

Histogram (distribution):

A1: =DuckQuery("SELECT random()*100 as value FROM range(1000)")
B1: =DuckPlot(A1, "histogram", "x", "value", "title", "Value Distribution")

Boxplot (compare distributions):

A1: =DuckQuery("SELECT category, value FROM (SELECT 'A' as category, random()*50 as value FROM range(100) UNION ALL SELECT 'B', random()*50+25 FROM range(100))")
B1: =DuckPlot(A1, "boxplot", "x", "category", "y", "value")

Heatmap:

A1: =DuckQuery("SELECT day, hour, temp FROM (SELECT d.d as day, h.h as hour, (15 + d.d + h.h*0.5 + random()*5)::INT as temp FROM range(7) AS d(d), range(24) AS h(h))")
B1: =DuckPlot(A1, "heatmap", "x", "hour", "y", "day", "value", "temp")

Configuration

DuckDB has two flavours of session-scope setup, and they belong in different places:

  1. Non-secret stateINSTALL, LOAD, SET s3_url_style = 'path', custom UDFs, etc. These are safe to ship inside the workbook. Use DuckConfig.
  2. Credentials — S3 access keys, HuggingFace tokens, and anything else you would not paste into Slack. Do not put these in the workbook. Use persistent secrets instead — set up once in a regular DuckDB CLI, auto-loaded by every DuckDB process on your account (including XLDuck inside Excel). The workbook stays credential-free and safe to email, commit, or autosave to OneDrive.

Persistent secrets (for credentials)

DuckDB persistent secrets live in a per-user store at %USERPROFILE%\.duckdb\stored_secrets\ and are loaded automatically when any DuckDB process starts. The secret manager applies them per request via longest-prefix SCOPE match — so multiple credentialed S3 buckets can coexist with no conflict.

One-time setup (run in a normal duckdb CLI, not in Excel)

INSTALL httpfs;
LOAD httpfs;

CREATE OR REPLACE PERSISTENT SECRET my_bucket (
    TYPE       S3,
    KEY_ID     '...',
    SECRET     '...',
    ENDPOINT   '127.0.0.1:9000',     -- or e.g. s3.amazonaws.com
    USE_SSL    false,
    URL_STYLE  'path',
    REGION     'us-east-1',
    SCOPE      's3://my-bucket'
);

This writes a binary file to %USERPROFILE%\.duckdb\stored_secrets\my_bucket.duckdb_secret. The security boundary is the user-only NTFS ACL on your home directory — do not place that path on a roaming profile, OneDrive-synced folder, or shared volume.

In the workbook

No DuckConfig, no AfterConfig, no credentials in cells — queries just work:

A1: =DuckQueryOut("SELECT count(*) FROM read_parquet('s3://my-bucket/data/*.parquet')")

You don't need INSTALL httpfs or LOAD httpfs in the sheet either: DuckDB auto-loads the httpfs extension the first time a query references an s3:// URI (default autoload_known_extensions = true). For common credentialed S3 reads, the workbook needs no DuckConfig at all. A colleague opening a copy of the sheet on a machine without your secret will get IO Error: No credentials found — which is the correct security boundary.

Sanity-check what DuckDB has loaded:

=DuckQueryOut("SELECT name, type, scope FROM duckdb_secrets()")

Rotation and removal

-- rotate, e.g. after issuing a fresh service-account credential
CREATE OR REPLACE PERSISTENT SECRET my_bucket ( ... );

-- remove
DROP PERSISTENT SECRET my_bucket;

Caveat for embedded DuckDB

If queries from XLDuck come back with IO Error: No credentials found despite the persistent secret existing, the embedded DuckDB inside XLDuck may be using a non-default home directory for its secret store. In that case you can either point it at the right path via SET secret_directory = '...' early in DuckConfig, or use PROVIDER credential_chain and set AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY as User-level Windows environment variables.

DuckConfig for non-secret state

Most workbooks don't actually need DuckConfig — common extensions like httpfs, json, and spatial auto-load on first reference. Reach for DuckConfig when you need to set things that don't auto-trigger: non-default extensions, memory limits, custom UDFs, or non-default settings that should apply to every query in the workbook.

DuckConfig lets you configure DuckDB from a plain .xlsx file — no VBA macros or .xlsm format needed.

Write your configuration SQL in cells and pass the range to DuckConfig:

' On a DuckConfig sheet:
A1: INSTALL httpfs
A2: LOAD httpfs
A3: SET s3_url_style = 'path'
B1: =DuckConfig(A1:A3)
→ duck://config|OK

Queries that need config should use the AfterConfig variants, which wait until DuckConfig completes:

C1: =DuckQueryAfterConfig("SELECT * FROM read_parquet('s3://bucket/data.parquet')")

Alternatively, pass the DuckConfig cell as an argument to create an explicit dependency:

C1: =DuckQuery("SELECT * FROM read_parquet('s3://bucket/data.parquet')", B1)

The config handle is silently ignored as a parameter — it only ensures Excel calculates DuckConfig first.

If no configuration is needed but you want to use AfterConfig functions, call =DuckConfig() with no arguments.

Developing config

DuckConfig is idempotent: recalculating with the same inputs is a no-op. If the inputs change (e.g., you edit a cell in the config range), it returns an error because queries that already ran may depend on the original configuration.

While developing a sheet, use the Reset Config button in the XLDuck ribbon tab to clear the stored config state, then recalculate to re-run DuckConfig with the updated inputs.

Process isolation

The DuckDB engine and configuration are shared across all workbooks in an Excel process. If you need different configurations for different workbooks (e.g., different S3 endpoints), open each in a separate Excel process:

  • Windows: Win+Rexcel /x opens a new Excel process
  • Programmatic: Start-Process excel -ArgumentList "/x"

Pause Queries

The XLDuck ribbon tab includes a Pause Queries toggle button. When paused, all query execution is deferred — cells show #duck://blocked/paused|Queries paused instead of running. Toggle off to resume: all deferred queries execute automatically and results flow to cells.

Use this when building complex formulas to avoid triggering expensive queries during editing. For example, if you have a chain of queries (A1 feeds B1 feeds C1) and need to restructure them, pausing prevents each intermediate edit from triggering a cascade of expensive executions.

1. Click "Pause Queries" in the XLDuck ribbon tab
2. Edit formulas freely — cells show "Queries paused" instead of executing
3. Click "Pause Queries" again to resume — all queries execute and results appear

Also available programmatically via VBA:

Application.Run "DuckPauseQueries"   ' pause
Application.Run "DuckResumeQueries"  ' resume

Pause vs Cancel — two different behaviors:

  • Pause: "I'm editing, don't run anything yet" — queries are deferred and resume automatically when toggled off
  • Cancel: "Kill this query, I don't want it" — queries error out permanently, new queries after cancellation run normally

Pausing while a query is running will cancel it and defer it for re-execution on resume. Queries that depend on other paused queries resolve naturally through Excel's recalculation chain — root queries execute first, then dependents recalculate with the correct handles.

Query Engine Busy

Synchronous functions (DuckOut, DuckQueryOut, DuckQueryOutScalar) may show a "Query engine busy" error if a background query is running. This prevents Excel from freezing during long-running queries. Press F9 to recalculate once the background query completes.

DuckExecute blocks until the connection is available rather than returning an error, since it runs DDL/DML that must succeed (e.g. SET s3_endpoint, CREATE TABLE). It should only be called from VBA (e.g. Auto_Open), not from worksheet formulas. For worksheet-based configuration, use DuckConfig instead.

Cancel Query

The XLDuck ribbon tab includes a Cancel Query button that interrupts the running query and cancels all pending queued queries. The connection remains valid after cancellation — subsequent queries work normally.

Also available programmatically via VBA:

Application.Run "DuckInterrupt"

Preview Pane

The XLDuck ribbon tab includes a toggle to open a preview pane on the right side of the window. When you select a cell containing a handle:

  • Table handles: Shows SQL with positional arguments, column schema, and the first 200 rows of data
  • Fragment handles: Shows the SQL text and positional arguments
  • Plot handles: Shows an interactive Vega-Lite chart
  • Error handles: Shows the error category and message

Requires WebView2 Runtime (falls back to plain text if not installed).

Logging

Log files are written to %LOCALAPPDATA%\XlDuck\ (typically C:\Users\<you>\AppData\Local\XlDuck\). Each Excel session creates a new log file named xlduck-{timestamp}.log. Files older than 7 days are automatically deleted on startup.

Logs include query timing, RTD lifecycle events, handle reference counting, and error details. To tail the current session's log:

Get-ChildItem "$env:LOCALAPPDATA\XlDuck\xlduck-*.log" | Sort-Object LastWriteTime -Descending | Select-Object -First 1 | Get-Content -Wait

Architecture Infographic

XLDuck technical architecture overview

This visual overview summarizes the core spreadsheet workflow, handle model, RTD lifecycle, preview pane, and configuration trade-offs.

Credits

Several design ideas take inspiration from the superb PyXLL add-in, which you should check out immediately if you've ever considered integrating Python code with your sheets.

About

Excel-DuckDB addin for fast, interactive 10bn+ row analysis

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

Contributors