ryzhyk · 2026-04-20T21:12:14Z

We resisted adding these operators for a long time. The reason is that it's very easy to use them in a way that makes incremental updates inefficient: inserting a new element with rank 1 requires updating the ranks of all existing records in the collection.

Instead, we supported a limited subset of rank queries that can be compiled into top-k.

Having said that, rank queries are not always bad, e.g., they can be efficient when groups are small or when most new values are added with the highest rank. Performance aside, it is sometimes impossible to avoid rank queries.

This commit introduces general-purpose rank and dense_rank operators that match the semantics of RANK and DENSE_RANK in SQL.
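For readers less familiar with the SQL semantics referenced above, here is a minimal, non-incremental sketch of how RANK and DENSE_RANK differ on ties. It assumes the input is already sorted in ranking order; the function name `rank_and_dense_rank` is hypothetical and is not part of this PR or of DBSP.

```rust
/// Compute SQL-style RANK and DENSE_RANK for a slice that is already sorted
/// in ranking order. Returns one `(rank, dense_rank)` pair per element.
/// Illustrative helper only; not the operator added by this PR.
fn rank_and_dense_rank<T: PartialEq>(sorted: &[T]) -> Vec<(i64, i64)> {
    let mut out = Vec::with_capacity(sorted.len());
    let mut rank = 0i64;
    let mut dense = 0i64;
    for (i, v) in sorted.iter().enumerate() {
        if i == 0 || *v != sorted[i - 1] {
            // New peer group: RANK jumps to the 1-based row position,
            // while DENSE_RANK advances by exactly one.
            rank = i as i64 + 1;
            dense += 1;
        }
        // Rows that tie with the previous row reuse both counters.
        out.push((rank, dense));
    }
    out
}
```

For the sorted input `[10, 10, 20, 30, 30, 30, 40]`, RANK leaves gaps after ties (1, 1, 3, 4, 4, 4, 7) while DENSE_RANK does not (1, 1, 2, 3, 3, 3, 4).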

In a nutshell, the algorithm finds the smallest rank affected by new inputs and updates the ranks of all records from that point forward.
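The idea can be illustrated with a toy model. This is a sketch under simplifying assumptions, not the DBSP implementation: it handles one group, insertions only, one value at a time, ascending order, and plain `i64` values instead of weighted records. The type `RankedGroup` and its methods are hypothetical names.

```rust
/// Toy model of a single group: values kept sorted ascending, each with its
/// current RANK. Hypothetical type for illustration only.
struct RankedGroup {
    values: Vec<i64>,
    ranks: Vec<i64>,
}

impl RankedGroup {
    fn new() -> Self {
        RankedGroup { values: Vec::new(), ranks: Vec::new() }
    }

    /// Insert `v` and return how many pre-existing records changed rank.
    fn insert(&mut self, v: i64) -> usize {
        // Everything before `pos` is strictly smaller than `v`, so its rank
        // cannot be affected; `pos` is where the affected range starts.
        let pos = self.values.partition_point(|x| *x < v);
        self.values.insert(pos, v);
        // RANK of the new record = number of strictly smaller records + 1.
        self.ranks.insert(pos, pos as i64 + 1);

        // Recompute ranks from the insertion point forward.
        let mut changed = 0;
        for i in (pos + 1)..self.values.len() {
            let new_rank = if self.values[i] == self.values[i - 1] {
                self.ranks[i - 1] // tie: share the peer's rank
            } else {
                i as i64 + 1 // new peer group: 1-based position
            };
            if new_rank != self.ranks[i] {
                self.ranks[i] = new_rank;
                changed += 1;
            }
        }
        changed
    }
}
```

In this model, appending a value that sorts last changes no existing ranks, whereas inserting a new smallest value changes the rank of every existing record, which is the cost asymmetry described above.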

This commit doesn't yet implement the ROW_NUMBER operator, which I will work on next.

Describe Manual Test Plan

automatic tests only

Checklist

Unit tests added/updated
Integration tests added/updated
Documentation updated
Changelog updated

Breaking Changes?

Mark if you think the answer is yes for any of these components:

OpenAPI / REST HTTP API / feldera-types / manager (What is a breaking change?)
Feldera SQL (Syntax, Semantics)
feldera-sqllib (incl. dependencies fxp, etc.) (What is a breaking change?)
Python SDK (What is a breaking change?)
fda (CLI arguments)
Adapters (including configuration)
Storage Format / Checkpoints
Others (specify)

Describe Incompatible Changes

mythical-fred

LGTM

mihaibudiu · 2026-04-20T22:15:42Z

+    ///
+    /// The `CF` type, `projection_func`, and `rank_cmp_func` function together establish the
+    /// ranking of values in the group:
+    /// * `CF` establishes a _total_ ordering of elements such that `v1 < v2 =>


Shouldn't both of these be <=?

mihaibudiu · 2026-04-20T22:20:35Z

+        CF: CmpFunc<V>,
+        OV: DBData,
+        RV: DBData,
+        PF: Fn(&V, &mut RV) + 'static,


I don't really like mutable arguments in the IR; all my dataflow analyses work on pure functions. I can probably pretend in the IR that this is just a regular function and rewrite it before generating Rust code.

changed to return RV instead.
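To make the difference between the two closure shapes concrete, here is a small standalone sketch. The helper names `project_out_param` and `project_returning` are hypothetical and the `RV: Default` bound is an assumption of this example; neither reflects the actual operator signature in the PR.

```rust
/// Out-parameter style, as in `PF: Fn(&V, &mut RV)`: the closure writes its
/// result into a caller-provided slot. Hypothetical helper.
fn project_out_param<V, RV, PF>(values: &[V], pf: PF) -> Vec<RV>
where
    RV: Default,
    PF: Fn(&V, &mut RV),
{
    values
        .iter()
        .map(|v| {
            let mut out = RV::default();
            pf(v, &mut out);
            out
        })
        .collect()
}

/// Value-returning style: the closure is a pure function of its input,
/// which is easier to treat as an ordinary expression in an IR.
fn project_returning<V, RV, PF>(values: &[V], pf: PF) -> Vec<RV>
where
    PF: Fn(&V) -> RV,
{
    values.iter().map(|v| pf(v)).collect()
}
```

Both helpers produce the same output for equivalent closures; the second form has no mutable argument to model.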

mihaibudiu · 2026-04-20T22:24:25Z

+            .typed()
+    }
+
+    // /// Number the rows in the group according to the order defined by `CF`.


this is for the next PR? Maybe you want to move it to another commit.

mihaibudiu · 2026-04-20T22:40:58Z

+        })
+    }
+
+    // /// See [`Stream::dense_rank_custom_order`].


I expected dense_rank to be in this PR, only ROW_NUMBER to be missing

it is, I removed the commented function.

ryzhyk · 2026-04-21T05:49:30Z

Thanks. I addressed all comments.

mihaibudiu · 2026-04-22T07:08:01Z

I have pushed two commits, the second of which implements compiler support for RANK and DENSE_RANK.
But I want to write some incremental tests before merging; all the tests I have so far are non-incremental.

mythical-fred

Two findings — one bug, one nit.

Signed-off-by: Leonid Ryzhyk <ryzhyk@gmail.com>

Signed-off-by: Mihai Budiu <mbudiu@feldera.com>

mythical-fred

Commit message subject has a trailing period; please fix commit history (rebase).

mythical-fred · 2026-04-23T01:49:09Z

@@ -224,10 +222,10 @@ The following window aggregate functions are supported:
  </tr>
 </table>


Nit: "it are" -> "it is" (subject/verb agreement).

ryzhyk requested a review from mihaibudiu April 20, 2026 21:12

ryzhyk added the DBSP core Related to the core DBSP library label Apr 20, 2026

mythical-fred approved these changes Apr 20, 2026

mihaibudiu reviewed Apr 20, 2026

ryzhyk force-pushed the rank branch from 93bcc13 to 9356b36 Compare April 21, 2026 05:44

ryzhyk requested a review from mihaibudiu April 21, 2026 05:49

mythical-fred suggested changes Apr 22, 2026

Comment thread ...rc/main/java/org/dbsp/sqlCompiler/ir/expression/DBSPAsymmetricFieldComparatorExpression.java

mihaibudiu force-pushed the rank branch from 341fb2b to c3f91e3 Compare April 22, 2026 23:49

mihaibudiu approved these changes Apr 22, 2026

ryzhyk and others added 2 commits April 22, 2026 16:50

[DBSP] Change rank signature

28419fa

Signed-off-by: Mihai Budiu <mbudiu@feldera.com>

mihaibudiu force-pushed the rank branch 2 times, most recently from d8505ae to cddf591 Compare April 23, 2026 00:10

mihaibudiu enabled auto-merge April 23, 2026 00:12

mihaibudiu added this pull request to the merge queue Apr 23, 2026

mihaibudiu mentioned this pull request Apr 23, 2026

[SQL] Support RANK, ROW_NUMBER, DENSE_RANK even if not in a TopK pattern #3934

Open

3 tasks

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Apr 23, 2026

mihaibudiu added 3 commits April 22, 2026 18:36

[SQL] Remove HSQLDB dependence

c7ba05d

Signed-off-by: Mihai Budiu <mbudiu@feldera.com>

[SQL] Support for RANK and DENSE_RANK

655b7b7

Signed-off-by: Mihai Budiu <mbudiu@feldera.com>

[SQL] Document when RANK and DENSE_RANK window functions are expensive

1842b31

Signed-off-by: Mihai Budiu <mbudiu@feldera.com>

mihaibudiu force-pushed the rank branch from cddf591 to 1842b31 Compare April 23, 2026 01:36

mihaibudiu enabled auto-merge April 23, 2026 01:36

mihaibudiu added this pull request to the merge queue Apr 23, 2026

mythical-fred suggested changes Apr 23, 2026

Merged via the queue into main with commit f1ab97d Apr 23, 2026
1 check passed

mihaibudiu deleted the rank branch April 23, 2026 02:55
