[dbsp] rank and dense_rank operators. by ryzhyk · Pull Request #6085 · feldera/feldera · GitHub
Skip to content

[dbsp] rank and dense_rank operators.#6085

Merged
mihaibudiu merged 5 commits intomainfrom
rank
Apr 23, 2026
Merged

[dbsp] rank and dense_rank operators.#6085
mihaibudiu merged 5 commits intomainfrom
rank

Conversation

@ryzhyk
Copy link
Copy Markdown
Contributor

@ryzhyk ryzhyk commented Apr 20, 2026

We resisted adding these operators for a long time. The reason is that it's very easy to use them in a way that makes incremental updates inefficient: inserting a new element with rank 1 requires updating the ranks of all existing records in the collection.

Instead, we supported a limited subset of rank queries that can compiled into top-k.

Having said that, rank queries are not always bad, e.g., they can be efficient when groups are small or when most new values are added with the highest rank. Performance aside, it is sometime impossible to avoid rank queries.

This commit introduces general-purpose rank and dense_rank operators that match the semantics of RANK and DENSE_RANK in SQL.

The algorithm in a nutshell finds the smallest rank affected by new inputs and update the ranks of all records from that point forward.

This commit doesn't yet implement the ROW_NUMBER operator, which I will work on next.

Describe Manual Test Plan

automatic tests only

Checklist

  • Unit tests added/updated
  • Integration tests added/updated
  • Documentation updated
  • Changelog updated

Breaking Changes?

Mark if you think the answer is yes for any of these components:

Describe Incompatible Changes

@ryzhyk ryzhyk requested a review from mihaibudiu April 20, 2026 21:12
@ryzhyk ryzhyk added the DBSP core Related to the core DBSP library label Apr 20, 2026
Copy link
Copy Markdown

@mythical-fred mythical-fred left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment thread crates/dbsp/src/operator/group/rank.rs Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't both of these be <=?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Comment thread crates/dbsp/src/operator/group/rank.rs Outdated
CF: CmpFunc<V>,
OV: DBData,
RV: DBData,
PF: Fn(&V, &mut RV) + 'static,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really like mutable arguments in the IR; all my dataflow analyses work on pure functions. I can probably pretend in the IR that this is just a regular function and rewrite it before generating Rust code.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to return RV instead.

.typed()
}

// /// Number the rows in the group according to the order defined by `CF`.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is for the next PR? Maybe you want to move it to another commit.

Comment thread crates/dbsp/src/operator/dynamic/group/rank.rs Outdated
})
}

// /// See [`Stream::dense_rank_custom_order`].
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expected dense_rank to be in this PR, only ROW_NUMBER to be missing

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is, I removed the commented function.

Comment thread crates/dbsp/src/operator/dynamic/group/rank.rs Outdated
Comment thread crates/dbsp/src/operator/dynamic/group/rank.rs Outdated
@ryzhyk
Copy link
Copy Markdown
Contributor Author

ryzhyk commented Apr 21, 2026

Thanks. I addressed all comments.

@ryzhyk ryzhyk requested a review from mihaibudiu April 21, 2026 05:49
@mihaibudiu
Copy link
Copy Markdown
Contributor

I have pushed two commits, the second one which implements compiler support for RANK and DENSE_RANK.
But I want to write some incremental tests before merging; all the tests I have so far are non-incremental.

Copy link
Copy Markdown

@mythical-fred mythical-fred left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two findings — one bug, one nit.

ryzhyk and others added 2 commits April 22, 2026 16:50
We resisted adding these operators for a long time. The reason is that it's
very easy to use them in a way that makes incremental updates inefficient:
inserting a new element with rank 1 requires updating the ranks of all existing
records in the collection.

Instead, we supported a limited subset of rank queries that can compiled into
top-k.

Having said that, rank queries are not always bad, e.g., they can be efficient
when groups are small or when most new values are added with the highest rank.
Performance aside, it is sometime impossible to avoid rank queries.

This commit introduces general-purpose rank and dense_rank operators that match
the semantics of RANK and DENSE_RANK in SQL.

The algorithm in a nutshell finds the smallest rank affected by new inputs and
update the ranks of all records from that point forward.

This commit doesn't yet implement the ROW_NUMBER operator, which I will work on
next.

Signed-off-by: Leonid Ryzhyk <ryzhyk@gmail.com>
Signed-off-by: Mihai Budiu <mbudiu@feldera.com>
@mihaibudiu mihaibudiu force-pushed the rank branch 2 times, most recently from d8505ae to cddf591 Compare April 23, 2026 00:10
@mihaibudiu mihaibudiu enabled auto-merge April 23, 2026 00:12
@mihaibudiu mihaibudiu added this pull request to the merge queue Apr 23, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Apr 23, 2026
Signed-off-by: Mihai Budiu <mbudiu@feldera.com>
Signed-off-by: Mihai Budiu <mbudiu@feldera.com>
Signed-off-by: Mihai Budiu <mbudiu@feldera.com>
Copy link
Copy Markdown

@mythical-fred mythical-fred left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit message subject has a trailing period; please fix commit history (rebase).

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: "it are" -> "it is" (subject/verb agreement).

Merged via the queue into main with commit f1ab97d Apr 23, 2026
1 check passed
@mihaibudiu mihaibudiu deleted the rank branch April 23, 2026 02:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

DBSP core Related to the core DBSP library

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants