{{ message }}
Added support for basic incremental index sorting#3656
Open
llalexandru00 wants to merge 2 commits into
Open
Conversation
katzyn
reviewed
Oct 15, 2022
katzyn
left a comment
Contributor
There was a problem hiding this comment.
Thank you for your contribution!
Please, send a license statement as described here
https://h2database.com/html/build.html#providing_patches
to our mailing list:
https://groups.google.com/g/h2-database
(It is partially pre-moderated, some new messages will be visible only after approval.)
…rt, fixed javadoc
Contributor
Author
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

Performance bottleneck
I am using a lot of multi-table simple queries with
ORDER BY(and eventuallyOFFSET/LIMIT), which aren't optimized in any sense. All records are fetched, a final sorting is done and eventually anOFFSET/LIMITis applied. There is an existing optimization in H2 based on theSelect.sortUsingIndexflag, which hints that the records are already fetched in a sorted way (because an index is used), so a final sorting is not required. Also, LIMIT clause has a lot to benefit from this, as only some records are fetched. However, this works only for single-table queries.Example
The most basic example is something like
SELECT * FROM TEST1 CROSS JOIN TEST2 ORDER BY TEST1.F1 TEST2.F2 LIMIT 5. Right now, H2 sequentially scans both tables, sorts the result set according toORDER BY TEST1.F1 TEST2.F2and retrieves the first 5 rows. If both tables have ~1000 unique records, this operation takes ~450ms (embedded anonymous in-memory database v2.1.214).Research
LIMITis also optimized due to this technique.LIMITin the context of multi-tableORDER BY(LIMIT 1is considerably faster thanLIMIT 100000even for multi-table queries). Of course, these are C based back-ends, but the theoretic approach is relevant here.LIMITvalues. This may be misleading, as the final sorting performance difference may spoil the performance check.Solution
I suggest using the best b-tree index for the
topFilterTable- the one that matches the longest prefix of columns from theORDER BYclause as possible. Using such index will ensure that the result set is partially sorted on the firstSelect.alreadySortedelements fromORDER BY.When
OFFSET/LIMITis specified, onlyLIMITelements shall be retrieved initially. To ensure completeness, extra records should be fetched until a "bigger" record is encountered based on the firstSelect.alreadySortedfields. Sorting these and applyingLIMITis a complete and sound solution.Using something like
LIMIT Xwill makeSELECTto retrieve onlyXrecords and eventually fetches some extra records to ensure completeness.ORDER BYhas many fields matched by an index or the records have a high variance. Very few extra records are fetched (maybe none). In this case, the number of extra comparisons is low, so the overhead is clearly acceptable comparing to the fetching of all records.ORDER BYand these have a very low variance. Many extra records are fetched (maybe the whole result-set). In this case, the number of extra comparisons is high, but the number of fields for which one comparison is done is very low (1/2). Therefore, the overhead of applying this solution on result-sets with low variance is negligible.Result
The same SQL I provided earlier now runs in ~6 ms. The change never compromises as this does not interfere with join order or index selection. Only when it is safe to use/upgrade an index,
Select.alreadySortedis used.CROSS JOINqueries withORDER BYandLIMITare working considerably faster.ORDER BYare not affected by the change.ORDER BYandLIMITORDER BYandLIMIT, but always preserves the invariant proposed by the index in the first place.