Include support to limit rows in a group by operation by lppassos · Pull Request #3121 · h2database/h2database · GitHub
Skip to content

Include support to limit rows in a group by operation#3121

Draft
lppassos wants to merge 2 commits into
h2database:masterfrom
lppassos:master
Draft

Include support to limit rows in a group by operation#3121
lppassos wants to merge 2 commits into
h2database:masterfrom
lppassos:master

Conversation

@lppassos

Copy link
Copy Markdown

The limit is based on a system property "h2.maxGroupByEntries". The default value for the system property is -1, which keeps the current behaviour of the system.

If the value is changed to a positive number, it will limit the maximum number of different rows in the groupByData tree map in Grouped.

If someone issues a query that would create a table that is too big, only that query will fail. I've done this change to diminish the risk of OOM when running the embedded server. It doesn't solve it completely of course, because it doesn't actually consider the memory being used by the tree map, but given it's simplicity I think it could be a useful setting in some situations.

The limit is based on a system property.
@katzyn

katzyn commented May 18, 2021

Copy link
Copy Markdown
Contributor

@lppassos

Copy link
Copy Markdown
Author

Thanks for your comments.

I completely agree with you on what you say. The better solution would be like you say to use disk for large result sets, and number of entries is a poor metric for memory consumption. The only saving grace of this approach is that because its simplistic, it was also quite simple to do. It being my first contribution, I didn't want to go for something too complicated.

If you feel there is merit in pursuing this further, I'll work on improving the metric to be closer to the actual memory being used, while keeping the calculation simple and fast. I'll also include some unit tests for it.

Regarding the setting being marked experimental, can you give me some pointer on where that would be done? Is it just a matter of saying so in SysProperties?

@lppassos lppassos marked this pull request as draft May 18, 2021 07:33
@manticore-projects

manticore-projects commented May 18, 2021

Copy link
Copy Markdown
Contributor

Apologies for asking a maybe dumb question, I am just a curious visitor: would it not be a better idea to define per Connection Setting, whether to use a Memory Based Map ("regular systems") or a Disk based HashMap ("memory constraint embedded systems") for the Grouped By Entries? Something like https://github.com/bnyeggen/lash or maybe https://github.com/OpenHFT/Chronicle-Map, if MVSore was not fast enough?

I found this one: https://github.com/lmdbjava/benchmarks/blob/master/results/20160630/README.md
It is quite dated, but at least in 2016 MVStore was not the fastest solution and caching/backing large map to the disk might need that fastest speed possible?

@katzyn

katzyn commented May 18, 2021

Copy link
Copy Markdown
Contributor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants