fix(stmt-html): Fix embedded Buffer processing performance issue.#8748
Conversation
742b880 to
58cf410
Compare
There was a problem hiding this comment.
I would say something more realistic like "static LUTs" here... model weights are so large, they should be ImageParams.
rtzam
left a comment
There was a problem hiding this comment.
Just tested locally on the big LLM test case. The speedup is from "practically non-terminating" to ~40 seconds for a 1 layer model. Huge improvements! Thanks for the change.
| } | ||
| } | ||
|
|
||
| start = end + 1; |
There was a problem hiding this comment.
The fact that start can ever point past the end of the string_view made me uneasy. I can't wait for std::views::split to be available (c++23)
There was a problem hiding this comment.
Here's my take on a C++23 version, for posterity.
void generate(std::string_view code, std::map<uint64_t, std::regex>& markers,
std::map<uint64_t, int>& lnos) {
static constexpr std::string_view marker_prefix = "%\"";
std::size_t lno = 1;
for (auto&& chunk : std::views::split(code, '\n')) {
std::string_view line(chunk.begin(), chunk.end());
if (line.contains(marker_prefix)) {
std::erase_if(markers, [&](const auto& kv) {
const auto& [node, regex] = kv;
if (std::regex_search(line.begin(), line.end(), regex)) {
lnos[node] = lno;
return true;
}
return false;
});
}
lno++;
}
}There was a problem hiding this comment.
Well the loop condition should put you at ease. But indeed, a well-tested standard function would be nice.
There was a problem hiding this comment.
Well the loop condition should put you at ease.
It does 🙂
Do you know where the remaining 40 seconds are spent? More stuff to blame in StmtToHTML? I suspect loading the asm file can be slow (in |
|
So I've re-profiled and the remaining runtime is:
|
|
Duplicating the IR refers the Module clone, right? So nothing I want to do there. 20% of 40s is still 8s. Eight seconds for loading a text file... Let me try to optimize this either way. This is still a bit too ridiculous to my taste 😝 |
… HTML. Move-optimized replace_all: allow for no-copy execution when target string is not found.
I pushed another patch for loading faster. This should pretty much eliminate the loading time. Curious to see what the result is if you feel like profiling it again. 😄 |
|
I'm measuring a difference in the generation of HTML (~10% faster). Nice!! |

Initial changes to address #8717.
This does not address the Module deep-copy behavior copying Buffer contents. This merely addresses the string processing and avoids some copies.
Fixes #8752
Drive-by optimization: Change the signature of replace_all() to take the subject-string to be passed by value. This now allows for making replace-chains in this pattern:
Without making copies in case the target string is not found in the subject string.