Windows profiling backend by roblabla · Pull Request #2694 · jemalloc/jemalloc · GitHub

Windows profiling backend #2694

Open
roblabla wants to merge 4 commits into jemalloc:dev from roblabla:msvc-profiling

Conversation

@roblabla
Contributor

Adds a new profiling backend, using CaptureStackBackTrace to grab the stacktrace.

The backend doesn't currently grab the memory mapping of libraries, but is otherwise perfectly functional. I'm currently using it in conjunction with rust-jemalloc-pprof, to great effect.

Adds a new profiling backend that uses the CaptureStackBackTrace
function for backtraces. The backend does not currently give the library
mapping, but is otherwise functional.
On Windows, with the MSVC runtime, the creat function raises an
assertion if it is called with any flag other than S_IREAD and S_IWRITE. To
avoid crashing when writing the profiling data, we mask out any
unsupported flags from the provided mode.
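
The masking described in this commit message can be sketched as follows. This is a minimal illustration, not the code from the PR: the names `msvc_safe_creat_mode` and `open_dump_file` are hypothetical, and the fallback octal values for `S_IREAD`/`S_IWRITE` are an assumption used only when the headers do not provide them.

```c
#include <sys/stat.h>
#ifdef _MSC_VER
#include <io.h>     /* _creat */
#else
#include <fcntl.h>  /* creat */
#endif

/* Assumed fallback values, used only if the headers do not define them. */
#ifndef S_IREAD
#define S_IREAD 0400
#endif
#ifndef S_IWRITE
#define S_IWRITE 0200
#endif

/* Hypothetical helper: keep only the two permission bits that the MSVC
 * runtime accepts, dropping group/other bits such as those in 0644. */
int
msvc_safe_creat_mode(int mode) {
    return mode & (S_IREAD | S_IWRITE);
}

/* Hypothetical wrapper: masks the mode only when built for the MSVC
 * runtime, and passes it through unchanged elsewhere. */
int
open_dump_file(const char *filename, int mode) {
#ifdef _MSC_VER
    return _creat(filename, msvc_safe_creat_mode(mode));
#else
    return creat(filename, (mode_t)mode);
#endif
}
```

With this sketch, a caller that passes 0644 ends up requesting only owner read and write on MSVC, while other platforms still see the full mode.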
@interwq
Contributor

interwq commented Aug 26, 2024

@nullptr0-0
Contributor

nullptr0-0 commented Aug 29, 2024

Is there a sample output when the config is enabled?

The backend doesn't currently grab the memory mapping of libraries.

I think it's because dumping the mapping is not supported on Windows. Ideally, we should implement the full flow as a whole if we want to enable profiling on Windows.

@roblabla
Contributor Author

Is there a sample output when the config is enabled?

I'll be able to provide one in a couple weeks (currently traveling).

I think it's because the part of dump mapping is not supported for windows.

Yes. I left it unimplemented because I didn't personally need it (rust-jemalloc-pprof does its own listing of the mappings instead of reusing jemalloc's).

Implementing it shouldn't be too hard though: it's mostly a matter of calling EnumProcessModules to enumerate the loaded modules, and GetModuleFileNameExW/GetModuleInformation to get their names and load addresses.

@roblabla
Contributor Author

roblabla commented Sep 13, 2024

Is there a sample output when the config is enabled?

Here's a sample output:

heap_v2/524288
  t*: 1: 4194308 [0: 0]
  t0: 1: 4194308 [0: 0]
  t1: 0: 0 [0: 0]
@ 0x7ff73663657b 0x7ff736636522 0x7ff7365cf230 0x7ff736429e02 0x7ff7363c5fef 0x7ff7363e5f42 0x7ff7364a9ea7 0x7ff7364aaa66 0x7ff7364aa48f 0x7ff7363c2fc1 0x7ff7363c3f07 0x7ff7363de3ae 0x7ff7363b2a30 0x7ff7363bce29 0x7ff7363bcc3e 0x7ff7363d4893 0x7ff7363e6fb9 0x7ff7363b9808 0x7ff7363e6f74 0x7ff7363a110f 0x7ff7363a11f7 0x7ff7363e6076 0x7ff7363d5295 0x7ff7363e632e 0x7ff7363b3531 0x7ff73655ee9e 0x7ff73655b084 0x7ff7363b350a 0x7ff7363e6159 0x7ff736645c48 0x7ff95a666fd4 0x7ff95c4fcf31
  t*: 1: 4194308 [0: 0]
  t0: 1: 4194308 [0: 0]

Here's the program source: https://github.com/roblabla/jemalloc-windows-example

And the .exe/pdb can be found here: rust-jemalloc-pprof-example.zip


Something of note: macOS has a similar problem to Windows, in that the profiling backend supports it but does not output the library mapping. I suppose this is more "accidental" than anything: macOS and Linux share the same code for acquiring backtraces, so macOS got it "for free", but the code to acquire mappings is Linux-specific, using /proc/pid/maps. That said, it turns out an implementation of macOS mapping acquisition was committed in December on master.


I'll start work on the Windows backend to get the mapping.

@interwq
Contributor

interwq commented Sep 13, 2024

@roblabla Thank you for the PRs and pushing for the improvements on Windows!

LONG is the name of a type in the Windows SDK, so the macro would
conflict with that.
@roblabla
Contributor Author

roblabla commented Sep 13, 2024

Pushed an implementation using CreateToolhelp32Snapshot/Module32First/Module32Next, which has the advantage of not requiring allocations (all the allocations are done internally by the API, using the system allocator instead of jemalloc).

Here's an example output:

heap_v2/524288
  t*: 1: 4194308 [0: 0]
  t0: 1: 4194308 [0: 0]
  t1: 0: 0 [0: 0]
  t2: 0: 0 [0: 0]
@ 0x7ff6208da66b 0x7ff6208da612 0x7ff6208540a0 0x7ff6206c8ac2 0x7ff6206705ad 0x7ff62067cb02 0x7ff6206a58fa 0x7ff6206a5d76 0x7ff6206aa11f 0x7ff62065ee15 0x7ff62065fc17 0x7ff62066a66e 0x7ff62064b1b0 0x7ff620642c79 0x7ff62064276e 0x7ff6206699b3 0x7ff6206837c9 0x7ff620671308 0x7ff620683784 0x7ff62068205f 0x7ff620682147 0x7ff62067cc36 0x7ff620660bf5 0x7ff620642eee 0x7ff620683851 0x7ff6207fd4fd 0x7ff62068382a 0x7ff62067cd19 0x7ff6208e6c18 0x7ff95a666fd4 0x7ff95c4fcf31
  t*: 1: 4194308 [0: 0]
  t0: 1: 4194308 [0: 0]

MAPPED_LIBRARIES:
00007ff620640000-00007ff620a5c000: C:\Users\root\agent-rust\jemalloc-example\rust-jemalloc-pprof-example.exe
00007ff95c4b0000-00007ff95c6a4000: C:\Windows\SYSTEM32\ntdll.dll
00007ff95a650000-00007ff95a70d000: C:\Windows\System32\KERNEL32.DLL
00007ff95a010000-00007ff95a2d7000: C:\Windows\System32\KERNELBASE.dll
00007ff95aac0000-00007ff95ab2b000: C:\Windows\System32\ws2_32.dll
00007ff95a990000-00007ff95aab3000: C:\Windows\System32\RPCRT4.dll
00007ff95a7c0000-00007ff95a7c8000: C:\Windows\System32\PSAPI.DLL
00007ff95c2a0000-00007ff95c34a000: C:\Windows\System32\ADVAPI32.dll
00007ff95bfc0000-00007ff95c05e000: C:\Windows\System32\msvcrt.dll
00007ff95c200000-00007ff95c29b000: C:\Windows\System32\sechost.dll
00007ff9594f0000-00007ff9594fc000: C:\Windows\SYSTEM32\CRYPTBASE.DLL
00007ff95a390000-00007ff95a40f000: C:\Windows\System32\bcryptPrimitives.dll
00007ff959320000-00007ff95938a000: C:\Windows\system32\mswsock.dll
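
A rough sketch of how a listing like the one above could be produced with CreateToolhelp32Snapshot/Module32First/Module32Next. This is not the code from the PR: `format_mapping_line` and `dump_mapped_libraries` are hypothetical names, the line format is inferred from the sample output, and the Windows part assumes a build without `UNICODE` so that `szExePath` is a `char` array.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: writes one "start-end: path" line in the style of
 * the sample output. Returns the line length, or -1 if it does not fit. */
int
format_mapping_line(char *buf, size_t buflen, uint64_t base, uint64_t size,
    const char *path) {
    int n = snprintf(buf, buflen, "%016" PRIx64 "-%016" PRIx64 ": %s\n",
        base, base + size, path);
    return (n < 0 || (size_t)n >= buflen) ? -1 : n;
}

#ifdef _WIN32
#include <windows.h>
#include <tlhelp32.h>

/* Hypothetical sketch: walks the modules of the current process using a
 * fixed-size stack buffer, so this function itself never calls malloc. */
int
dump_mapped_libraries(FILE *out) {
    HANDLE snapshot = CreateToolhelp32Snapshot(
        TH32CS_SNAPMODULE | TH32CS_SNAPMODULE32, GetCurrentProcessId());
    if (snapshot == INVALID_HANDLE_VALUE) {
        return -1;
    }
    MODULEENTRY32 module;
    module.dwSize = sizeof(module); /* must be set before Module32First */
    int ret = -1;
    if (Module32First(snapshot, &module)) {
        fputs("MAPPED_LIBRARIES:\n", out);
        do {
            char line[MAX_PATH + 64];
            if (format_mapping_line(line, sizeof(line),
                (uint64_t)(uintptr_t)module.modBaseAddr,
                module.modBaseSize, module.szExePath) > 0) {
                fputs(line, out);
            }
        } while (Module32Next(snapshot, &module));
        ret = 0;
    }
    CloseHandle(snapshot);
    return ret;
}
#endif
```

The formatting helper is kept separate from the Windows-only enumeration so that it can be exercised on any platform.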

I also had to rename the LONG macro to SIZEOF_LONG, since LONG is a common typedef in the Windows SDK, which caused conflicts.

@roblabla roblabla force-pushed the msvc-profiling branch 3 times, most recently from fa17a65 to 1720e4f Compare September 13, 2024 20:41
@interwq
Contributor

interwq commented Oct 2, 2024

@nullptr0-0 can you help to take another look?

Comment thread src/prof_sys.c

static int
prof_dump_open_file_impl(const char *filename, int mode) {
#ifdef _MSC_VER
Contributor


What's the difference between _MSC_VER and _WIN32? Should we use the latter to be consistent with the codebase?

Contributor Author

@roblabla roblabla Oct 7, 2024


  • _MSC_VER contains the MSVC compiler version (or, in the case of clang-cl, the version of MSVC that this version of clang-cl targets).
  • _WIN32 is set to 1 when targeting Windows (x86/x64/arm32/arm64) with an MSVC compiler, clang-cl, or MinGW's GCC.

Using _WIN32 is probably more correct here? I'm not sure if MinGW has its own implementation of creat, or if it reuses the broken one from the CRT.
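
To make the distinction concrete, here is a small sketch that reports which of the two macros the current build defines. The function names are hypothetical, and `__MINGW32__` is used here as the usual MinGW marker; none of this comes from the PR itself.

```c
/* Returns 1 when the build targets Windows, whatever the compiler. */
int
targets_windows(void) {
#ifdef _WIN32
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 when the compiler is MSVC or advertises MSVC compatibility
 * (for example clang-cl). */
int
uses_msvc_frontend(void) {
#ifdef _MSC_VER
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: a short description of the detected toolchain. */
const char *
toolchain_description(void) {
#if defined(_MSC_VER)
    return "MSVC-compatible compiler";
#elif defined(__MINGW32__)
    return "MinGW";
#elif defined(_WIN32)
    return "other Windows toolchain";
#else
    return "non-Windows target";
#endif
}
```

A guard on `_MSC_VER` skips MinGW builds, while a guard on `_WIN32` covers them too, which is why the choice depends on whether MinGW's creat shares the CRT's behaviour.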

Contributor

@nullptr0-0 nullptr0-0 left a comment


Also, is the above output consumed properly by jeprof on Windows? I wonder whether the jeprof script also needs to be updated in order to fully support the feature.

Comment thread src/prof_log.c
} else {
fd = creat(log_filename, 0644);
int mode = 0644;
#ifdef _MSC_VER
Contributor


Same as in prof_sys.c.

Comment thread src/prof_sys.c
buf_writer_cb(buf_writer, buffer);
} while (Module32Next(snapshot, &module) == TRUE);

label_error:
Contributor


label_done instead of error?

Contributor


Not sure how this macro is used internally, @interwq any recommendation?
