GH-101362: Optimise pathlib by deferring path normalisation by barneygale · Pull Request #101560 · python/cpython · GitHub

GH-101362: Optimise pathlib by deferring path normalisation #101560

Closed
barneygale wants to merge 16 commits into python:main from barneygale:optimize-pathlib-part-2b


@barneygale
Contributor

@barneygale barneygale commented Feb 4, 2023

PurePath now normalises and splits paths only when necessary, e.g. when .name or .parent is accessed. The result is cached. This speeds up path object construction by around 4x.
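To make the "normalise only when necessary, then cache" idea concrete, here is a minimal sketch. `LazyPurePath` is a hypothetical, POSIX-only toy class and not the code in this PR; it only assumes that construction stores the raw segments and that parsing happens on first access to `.parts` or `.name`.

```python
import posixpath
from functools import cached_property


class LazyPurePath:
    """Toy POSIX-only path that defers normalisation (hypothetical, not pathlib's code)."""

    def __init__(self, *segments):
        # Construction only stores the raw segments; nothing is parsed here.
        self._raw_segments = segments

    @cached_property
    def parts(self):
        # Runs once, on first access; cached_property stores the result on the instance.
        raw = posixpath.join(*self._raw_segments) if self._raw_segments else ""
        root = "/" if raw.startswith("/") else ""
        # Drop empty segments (from doubled slashes) and "." segments.
        names = [name for name in raw.split("/") if name and name != "."]
        return tuple([root] + names) if root else tuple(names)

    @property
    def name(self):
        parts = self.parts
        return parts[-1] if parts and parts[-1] != "/" else ""


p = LazyPurePath("/usr//lib/./python3", "site-packages")
print("parts" in vars(p))  # nothing has been parsed yet
print(p.name)              # first access triggers normalisation
print("parts" in vars(p))  # the parsed parts are now cached
```

The trade-off is that the cost moves from construction to first attribute access, which pays off when many path objects are created but only a few are inspected.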

PurePath.__fspath__() now returns an unnormalised path, which should be transparent to filesystem APIs (else pathlib's normalisation is broken!). This extends the earlier performance improvement to most impure Path methods, and also speeds up p.joinpath('bar') and p / 'bar'. edit: will fix separately.
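The claim that an unnormalised path is transparent to filesystem APIs can be checked with a small experiment. The sketch below assumes a POSIX-style filesystem; `RawPath` is a hypothetical `os.PathLike` wrapper used only for illustration, not a pathlib class.

```python
import os
import tempfile


class RawPath:
    """Hypothetical path-like object whose __fspath__() returns the raw string."""

    def __init__(self, raw):
        self._raw = raw

    def __fspath__(self):
        return self._raw


with tempfile.TemporaryDirectory() as tmp:
    normalised = os.path.join(tmp, "data", "file.txt")
    os.makedirs(os.path.dirname(normalised))
    with open(normalised, "w") as f:
        f.write("hello")

    # Same location, spelled with a redundant "." segment and a doubled separator.
    unnormalised = os.path.join(tmp, "data", ".") + os.sep + os.sep + "file.txt"

    # The OS resolves both spellings to the same file.
    same = os.path.samefile(normalised, RawPath(unnormalised))
    with open(RawPath(unnormalised)) as f:
        content = f.read()

print(same, content)
```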

This also fixes GH-76846 and GH-85281 by unifying path constructors and adding an __init__() method. edit: will fix separately.

`PurePath` now normalises and splits paths only when necessary, e.g. when
`.name` or `.parent` is accessed. The result is cached. This speeds up path
object construction by around 4x.

`PurePath.__fspath__()` now returns an unnormalised path, which should be
transparent to filesystem APIs (else pathlib's normalisation is broken!).
This extends the earlier performance improvement to most impure `Path`
methods, and also speeds up pickling, `p.joinpath('bar')` and `p / 'bar'`.

This also fixes GH-76846 and GH-85281 by unifying path constructors and
adding an `__init__()` method.
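As a rough illustration of why a single construction route plus `__init__()` matters for subclasses, consider the sketch below. `ToyPath` and `TaggedPath` are hypothetical classes, not pathlib's implementation; the only assumption is that derived paths are always built through the normal constructor.

```python
class ToyPath:
    """Hypothetical stand-in for a path class; not pathlib's implementation."""

    def __init__(self, *segments):
        self._raw_segments = tuple(str(s) for s in segments)

    def _derive(self, *segments):
        # Single construction route: always call type(self)(...), so a
        # subclass's __init__ also runs for derived paths.
        return type(self)(*segments)

    def joinpath(self, *others):
        return self._derive(*self._raw_segments, *others)

    def __truediv__(self, other):
        return self.joinpath(other)

    def __str__(self):
        return "/".join(self._raw_segments)


class TaggedPath(ToyPath):
    def __init__(self, *segments):
        super().__init__(*segments)
        self.tag = "initialised"  # extra state that only __init__ sets up


child = TaggedPath("a") / "b"
print(type(child).__name__, child.tag, str(child))
```

If derived objects were instead created by bypassing the constructor, `child.tag` would be missing, which is the kind of subclassing surprise a unified constructor avoids.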
@barneygale
Contributor Author

barneygale commented Feb 4, 2023

@barneygale barneygale marked this pull request as ready for review February 4, 2023 18:28
@barneygale barneygale marked this pull request as draft February 7, 2023 20:35
@barneygale
Contributor Author

I've found a couple of other small optimizations which are best tackled in other PRs, so I'm marking this PR as a 'draft' for now.

@barneygale barneygale changed the title GH-101362 - Optimise pathlib by deferring path normalisation GH-101362: Optimise pathlib by deferring path normalisation Mar 6, 2023
@AlexWaygood AlexWaygood added the performance Performance or resource usage label Mar 6, 2023
@barneygale
Contributor Author

I've undone the change to _from_parsed_parts(), which has restored directory-walking performance:

$ ./python -m timeit -n 20 -s 'from pathlib import Path' 'list(Path().rglob("*"))' 
20 loops, best of 5: 146 msec per loop  # before
20 loops, best of 5: 152 msec per loop  # after

Still a tiny bit slower than pre-PR.

The rest of the speedups/slowdowns mentioned in my previous comment are still there.
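For anyone wanting to reproduce this kind of measurement from a script rather than the command line, a rough equivalent using the `timeit` module might look like the sketch below. The small temporary tree is an assumption made so the run is quick and repeatable; the numbers above were taken on a different directory and are not comparable.

```python
import tempfile
import timeit
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Build a small, hypothetical tree: 5 directories with 10 files each.
    for i in range(5):
        sub = root / f"dir{i}"
        sub.mkdir()
        for j in range(10):
            (sub / f"file{j}.txt").touch()

    entries = list(root.rglob("*"))
    # Mirrors `-n 20` with the default of 5 repeats on the command line.
    timings = timeit.repeat(lambda: list(root.rglob("*")), number=20, repeat=5)

# Like the CLI, report the best repeat divided by the loop count.
best_msec = min(timings) / 20 * 1000
print(f"{len(entries)} entries, best of 5: {best_msec:.3f} msec per loop")
```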

@barneygale barneygale marked this pull request as ready for review March 6, 2023 02:33
@barneygale
Contributor Author

The change to importlib is necessary because it's relying on a bug in pathlib's path normalization:

I think I need to solve that issue first, so I'm going to mark this PR as a draft (again!)

@barneygale barneygale marked this pull request as draft March 11, 2023 23:32
@barneygale barneygale marked this pull request as ready for review March 17, 2023 16:20
@barneygale barneygale marked this pull request as draft March 17, 2023 16:45
@barneygale barneygale closed this Mar 17, 2023


Development

Successfully merging this pull request may close these issues.

pathlib.Path._from_parsed_parts should call cls.__new__(cls)

3 participants