GH-101362: Optimise PurePath(PurePath(...)) by barneygale · Pull Request #101667 · python/cpython · GitHub
Skip to content

GH-101362: Optimise PurePath(PurePath(...))#101667

Merged
miss-islington merged 8 commits intopython:mainfrom
barneygale:gh-101362-use-native-pathlib-joining
Mar 5, 2023
Merged

GH-101362: Optimise PurePath(PurePath(...))#101667
miss-islington merged 8 commits intopython:mainfrom
barneygale:gh-101362-use-native-pathlib-joining

Conversation

@barneygale
Copy link
Copy Markdown
Contributor

@barneygale barneygale commented Feb 7, 2023

The previous _parse_args() method pulled the _parts out of any supplied PurePath objects; these were subsequently joined in _from_parts() using os.path.join(). This is actually a slower form of joining than calling fspath() on the path object, because it doesn't take advantage of the fact that the contents of _parts is normalized!

This reduces the time taken to run PurePath("foo", "bar") by ~20%, and the time taken to run PurePath(p, "cheese"), where p = PurePath("/foo", "bar", "baz"), by ~40%.

Automerge-Triggered-By: GH:AlexWaygood

The previous `_parse_args()` method pulled the `_parts` out of any
supplied `PurePath` objects; these were subsequently joined in
`_from_parts()` using `os.path.join()`. This is actually a slower form of
joining than calling `fspath()` on the path object, because it doesn't take
advantage of the fact that the contents of `_parts` is normalized!

This reduces the time taken to run `PurePath("foo", "bar") by ~20%, and
the time taken to run `PurePath(p, "cheese")`, where
`p = PurePath("/foo", "bar", "baz")`, by ~40%.
@barneygale
Copy link
Copy Markdown
Contributor Author

barneygale commented Feb 25, 2023

@zooba
Copy link
Copy Markdown
Member

zooba commented Feb 27, 2023

I guess my only question now is whether join handles differing types the same way as parse_args used to? Specifically, each element is converted to a string before being joined, rather than failing with TypeError.

If it works the same, then I have no concerns.

@barneygale
Copy link
Copy Markdown
Contributor Author

os.path.join() requires that you don't mix str and bytes arguments (after fspath()) - it raises TypeError if you do. But it supports subclasses of str and bytes and won't convert them to true strings/bytes.

pathlib requires that every argument is a str (after fspath()). It also supports subclasses of str, and will convert them to true strings.

I think this patch maintains all previous behaviour. You still get a TypeError if any argument you supply, after fspath(), isn't a subclass of str.

@barneygale
Copy link
Copy Markdown
Contributor Author

Actually, there don't seem to be good tests for this. I'll write some!

@zooba
Copy link
Copy Markdown
Member

zooba commented Feb 28, 2023

LGTM!

Copy link
Copy Markdown
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! A few minor suggestions

Comment thread Doc/library/pathlib.rst Outdated
Comment thread Lib/pathlib.py Outdated
Comment thread Lib/test/test_pathlib.py Outdated
Comment thread Misc/NEWS.d/next/Library/2023-02-07-22-20-32.gh-issue-101362.Jlk6mt.rst Outdated
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
Comment thread Misc/NEWS.d/next/Library/2023-02-07-22-20-32.gh-issue-101362.Jlk6mt.rst Outdated
Comment thread Lib/pathlib.py Outdated
Copy link
Copy Markdown
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@barneygale
Copy link
Copy Markdown
Contributor Author

Thanks so much, Alex! \o/

@AlexWaygood AlexWaygood added the performance Performance or resource usage label Mar 5, 2023
@miss-islington
Copy link
Copy Markdown
Contributor

@miss-islington miss-islington merged commit 6716254 into python:main Mar 5, 2023
hugovk pushed a commit to hugovk/cpython that referenced this pull request Mar 6, 2023
The previous `_parse_args()` method pulled the `_parts` out of any supplied `PurePath` objects; these were subsequently joined in `_from_parts()` using `os.path.join()`. This is actually a slower form of joining than calling `fspath()` on the path object, because it doesn't take advantage of the fact that the contents of `_parts` is normalized!

This reduces the time taken to run `PurePath("foo", "bar")` by ~20%, and the time taken to run `PurePath(p, "cheese")`, where `p = PurePath("/foo", "bar", "baz")`, by ~40%.

Automerge-Triggered-By: GH:AlexWaygood
carljm added a commit to carljm/cpython that referenced this pull request Mar 6, 2023
* main: (21 commits)
  pythongh-102192: Replace PyErr_Fetch/Restore etc by more efficient alternatives in sub interpreters module (python#102472)
  pythongh-95672: Fix versionadded indentation of get_pagesize in test.rst (pythongh-102455)
  pythongh-102416: Do not memoize incorrectly loop rules in the parser (python#102467)
  pythonGH-101362: Optimise PurePath(PurePath(...)) (pythonGH-101667)
  pythonGH-101362: Check pathlib.Path flavour compatibility at import time (pythonGH-101664)
  pythonGH-101362: Call join() only when >1 argument supplied to pathlib.PurePath() (python#101665)
  pythongh-102444: Fix minor bugs in `test_typing` highlighted by pyflakes (python#102445)
  pythonGH-102341: Improve the test function for pow (python#102342)
  Fix unused classes in a typing test (pythonGH-102437)
  pythongh-101979: argparse: fix a bug where parentheses in metavar argument of add_argument() were dropped (python#102318)
  pythongh-102356: Add thrashcan macros to filter object dealloc (python#102426)
  Move around example in to_bytes() to avoid confusion (python#101595)
  pythonGH-97546: fix flaky asyncio `test_wait_for_race_condition` test (python#102421)
  pythongh-96821: Add config option `--with-strict-overflow` (python#96823)
  pythongh-101992: update pstlib module documentation (python#102133)
  pythongh-63301: Set exit code when tabnanny CLI exits on error (python#7699)
  pythongh-101863: Fix wrong comments in EUC-KR codec (pythongh-102417)
  pythongh-102302 Micro-optimize `inspect.Parameter.__hash__` (python#102303)
  pythongh-102179: Fix `os.dup2` error reporting for negative fds (python#102180)
  pythongh-101892: Fix `SystemError` when a callable iterator call exhausts the iterator (python#101896)
  ...
carljm added a commit to carljm/cpython that referenced this pull request Mar 7, 2023
* main: (37 commits)
  pythongh-102192: Replace PyErr_Fetch/Restore etc by more efficient alternatives in sub interpreters module (python#102472)
  pythongh-95672: Fix versionadded indentation of get_pagesize in test.rst (pythongh-102455)
  pythongh-102416: Do not memoize incorrectly loop rules in the parser (python#102467)
  pythonGH-101362: Optimise PurePath(PurePath(...)) (pythonGH-101667)
  pythonGH-101362: Check pathlib.Path flavour compatibility at import time (pythonGH-101664)
  pythonGH-101362: Call join() only when >1 argument supplied to pathlib.PurePath() (python#101665)
  pythongh-102444: Fix minor bugs in `test_typing` highlighted by pyflakes (python#102445)
  pythonGH-102341: Improve the test function for pow (python#102342)
  Fix unused classes in a typing test (pythonGH-102437)
  pythongh-101979: argparse: fix a bug where parentheses in metavar argument of add_argument() were dropped (python#102318)
  pythongh-102356: Add thrashcan macros to filter object dealloc (python#102426)
  Move around example in to_bytes() to avoid confusion (python#101595)
  pythonGH-97546: fix flaky asyncio `test_wait_for_race_condition` test (python#102421)
  pythongh-96821: Add config option `--with-strict-overflow` (python#96823)
  pythongh-101992: update pstlib module documentation (python#102133)
  pythongh-63301: Set exit code when tabnanny CLI exits on error (python#7699)
  pythongh-101863: Fix wrong comments in EUC-KR codec (pythongh-102417)
  pythongh-102302 Micro-optimize `inspect.Parameter.__hash__` (python#102303)
  pythongh-102179: Fix `os.dup2` error reporting for negative fds (python#102180)
  pythongh-101892: Fix `SystemError` when a callable iterator call exhausts the iterator (python#101896)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance Performance or resource usage topic-pathlib

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants