Py new by youknowone · Pull Request #6402 · RustPython/RustPython · GitHub
Skip to content

Py new#6402

Merged
youknowone merged 2 commits intoRustPython:mainfrom
youknowone:py-new
Dec 11, 2025
Merged

Py new#6402
youknowone merged 2 commits intoRustPython:mainfrom
youknowone:py-new

Conversation

@youknowone
Copy link
Copy Markdown
Member

@youknowone youknowone commented Dec 11, 2025

Summary by CodeRabbit

  • Performance
    • Enhanced constructor implementation for built-in bytes, frozenset, string, and tuple types to improve object creation performance. Optimized code paths now handle exact type instances with direct returns, efficient singleton handling for empty collections, and streamlined construction flows that reduce overhead in common usage scenarios.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Dec 11, 2025

@youknowone youknowone marked this pull request as ready for review December 11, 2025 14:09
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5f496c9 and c5abc44.

📒 Files selected for processing (6)
  • crates/vm/src/builtins/bytes.rs (1 hunks)
  • crates/vm/src/builtins/set.rs (1 hunks)
  • crates/vm/src/builtins/str.rs (1 hunks)
  • crates/vm/src/builtins/tuple.rs (1 hunks)
  • crates/vm/src/bytes_inner.rs (1 hunks)
  • crates/vm/src/protocol/object.rs (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint Rust code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/vm/src/builtins/set.rs
  • crates/vm/src/builtins/tuple.rs
  • crates/vm/src/builtins/bytes.rs
  • crates/vm/src/builtins/str.rs
  • crates/vm/src/bytes_inner.rs
  • crates/vm/src/protocol/object.rs
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-29T12:17:28.606Z
Learning: Applies to Lib/**/*.py : Add a `# TODO: RUSTPYTHON` comment when modifications are made to files in the `Lib/` directory
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6110
File: vm/src/frame.rs:1311-1316
Timestamp: 2025-08-26T05:20:54.540Z
Learning: In the RustPython codebase, only certain builtin types should be marked with the SEQUENCE flag for pattern matching. List and tuple are sequences, but bytes, bytearray, and range are not considered sequences in this context, even though they may implement sequence-like protocols.
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6110
File: vm/src/frame.rs:1311-1316
Timestamp: 2025-08-26T05:20:54.540Z
Learning: In RustPython's pattern matching implementation, only certain builtin types should have the SEQUENCE flag: list and tuple are confirmed sequences. The user youknowone indicated that bytes, bytearray are not considered sequences in this context, even though they implement sequence-like protocols.
📚 Learning: 2025-11-29T12:17:28.606Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-29T12:17:28.606Z
Learning: Applies to **/*.rs : Use the macro system (`pyclass`, `pymodule`, `pyfunction`, etc.) when implementing Python functionality in Rust

Applied to files:

  • crates/vm/src/protocol/object.rs
🧬 Code graph analysis (4)
crates/vm/src/builtins/set.rs (5)
crates/vm/src/builtins/bytes.rs (2)
  • slot_new (97-149)
  • py_new (151-153)
crates/vm/src/builtins/tuple.rs (3)
  • slot_new (115-146)
  • input (122-122)
  • py_new (148-152)
crates/vm/src/builtins/complex.rs (1)
  • slot_new (154-167)
crates/vm/src/builtins/float.rs (1)
  • slot_new (134-147)
crates/vm/src/builtins/list.rs (1)
  • from_iter (47-49)
crates/vm/src/builtins/bytes.rs (3)
crates/vm/src/builtins/str.rs (5)
  • slot_new (353-366)
  • py_new (368-386)
  • elements (2118-2120)
  • elements (2229-2231)
  • elements (2351-2353)
crates/vm/src/builtins/tuple.rs (2)
  • slot_new (115-146)
  • py_new (148-152)
crates/vm/src/builtins/float.rs (1)
  • slot_new (134-147)
crates/vm/src/builtins/str.rs (1)
crates/vm/src/stdlib/typing.rs (1)
  • args (157-159)
crates/vm/src/protocol/object.rs (2)
crates/vm/src/builtins/bytes.rs (4)
  • from (44-48)
  • from (52-54)
  • obj (114-114)
  • slot_new (97-149)
crates/vm/src/builtins/object.rs (1)
  • slot_new (35-111)
🔇 Additional comments (9)
crates/vm/src/bytes_inner.rs (1)

3-7: LGTM - Import cleanup aligns with removed functionality.

The import changes (removing identifier, PyTypeRef, and simplifying pystr import) are consistent with the removal of the get_bytes() method that previously depended on these. The remaining imports support the get_bytearray_inner() method which is still used by the bytes constructor.

crates/vm/src/builtins/str.rs (2)

353-366: LGTM - Optimization for exact str type is correct.

The early return optimization correctly:

  1. Checks for exact str type (not subclasses)
  2. Ensures single positional argument with no kwargs
  3. Verifies the argument is already an exact str

This matches CPython's behavior and avoids unnecessary string copying.


368-386: LGTM - Clean separation of payload creation logic.

The py_new helper correctly handles all construction cases:

  • Decoding with encoding parameter
  • Converting via str() without encoding
  • Empty string for missing input

The separation from slot_new improves code clarity and follows the pattern established in other builtins.

crates/vm/src/builtins/set.rs (1)

921-958: LGTM - Constructor refactor follows the established pattern.

The implementation correctly:

  1. Returns exact frozenset input as-is (line 929-933)
  2. Returns empty frozenset singleton for missing iterable (line 936-938)
  3. Returns empty frozenset singleton for empty result (line 947-950)
  4. Delegates payload creation to py_new which uses from_iter

The type Args = Vec<PyObjectRef> declaration aligns with py_new's signature, while slot_new handles the conversion from FuncArgs to the expected elements vector.

crates/vm/src/protocol/object.rs (2)

7-16: LGTM - Import updates align with new construction pathway.

The import changes correctly add:

  • FuncArgs for constructing function arguments
  • Constructor trait for invoking the slot_new pathway

These support the refactored bytes() method below.


34-43: LGTM - Cleaner construction pathway via Constructor trait.

The refactored bytes() method correctly:

  1. Maintains the PyInt error handling (bytes shouldn't accept int directly via this path)
  2. Creates FuncArgs from the single object argument
  3. Delegates to PyBytes::slot_new which handles all construction logic including __bytes__ method invocation and optimizations

This centralizes the bytes construction logic and eliminates the now-removed get_bytes() pathway.

crates/vm/src/builtins/tuple.rs (1)

112-152: LGTM - Constructor follows the unified pattern across builtins.

The implementation correctly handles all optimization cases:

  1. Returns exact input tuple as-is (lines 121-125)
  2. Returns empty tuple singleton for missing iterable (lines 128-130)
  3. Returns empty tuple singleton for empty result (lines 139-142)
  4. Properly separates slot binding (slot_new) from payload creation (py_new)

The type Args = Vec<PyObjectRef> with py_new accepting elements: Self::Args provides a clean interface while slot_new handles the FuncArgs to Vec conversion.

crates/vm/src/builtins/bytes.rs (2)

94-118: Constructor optimizations look correct and consistent with other builtin types.

The optimization pattern correctly mirrors other builtin constructors (tuple, str, float):

  • Empty singleton returned for exact type with no arguments
  • Exact input bytes returned as-is when no encoding/errors specified
  • Use of downcast_exact properly ensures only exact PyBytes instances bypass construction

139-153: Fallback path and simplified py_new look good.

The structure cleanly separates concerns:

  • slot_new handles all optimization logic and protocol dispatch
  • py_new is reduced to simple construction from Vec<u8>
  • Empty singleton optimization in the fallback path matches the pattern in tuple.rs

Comment on lines +120 to +137
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Find the PyBytesInner::try_from_borrowed_object implementation
rg -A 20 "fn try_from_borrowed_object" crates/vm/src/builtins/bytes.rs

Repository: RustPython/RustPython

Length of output: 238


🏁 Script executed:

# Search for __bytes__ related tests or comments
rg -B 3 -A 3 "__bytes__" crates/vm/src/builtins/bytes.rs

Repository: RustPython/RustPython

Length of output: 856


🏁 Script executed:

# Check if there are any tests for bytes constructor with __bytes__
fd -e rs -x rg -l "__bytes__" {} \; | head -20

Repository: RustPython/RustPython

Length of output: 222


🏁 Script executed:

# Look at the downcast usage pattern in this file
rg "downcast" crates/vm/src/builtins/bytes.rs

Repository: RustPython/RustPython

Length of output: 794


🏁 Script executed:

# Find PyBytesInner definition and try_from_borrowed_object impl
rg -B 5 -A 20 "impl.*PyBytesInner" crates/vm/src/builtins/bytes.rs | head -80

Repository: RustPython/RustPython

Length of output: 527


🏁 Script executed:

# Search for PyBytesInner enum/struct definition
rg "^(pub )?struct PyBytesInner|^(pub )?enum PyBytesInner" crates/vm/src/builtins/bytes.rs

Repository: RustPython/RustPython

Length of output: 47


🏁 Script executed:

# Find where PyBytesInner::try_from_borrowed_object is actually implemented
ast-grep --pattern 'impl PyBytesInner {
  $$$
  fn try_from_borrowed_object($$$) {
    $$$
  }
  $$$
}'

Repository: RustPython/RustPython

Length of output: 47


🏁 Script executed:

# Look for the actual implementation - search more broadly
rg "fn try_from_borrowed_object" crates/vm/src/builtins/bytes.rs -B 2 -A 30 | head -100

Repository: RustPython/RustPython

Length of output: 288


🏁 Script executed:

# Look at the beginning/middle of bytes.rs for PyBytesInner impl
head -300 crates/vm/src/builtins/bytes.rs | tail -200

Repository: RustPython/RustPython

Length of output: 5642


🏁 Script executed:

# Search for PyBytesInner impl with try_from_borrowed_object more carefully
rg -B 3 "PyBytesInner::try_from_borrowed_object" crates/vm/src/builtins/bytes.rs

Repository: RustPython/RustPython

Length of output: 812


🏁 Script executed:

# Look at the entire impl block for PyBytesInner
sed -n '/^impl PyBytesInner/,/^impl /p' crates/vm/src/builtins/bytes.rs | head -100

Repository: RustPython/RustPython

Length of output: 47


🏁 Script executed:

# Search across entire vm/src for PyBytesInner::try_from_borrowed_object implementation
rg -B 2 -A 20 "impl.*PyBytesInner.*\{" crates/vm/src/builtins/bytes.rs | grep -A 25 "try_from_borrowed_object"

Repository: RustPython/RustPython

Length of output: 47


🏁 Script executed:

# PyBytesInner might be defined as a type alias - search for its definition
rg "type PyBytesInner|struct PyBytesInner|enum PyBytesInner" crates/vm/src/

Repository: RustPython/RustPython

Length of output: 119


🏁 Script executed:

# Search in function/buffer.rs since learnings mention it
rg -B 2 -A 25 "try_from_borrowed_object" crates/vm/src/function/buffer.rs | head -80

Repository: RustPython/RustPython

Length of output: 2557


🏁 Script executed:

# Look at PyBytesInner implementation in bytes_inner.rs
rg -B 2 -A 25 "fn try_from_borrowed_object" crates/vm/src/bytes_inner.rs

Repository: RustPython/RustPython

Length of output: 881


🏁 Script executed:

# Get context around PyBytesInner impl
rg -B 5 -A 30 "impl PyBytesInner" crates/vm/src/bytes_inner.rs | head -100

Repository: RustPython/RustPython

Length of output: 1354


🏁 Script executed:

# Find bytes_from_object function
rg -B 3 -A 40 "fn bytes_from_object" crates/vm/src/bytes_inner.rs

Repository: RustPython/RustPython

Length of output: 47


🏁 Script executed:

# Search entire codebase for bytes_from_object
rg -B 3 -A 40 "fn bytes_from_object"

Repository: RustPython/RustPython

Length of output: 1345


🏁 Script executed:

# Check if it's imported in bytes_inner.rs
rg "bytes_from_object" crates/vm/src/bytes_inner.rs

Repository: RustPython/RustPython

Length of output: 201


🌐 Web query:

CPython bytes constructor __bytes__ return type error message TypeError

💡 Result:

When you call bytes(x), CPython will call x.bytes() (if present) and that method must return a bytes object. If it returns some other type, CPython raises a TypeError such as:

  • "TypeError: bytes returned non-bytes (type X)" (e.g. X == int or str). [1][2]

Fix: make bytes return a bytes object (e.g. return b"..." or return some_str.encode('utf-8')). [3]

Sources:

  • object.bytes doc / pydoc requirement that bytes should return bytes. [1]
  • Python tracker / example error message showing "bytes returned non-bytes (type ...)". [2]
  • Explanation and example usage showing the TypeError and correct return type. [3]

🏁 Script executed:

# Search for tests of __bytes__ returning non-bytes
rg -B 5 -A 10 "__bytes__.*non" crates/

Repository: RustPython/RustPython

Length of output: 47


🏁 Script executed:

# Look for __bytes__ error tests
fd -e py -x rg -l "__bytes__" {} \; | xargs rg -l "TypeError" | head -5

Repository: RustPython/RustPython

Length of output: 731


🏁 Script executed:

# Check the test file for __bytes__ tests
cat extra_tests/snippets/builtin_bytes.py

Repository: RustPython/RustPython

Length of output: 20952


🏁 Script executed:

# Look at lines 120-137 more carefully with context
sed -n '115,145p' crates/vm/src/builtins/bytes.rs

Repository: RustPython/RustPython

Length of output: 1364


🏁 Script executed:

# Check if there's specific __bytes__ error validation
rg -B 5 -A 5 "returned non-bytes" crates/

Repository: RustPython/RustPython

Length of output: 47


The downcast usage on line 129 is correct; consider adding a specific error for __bytes__ returning non-bytes.

Line 129 correctly uses downcast::<PyBytes>() (which accepts subclasses) rather than downcast_exact. This matches CPython semantics: bytes() returns the __bytes__ result directly if it's any bytes instance, including subclasses.

However, when __bytes__ returns a non-bytes type, the current code delegates to PyBytesInner::try_from_borrowed_object, which raises a generic error: "can assign only bytes, buffers, or iterables of ints in range(0, 256)". CPython raises a more specific error: "TypeError: bytes returned non-bytes (type X)". Consider adding explicit validation to match CPython's error message.

@youknowone youknowone merged commit a477835 into RustPython:main Dec 11, 2025
32 of 35 checks passed
@youknowone youknowone deleted the py-new branch December 11, 2025 14:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant