Comparing main...compiler-wtf-fix · react/react · GitHub

Comparing changes

base repository: react/react
base: main
head repository: react/react
compare: compiler-wtf-fix
  • 6 commits
  • 104 files changed
  • 1 contributor

Commits on Jun 11, 2026

  1. [rust-compiler] Add JsString WTF-8 type for JavaScript string values

    JsString is a string type that can represent JavaScript strings containing
    lone surrogates (U+D800-U+DFFF), which Rust's String (UTF-8) cannot.
    
    Internal representation uses an enum with a fast path:
    - Utf8(String): for the 99.99% of strings with no lone surrogates
    - Wtf8(Vec<u8>): WTF-8 encoded bytes when lone surrogates are present
    
    Custom serde support decodes `__SURROGATE_XXXX__` markers (from bridge.ts)
    into WTF-8 lone surrogate bytes on deserialization, and re-encodes them
    as markers on serialization.
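
    To make the decoding direction concrete, here is a minimal sketch. It assumes a marker is the prefix followed by exactly four hex digits and a closing `__` (as in `__SURROGATE_D83E__`), and that a lone surrogate is stored as the three-byte generalized UTF-8 encoding of its code unit, which is what WTF-8 specifies. The signature of `decode_markers` and the helper `push_surrogate` are illustrative assumptions, not the crate's code.

    ```rust
    /// Append one UTF-16 surrogate code unit as WTF-8 (3-byte generalized UTF-8).
    fn push_surrogate(out: &mut Vec<u8>, unit: u16) {
        out.push(0xE0 | (unit >> 12) as u8);
        out.push(0x80 | ((unit >> 6) & 0x3F) as u8);
        out.push(0x80 | (unit & 0x3F) as u8);
    }

    /// Replace every `{prefix}XXXX__` marker with WTF-8 surrogate bytes.
    /// Returns `None` when no marker occurs, so the caller can keep the
    /// plain `String` (UTF-8 fast path). Hypothetical signature.
    fn decode_markers(s: &str, prefix: &str) -> Option<Vec<u8>> {
        debug_assert!(!prefix.is_empty(), "an empty prefix would never advance");
        if !s.contains(prefix) {
            return None;
        }
        let mut out = Vec::with_capacity(s.len());
        let mut rest = s;
        while let Some(pos) = rest.find(prefix) {
            out.extend_from_slice(rest[..pos].as_bytes());
            let after = &rest[pos + prefix.len()..];
            // A well-formed marker: four hex digits naming a surrogate, then "__".
            let unit = after
                .get(..4)
                .and_then(|hex| u16::from_str_radix(hex, 16).ok())
                .filter(|u| (0xD800..=0xDFFF).contains(u) && after[4..].starts_with("__"));
            match unit {
                Some(u) => {
                    push_surrogate(&mut out, u);
                    rest = &after[6..];
                }
                None => {
                    // Not a marker after all: keep the prefix text literally.
                    out.extend_from_slice(prefix.as_bytes());
                    rest = after;
                }
            }
        }
        out.extend_from_slice(rest.as_bytes());
        Some(out)
    }
    ```

    For example, `decode_markers("a__SURROGATE_D83E__b", "__SURROGATE_")` yields the bytes `a`, `ED A0 BE`, `b`.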
    
    Provides: Eq, Hash, Clone, Debug, Display, From<String>, From<&str>,
    utf16_len() for JS .length semantics, push_js_string() for concat,
    as_str() for zero-cost UTF-8 access, to_utf8_lossy() for display.
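
    A rough sketch of the two-variant layout and of `utf16_len()`. It assumes `Wtf8` is used only when a lone surrogate is actually present (so derived equality stays meaningful) and that `as_str()` returns `None` for the `Wtf8` variant; both are assumptions about the real type. The length rule: every 1-, 2- or 3-byte sequence (including a lone surrogate) is one UTF-16 code unit, and every 4-byte sequence is a surrogate pair.

    ```rust
    /// Sketch of a JavaScript string that can hold lone surrogates.
    /// Assumed invariant: `Wtf8` is used only when a lone surrogate is present.
    #[derive(Clone, Debug, PartialEq, Eq, Hash)]
    pub enum JsString {
        /// Well-formed text, borrowable as `&str` at no cost.
        Utf8(String),
        /// WTF-8 bytes containing at least one lone surrogate.
        Wtf8(Vec<u8>),
    }

    impl JsString {
        /// UTF-8 view, available only on the fast path.
        pub fn as_str(&self) -> Option<&str> {
            match self {
                JsString::Utf8(s) => Some(s.as_str()),
                JsString::Wtf8(_) => None,
            }
        }

        /// JS `.length`: the number of UTF-16 code units.
        pub fn utf16_len(&self) -> usize {
            let bytes: &[u8] = match self {
                JsString::Utf8(s) => s.as_bytes(),
                JsString::Wtf8(b) => b.as_slice(),
            };
            bytes
                .iter()
                .map(|&b| match b {
                    0x80..=0xBF => 0usize, // continuation byte: counted with its lead byte
                    0xF0..=0xFF => 2,      // 4-byte lead: a surrogate pair in UTF-16
                    _ => 1,                // ASCII or 2-/3-byte lead (incl. lone surrogates)
                })
                .sum()
        }
    }

    impl From<&str> for JsString {
        fn from(s: &str) -> Self {
            JsString::Utf8(s.to_owned())
        }
    }

    impl From<String> for JsString {
        fn from(s: String) -> Self {
            JsString::Utf8(s)
        }
    }
    ```

    With this rule, `"🤔"` has length 2 and a lone `U+D83E` has length 1, matching JavaScript's `.length`.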
    
    21 unit tests covering: serde round-trip, encoding/decoding, equality,
    concatenation, utf16_len for BMP/supplementary/surrogate, sentinel
    collision resistance.
    
    This is Phase 1 of the WTF-8 migration. Subsequent phases will replace
    String with JsString in AST fields, HIR, and optimization passes.
    mvitousek committed Jun 11, 2026
    cef9a9e
  2. [rust-compiler] Phase 2: Replace String with JsString in AST types

    Change 5 AST fields from String to JsString:
    - StringLiteral.value
    - DirectiveLiteral.value
    - JSXText.value
    - TemplateElementValue.raw
    - TemplateElementValue.cooked
    
    These are the entry/exit points for JavaScript string content in the
    compiler. Downstream crates (lowering, optimization, codegen) will need
    corresponding changes in Phase 3.
    mvitousek committed Jun 11, 2026
    e22b932
  3. [rust-compiler] Phase 3+4: Propagate JsString through HIR, lowering, codegen, and optimization
    
    Replace String with JsString across the entire compiler pipeline:
    
    HIR: PrimitiveValue::String(JsString), PropertyLiteral::String(JsString),
      InstructionValue::JSXText { value: JsString }, TemplateQuasi { raw/cooked: JsString },
      HirFunction.directives: Vec<JsString>
    
    Lowering: All StringLiteral→Primitive, JSXText, TemplateQuasi, ObjectPropertyKey,
      and directive extraction sites now use JsString.
    
    Codegen: All Primitive→StringLiteral, JSXText, TemplateQuasi, ObjectPropertyKey,
      directive, and string literal construction sites emit JsString.
    
    Optimization: String concat uses push_js_string(), .length uses utf16_len()
      (each lone surrogate now counts as 1 code unit, not the 18 characters of its marker text),
      identifier checks use as_str() fast path.
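
    One subtlety any WTF-8 concatenation has to handle: if the left side ends in a lone lead surrogate and the right side starts with a lone trail surrogate, the result must store the joined supplementary code point as an ordinary 4-byte sequence, otherwise byte-wise equality would treat two equal JS strings as different. The commit does not say whether `push_js_string()` does this; the sketch below (hypothetical `push_wtf8` and `surrogate_at`, operating on raw WTF-8 bytes) shows one way.

    ```rust
    /// If `b` is exactly a 3-byte WTF-8 surrogate sequence, return its code unit.
    fn surrogate_at(b: &[u8]) -> Option<u16> {
        match b {
            [0xED, b1 @ 0xA0..=0xBF, b2 @ 0x80..=0xBF] => Some(
                0xD000 | ((u16::from(*b1) & 0x3F) << 6) | (u16::from(*b2) & 0x3F),
            ),
            _ => None,
        }
    }

    /// Append `rhs` to `lhs`, re-pairing a trailing lead surrogate with a
    /// leading trail surrogate into one 4-byte code point.
    fn push_wtf8(lhs: &mut Vec<u8>, rhs: &[u8]) {
        let n = lhs.len();
        if n >= 3 && rhs.len() >= 3 {
            let tail = surrogate_at(&lhs[n - 3..]);
            let head = surrogate_at(&rhs[..3]);
            if let (Some(hi @ 0xD800..=0xDBFF), Some(lo @ 0xDC00..=0xDFFF)) = (tail, head) {
                let cp = 0x10000 + ((u32::from(hi) - 0xD800) << 10) + (u32::from(lo) - 0xDC00);
                lhs.truncate(n - 3);
                lhs.extend_from_slice(&[
                    0xF0 | (cp >> 18) as u8,
                    0x80 | ((cp >> 12) & 0x3F) as u8,
                    0x80 | ((cp >> 6) & 0x3F) as u8,
                    0x80 | (cp & 0x3F) as u8,
                ]);
                lhs.extend_from_slice(&rhs[3..]);
                return;
            }
        }
        // No pair spans the boundary: plain byte append.
        lhs.extend_from_slice(rhs);
    }
    ```

    Concatenating `a` + lone `U+D83E` with lone `U+DD14` + `b` yields the same bytes as the literal `"a🤔b"`.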
    
    Debug printer: format_js_string() iterates WTF-8 bytes directly and emits
      \uXXXX for lone surrogates, matching TS's JSON.stringify output exactly.
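
    A minimal sketch of that printing strategy, assuming well-formed WTF-8 input and JSON.stringify's escaping rules: short escapes for `\b \f \n \r \t`, `\u00xx` for other control characters, and lowercase `\udxxx` for lone surrogates (the form ES2019's well-formed JSON.stringify emits). `decode_wtf8` is a hypothetical helper and this is not the crate's `format_js_string()`.

    ```rust
    /// Decode well-formed WTF-8 into code points (surrogates included).
    /// Panics on truncated input; a real implementation would validate.
    fn decode_wtf8(bytes: &[u8]) -> Vec<u32> {
        let mut out = Vec::new();
        let mut i = 0;
        while i < bytes.len() {
            let b = u32::from(bytes[i]);
            let (len, init): (usize, u32) = match b {
                0x00..=0x7F => (1, b),
                0xC0..=0xDF => (2, b & 0x1F),
                0xE0..=0xEF => (3, b & 0x0F),
                _ => (4, b & 0x07),
            };
            let mut cp = init;
            for k in 1..len {
                cp = (cp << 6) | (u32::from(bytes[i + k]) & 0x3F);
            }
            out.push(cp);
            i += len;
        }
        out
    }

    /// Quote a WTF-8 string the way JSON.stringify would.
    fn format_js_string(bytes: &[u8]) -> String {
        let mut out = String::from("\"");
        for cp in decode_wtf8(bytes) {
            match cp {
                0x22 => out.push_str("\\\""),
                0x5C => out.push_str("\\\\"),
                0x08 => out.push_str("\\b"),
                0x0C => out.push_str("\\f"),
                0x0A => out.push_str("\\n"),
                0x0D => out.push_str("\\r"),
                0x09 => out.push_str("\\t"),
                // Other controls and lone surrogates: lowercase \uXXXX escape.
                0x00..=0x1F | 0xD800..=0xDFFF => out.push_str(&format!("\\u{:04x}", cp)),
                _ => out.push(char::from_u32(cp).expect("valid scalar value")),
            }
        }
        out.push('"');
        out
    }
    ```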
    
    All tests pass:
      cargo build: clean
      cargo test: 30/30 (AST crate)
      yarn snap --rust: 1804/1804
      yarn snap: 1804/1804
      test-rust-port.sh: 1803/1803
    mvitousek committed Jun 11, 2026
    e97da84
  4. [rust-compiler] Dynamic surrogate marker to eliminate sentinel collision

    Replace the hardcoded `__SURROGATE_` marker with a dynamically chosen
    prefix that's guaranteed absent from the source JSON. bridge.ts's
    chooseSurrogateMarker() starts with `__SURROGATE_` and prepends `__ESC_`
    until a collision-free prefix is found. The chosen marker is passed to
    Rust via `__surrogateMarker` in the options JSON.
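
    The selection loop above, transcribed as a Rust sketch (the real `chooseSurrogateMarker()` is TypeScript in bridge.ts and is not shown here; `choose_surrogate_marker` is a hypothetical name). Each `__ESC_` makes the candidate longer, so the loop always terminates once the candidate is longer than the input.

    ```rust
    /// Pick a marker prefix that does not occur anywhere in `source_json`.
    fn choose_surrogate_marker(source_json: &str) -> String {
        let mut marker = String::from("__SURROGATE_");
        // Grow the prefix until it no longer collides with the input text.
        while source_json.contains(&marker) {
            marker.insert_str(0, "__ESC_");
        }
        marker
    }
    ```

    A source containing the literal text `__SURROGATE_D83E__` gets the marker `__ESC___SURROGATE_`, so its own text is never mistaken for an encoded surrogate.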
    
    Rust side: set_surrogate_marker() stores the prefix in a thread-local.
    JsString's decode_markers/encode_markers read it dynamically instead of
    using the hardcoded prefix. This eliminates the sentinel collision bug
    where source code containing the literal text `__SURROGATE_D83E__` would
    have been corrupted.
    
    Verified: a fixture with literal `__SURROGATE_D83E__` in a string
    attribute survives compilation unchanged.
    
    All tests pass:
      cargo build + test: clean, 30/30
      yarn snap --rust: 1804/1804
      test-rust-port.sh: 1803/1803
    mvitousek committed Jun 11, 2026
    cb8cefb
  5. [rust-compiler] Update SWC/OXC crates for JsString compatibility

    Mechanical type changes in the SWC and OXC frontend crates to work with
    JsString instead of String for AST string fields (StringLiteral.value,
    DirectiveLiteral.value, JSXText.value, TemplateElementValue.raw/cooked).
    
    - convert_ast.rs: wrap String values in JsString::from() when constructing AST nodes
    - convert_ast_reverse.rs: use .to_string()/.as_str_unwrap() when extracting JsString values
    
    63 compile-error sites fixed across 4 files.
    mvitousek committed Jun 11, 2026
    247e27d
  6. [rust-compiler] Null marker optimization: skip string scanning when no surrogates present
    
    When bridge.ts doesn't encode any surrogates (99.99% of compilations),
    pass __surrogateMarker: null instead of a marker string. Rust's JsString
    deserialization checks the thread-local marker: if None, it skips all
    string scanning and stores every string as the Utf8 fast-path variant.
    
    This eliminates the overhead of searching every string for the marker
    prefix during deserialization for the common case where no lone
    surrogates exist in the source.
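
    A minimal sketch of the thread-local and the early exit. Only the names `SURROGATE_MARKER` and `set_surrogate_marker` and the `Option<String>` payload come from the commit; the `RefCell` wrapper and the `needs_decoding` check (which would choose between the `Utf8` fast path and marker decoding) are assumptions.

    ```rust
    use std::cell::RefCell;

    thread_local! {
        // Marker prefix chosen by the bridge; None means no lone surrogates were encoded.
        static SURROGATE_MARKER: RefCell<Option<String>> = RefCell::new(None);
    }

    /// Store the marker for this thread before deserializing (body assumed).
    fn set_surrogate_marker(marker: Option<&str>) {
        SURROGATE_MARKER.with(|m| *m.borrow_mut() = marker.map(str::to_owned));
    }

    /// Hypothetical check: does this string need marker decoding at all?
    fn needs_decoding(s: &str) -> bool {
        SURROGATE_MARKER.with(|m| {
            // None: skip scanning entirely and keep the Utf8 fast path.
            m.borrow().as_deref().map_or(false, |prefix| s.contains(prefix))
        })
    }
    ```

    With a `None` marker, even text that happens to look like a marker is left untouched. That is safe because the bridge only sends `null` when it encoded nothing.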
    
    Changes:
    - bridge.ts: sanitizeJsonSurrogates returns {json, hadSurrogates};
      passes null marker when no surrogates were found
    - js_string.rs: SURROGATE_MARKER thread-local changed from String to
      Option<String>; decode_markers returns None immediately when marker
      is None
    - lib.rs (NAPI): passes Option<&str> to set_surrogate_marker
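
    For the bridge side, a hypothetical Rust transcription of the `{json, hadSurrogates}` idea, operating on UTF-16 code units the way a JS string stores them. It uses std's `char::decode_utf16` to tell valid pairs from lone surrogates. Whether bridge.ts works on decoded strings or on JSON escape text is not stated in the commit, so treat the input shape and the name `sanitize_surrogates` as assumptions.

    ```rust
    /// Replace each lone surrogate with `{marker}XXXX__` and report whether any occurred.
    fn sanitize_surrogates(units: &[u16], marker: &str) -> (String, bool) {
        let mut out = String::with_capacity(units.len());
        let mut had_surrogates = false;
        for item in char::decode_utf16(units.iter().copied()) {
            match item {
                // Valid scalar value (including a correctly paired surrogate).
                Ok(c) => out.push(c),
                // Lone surrogate: encode it as marker text with 4 uppercase hex digits.
                Err(e) => {
                    had_surrogates = true;
                    out.push_str(&format!("{}{:04X}__", marker, e.unpaired_surrogate()));
                }
            }
        }
        (out, had_surrogates)
    }
    ```

    The caller would then hand Rust `Some(marker)` only when `had_surrogates` is true (for instance `had_surrogates.then_some(marker)`), and `None` otherwise.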
    
    All tests pass: cargo test 105/105, yarn snap --rust 1804/1804.
    mvitousek committed Jun 11, 2026
    e41ed40