Fix jsonable_encoder crash on invalid UTF‑8 bytes by SAURABHSALVE · Pull Request #15641 · fastapi/fastapi · GitHub
Skip to content

Fix jsonable_encoder crash on invalid UTF‑8 bytes#15641

Open
SAURABHSALVE wants to merge 1 commit into
fastapi:masterfrom
SAURABHSALVE:fix/jsonable-encoder-bytes-decode
Open

Fix jsonable_encoder crash on invalid UTF‑8 bytes#15641
SAURABHSALVE wants to merge 1 commit into
fastapi:masterfrom
SAURABHSALVE:fix/jsonable-encoder-bytes-decode

Conversation

@SAURABHSALVE

Copy link
Copy Markdown

jsonable_encoder currently decodes bytes using the default UTF‑8 settings, which can raise UnicodeDecodeError for arbitrary binary payloads.

This PR makes bytes encoding more robust by decoding with errors="replace", so encoding never crashes and instead produces a safe, JSON-serializable string.

Update bytes encoder to o.decode(errors="replace")
Add a regression test covering b"\xff" (invalid UTF‑8) to ensure it doesn’t raise and encodes to "\ufffd"
No API changes intended—this just prevents an unexpected exception during encoding.

Co-authored-by: Cursor <cursoragent@cursor.com>
@codspeed-hq

codspeed-hq Bot commented May 29, 2026

Copy link
Copy Markdown

@SAURABHSALVE

Copy link
Copy Markdown
Author

@t0ugh-sys t0ugh-sys left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: Fix jsonable_encoder crash on invalid UTF-8 bytes

Summary

This PR addresses a real issue — jsonable_encoder raising UnicodeDecodeError when encoding bytes values containing invalid UTF-8. The fix changes the bytes encoder from o.decode() to o.decode(errors="replace").

Observations

What's good:

  • The bug is real and the fix is straightforward. Passing arbitrary binary payloads through jsonable_encoder is a legitimate use case, and an unhandled UnicodeDecodeError is surprising behavior.
  • A regression test is included, which is great. The test uses b"\xff" (a common invalid UTF-8 byte) and asserts the replacement character \ufffd appears in the output.

Potential concerns and suggestions:

  1. Lossy encoding is a design choice. Using errors="replace" silently replaces invalid bytes with (\ufffd). This is safe in the sense that it never crashes, but it also means the round-trip is lossy — you cannot reconstruct the original bytes from the encoded string. An alternative approach like base64 encoding (base64.b64encode(o).decode()) would be lossless, but that would be a larger behavioral change. The "replace" strategy is reasonable as a minimal fix, but it's worth being aware that this could mask data corruption in downstream usage.

  2. Documentation. It would be good to note this behavior somewhere (or at least in the changelog/release notes) so users know that invalid UTF-8 bytes are silently replaced rather than raising an error. Previously, a UnicodeDecodeError at least signaled something was wrong.

  3. Test coverage. The test is minimal but adequate for the regression case. You might also consider testing a mixed string like b"hello\xffworld" to verify that valid bytes surrounding invalid ones are preserved correctly (e.g., "hello\ufffdworld").

  4. No API surface change. The PR correctly notes this is not an API change — only the error handling behavior changes. This is a good, focused fix.

Verdict

This is a reasonable and minimal fix. The "replace" strategy is the least-invasive option and prevents unexpected crashes. LGTM with the minor suggestion to add one more test case for mixed valid/invalid byte sequences.

@YuriiMotov YuriiMotov left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure it should be default behavior (it's lossy), but probably we can go this way.

Anyway, this behavior should be documented clearly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants