stream: fix UTF-8 character corruption in fast-utf8-stream · nodejs/node@a5b1be2 · GitHub
Commit a5b1be2
mcollina authored and aduh95 committed
stream: fix UTF-8 character corruption in fast-utf8-stream
Fix releaseWritingBuf() to correctly handle partial writes that split multi-byte UTF-8 characters. The previous implementation incorrectly converted byte counts to character counts, causing:

- 3-byte characters (CJK) to be silently dropped
- 4-byte characters (emoji) to leave lone surrogates in the buffer

The fix backs up from the byte position to find a valid UTF-8 character boundary by checking for continuation bytes (pattern 10xxxxxx), then decodes the properly-aligned bytes to get the correct character count.

Also fixes a typo where this._asyncDrainScheduled was used instead of the private field this.#asyncDrainScheduled.

Fixes: #61744
PR-URL: #61745
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Paolo Insogna <paolo@cowtech.it>
1 parent dcb1cbb commit a5b1be2

2 files changed

Lines changed: 340 additions & 6 deletions

File tree

lib/internal/streams/fast-utf8-stream.js

Lines changed: 18 additions & 6 deletions
Lines changed: 322 additions & 0 deletions
