iframe-proxy

mjbommar · 2026-04-16T15:35:37Z

PR summary

As described in #31254, non-ASCII strings are not handled
safely by arrayflags_setitem and PyArray_ToFile.

What to do with Unicode strings elsewhere upstream / downstream
is a complicated question that has design consequences, so
@WarrenWeckesser suggested reasonable error handling in the bug
report comments to provide the user with useful, non-segfaulting
execptions.

This PR simply implements those recommendations and adds coverage
to ensure KeyError / ValueError are properly raised when
ecnountered.

First-time contributor introduction

Hi there! Numeric/numarray user who has been on the long journey
after switching from Matlab and Fortran a long time ago...thank you
for all of your work - I know how much went into 2.0!

AI Disclosure

The original C issue was detected using Claude Agent SDK and a variety of
tools like semgrep / ast-grep / libclang.

Claude Code (Opus 4.6) and Codex CLI were used to prepare the PR branch,
although the credit for the proposed fixes themselves really go to
@WarrenWeckesser.

PS: sorry again for earlier auto-submission.

jorenham · 2026-04-16T16:03:44Z

mjbommar · 2026-04-16T16:10:21Z

Very sorry @jorenham , codex sub agent ignored instructions not to execute gh CLI commands...

I think the patch contents itself were already specified by @WarrenWeckesser in the original report that I made yesterday. I reviewed the code, ran the original MRE to confirm the patches fixed the issue, and checked that build/test are green after.

If you re-open it, I will manually edit this to clean up the template. Otherwise
I'm just going to create clutter opening another PR to fix as @WarrenWeckesser requested

tylerjereddy · 2026-04-16T17:45:37Z

+            if (byteobj == NULL) {
+                Py_DECREF(strobj);
+                Py_DECREF(it);
+                if (n4 != 0) {


Note that the full NumPy testsuite passes for me locally on this branch if I simplify this conditional to always execute per diff below. If this is genuinely needed, it might be helpful to enforce the requirement with a test. If there's a good reason it is needed, but can't be tested, perhaps an explanatory comment might at least make sense.

--- a/numpy/_core/src/multiarray/convert.c +++ b/numpy/_core/src/multiarray/convert.c @@ -302,7 +302,7 @@ PyArray_ToFile(PyArrayObject *self, FILE *fp, char *sep, char *format) if (byteobj == NULL) { Py_DECREF(strobj); Py_DECREF(it); - if (n4 != 0) { + if (1) { PyErr_SetString(PyExc_ValueError, "The `format` parameter must contain only ASCII " "characters.");

I'll add a test to catch this branch (e.g., str())

@tylerjereddy - OK, should see coverage on that branch with the extra test I pushed

I would be tempted to just stick with the unicode error either way. Sure, it's not necessary immediately clear it comes from the format itself, but I suspect it's clear enough?

@seberg I'd defer to you and @WarrenWeckesser, who originally proposed it.

It turns out it is probably better to stick with the unicode error as @seberg suggests. I hadn't considered Unicode string arrays in my initial comments in #31254. As it is now in the PR, if the array contains non-ASCII Unicode, and format="%s", the error message in the ValueError is not correct:

In [25]: np.__version__ Out[25]: '2.5.0.dev0+git20260416.dedf7c2' In [26]: s = np.array(['λ123'], dtype='<U4') In [27]: s.tofile('junk.dat', sep="; ", format="%s") --------------------------------------------------------------------------- ValueError Traceback (most recent call last) Cell In[27], line 1 ----> 1 s.tofile('junk.dat', sep="; ", format="%s") ValueError: The `format` parameter must contain only ASCII characters.

@WarrenWeckesser @seberg OK, see note below, hopefully this is what you were thinking for the format case

Can we please just use the error as-is? I don't think it is worth going through the trouble to validate the format string explicitly just to give it an explicit ValueError.

mjbommar · 2026-04-20T12:30:53Z

OK, hopefully this hits the mark now. Here's a summary of everything that's in the PR now:

Fixed arrayflags_setitem so non-ASCII flag keys no longer crash and instead follow the existing KeyError path.
Fixed ndarray.tofile() so non-ASCII format strings no longer crash and now raise ValueError.
Preserved UnicodeEncodeError for non-ASCII Unicode array element data in tofile(), instead of misreporting it as a format error.
Covered the reviewed format="%s" Unicode-array case so it now raises UnicodeEncodeError rather than a misleading ValueError.
Added regression tests for non-ASCII flag keys, non-ASCII format, non-ASCII element data without format, and non-ASCII element data with ASCII format="%s".
Added a release note describing the crash fixes and resulting exception behavior.

If you want me to clean up the branch so it's clean without the commit history, just let me know and I'll push a single clean commit. Wasn't sure if that will erase the code review or break the GH UI now. I'm afraid to do anything these days with all the changes they made...

ngoldbaum

I triggered CI but unfortunately github actions is having issues and there are lots of unrelated test failures. Someone is probably going to need to re-trigger.

I left a comment below about C API style and another nit about how you classified the release note. Please take or leave them as you see fit.

ngoldbaum · 2026-04-23T16:41:32Z

total nit but this should probably be labeled as an "improvement" rather than a "change"

ngoldbaum · 2026-04-23T16:48:03Z

+        Py_XDECREF(formatobj);
        Py_DECREF(it);
    }
    return 0;


Another way to handle the repeated cleanup code is to add a cleanup goto label below here that does Py_XDECREF on all the objects that need cleanup in error paths and then use goto cleanup in error paths above.

You can even define the return value like int ret = -1 in the function header and then set ret = 0 here and be sure then return ret in the cleanup code to unify the error paths and the happy path.

You also need to make sure that you initialize all the variables to NULL before the first use. The cost of this approach is some extra NULL checks which IMO is worth it for the improved maintainability and readability.

IMO that's better style for CPython C API code (despite the use of goto) because it avoids duplicate or almost-duplicate cleanup code which can sometimes be tricky to reason about.

mjbommar added 2 commits April 16, 2026 11:35

BUG: Handle non-ASCII flags and tofile formats

0a62a6f

DOC: Add release note for numpygh-31254 fix

942477f

mjbommar mentioned this pull request Apr 16, 2026

BUG: SIGSEGV on non-ASCII strings for arrayflags_setitem and PyArray_ToFile #31254

Open

jorenham closed this Apr 16, 2026

jorenham reopened this Apr 16, 2026

tylerjereddy added the 00 - Bug label Apr 16, 2026

tylerjereddy reviewed Apr 16, 2026

View reviewed changes

mjbommar added 2 commits April 16, 2026 14:33

TST: cover unformatted non-ASCII tofile errors

dedf7c2

BUG: preserve tofile unicode encode errors

852cfa3

BUG: keep natural tofile unicode errors

a6cfea5

jorenham mentioned this pull request Apr 21, 2026

BUG: prevent crash when setting array flags with non-ASCII strings #31297

Closed

ngoldbaum reviewed Apr 23, 2026

View reviewed changes

Sunbelt Computer Software

PL/B Language Development and Support

Uh oh!

Conversation

mjbommar commented Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR summary

First-time contributor introduction

AI Disclosure

Uh oh!

jorenham commented Apr 16, 2026

Uh oh!

mjbommar commented Apr 16, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mjbommar commented Apr 20, 2026

Uh oh!

ngoldbaum left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ngoldbaum Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

mjbommar commented Apr 16, 2026 •

edited

Loading

ngoldbaum Apr 23, 2026 •

edited

Loading