ENH: Removed requirement for C-contiguity when changing to dtype of different size by madphysicist · Pull Request #20722 · numpy/numpy · GitHub
Skip to content

ENH: Removed requirement for C-contiguity when changing to dtype of different size#20722

Merged
mattip merged 1 commit into
numpy:mainfrom
madphysicist:dtype-checking-1
Jan 6, 2022
Merged

ENH: Removed requirement for C-contiguity when changing to dtype of different size#20722
mattip merged 1 commit into
numpy:mainfrom
madphysicist:dtype-checking-1

Conversation

@madphysicist

@madphysicist madphysicist commented Jan 4, 2022

Copy link
Copy Markdown
Contributor

This is a first attempt to fix some of #20705. In this PR, I attempt to make the following two changes to np.ndarray.view:

  1. If dtype is smaller than the current dtype, and it divides the current one evenly, and there is at least one unit dimension, expand the last unit dimension instead of crashing.
  2. If dtype is larger than the current dtype, and there exists an axis along which the current array is contiguous (stride == current dtype.itemsize), use that axis instead of crashing.

Both cases are valid because the check for dtype multiplicative compatibility is still done, and the dimension/stride checks guarantee that the resulting shape changes do not exceed known owned memory.

One possible modification is that we may want to restrict this only to the last dimension. Will let reviewers with more understanding decide.

This PR will allow a non-hacky rewrite of most of the functionality of #20694. If accepted, the second half of the functionality will be allowing as_strided to modify the offset and dtype of an array, in addition to its strides and shape.


Edit by seberg, posting the summary of the changes for Fortran order arrays here, in case someone comes here from the release notes:

Behavior deprecated in NumPy 1.11.0 allowed the following counterintuitive result::

 >>> x = np.array(["aA", "bB", "cC", "dD", "eE", "fF"]).reshape(1, 2, 3).transpose()
 >>> x.view('U1')  # deprecated behavior, shape (6, 2, 1)
 DeprecationWarning: ...
 array([[['a'],
         ['d']],
 
        [['A'],
         ['D']],
 
        [['b'],
         ['e']],
 
        [['B'],
         ['E']],
 
        [['c'],
         ['f']],
 
        [['C'],
         ['F']]], dtype='<U1')

Now that the deprecation has expired, dtype reassignment only happens along the
last axis, so the above will result in::

 >>> x.view('U1')  # new behavior, shape (3, 2, 2)
 array([[['a', 'A'],
         ['d', 'D']],
 
        [['b', 'B'],
         ['e', 'E']],
 
        [['c', 'C'],
         ['f', 'F']]], dtype='<U1')

When the last axis is not contiguous, an error is now raised in place of the DeprecationWarning::

 >>> x = np.array(["aA", "bB", "cC", "dD", "eE", "fF"]).reshape(2, 3).transpose()
 >>> x.view('U1')
 ValueError: To change to a dtype of a different size, the last axis must be contiguous

The new behavior is equivalent to the more intuitive::

 >>> x.copy().view('U1')

To replicate the old behavior on F-but-not-C-contiguous arrays, use::

 >>> x.T.view('U1').T

Comment thread numpy/core/src/multiarray/getset.c
Comment thread numpy/lib/stride_tricks.py
Comment thread numpy/core/src/multiarray/getset.c Outdated
@madphysicist madphysicist marked this pull request as ready for review January 4, 2022 08:00
Comment thread numpy/core/_add_newdocs.py
Comment thread numpy/core/tests/test_multiarray.py Outdated

@mhvk mhvk left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@madphysicist - yes to finally being able to split on the last axis if that axis is contiguous rather than require the whole array to be contiguous!

But I am torn about the idea of allowing one to split on any contiguous axis with size 1; if we allow that, then I do not see why the dimension should be unity - all that should be required is that it is the contiguous axis. But it seems that led to real confusion about that for arrays that are contiguous in more than one axis (and the choice between F and C contiguous).

Looking at your test cases, I found the ones that did not work on the last axis confusing - it really needed thinking why they worked as they did.

More generally, I think I'd prefer for arr.view(...) to always do exactly the same as arr.copy().view(...), which is not true when splitting on any axis but the last.

Given this, my preference would thus be to stick to just allowing the final axis. Does that solve the character slicing issues?

p.s. I should add that mostly I want this to happen! In the end, any time one does a .view(dtype) (which I do a lot), one better really think about it.

Comment thread numpy/core/tests/test_multiarray.py Outdated
Comment thread numpy/core/src/multiarray/getset.c Outdated
Comment thread numpy/core/tests/test_multiarray.py Outdated
Comment thread numpy/core/tests/test_multiarray.py Outdated
Comment thread numpy/core/src/multiarray/getset.c Outdated
@mhvk

mhvk commented Jan 4, 2022

Copy link
Copy Markdown
Contributor

@eric-wieser

eric-wieser commented Jan 4, 2022

Copy link
Copy Markdown
Member

I agree with @mhvk, requiring contiguity of the last axis is the least surprising choice of semantics.

@madphysicist

Copy link
Copy Markdown
Contributor Author

@eric-wieser @mhvk After reading your comments and looking at the mess some of the tests were, I have to agree that the sane approach is to only allow the last axis to change. Transposes are cheap nowadays, so this should not be an issue. It certainly does not hold back anything in #20694: strings are pretty much guaranteed contiguous along the last axis.

Please see the updated code. It's much simpler, and the counter-intuitive tests are now gone, replaced by a single test showing that f-contiguous layouts still raise an error (with an updated message though).

@madphysicist

Copy link
Copy Markdown
Contributor Author

I don't think I can fix the linter error: it's a doctest with an exception that's just too long to fit in 72 chars.

@eric-wieser

Copy link
Copy Markdown
Member

This looks great now. Can you add a release note? You should mention that as a result of this change, the formerly-deprecated action of .view on fortran-order arrays now does something different.

Comment thread numpy/core/_add_newdocs.py
@madphysicist

Copy link
Copy Markdown
Contributor Author

@eric-wieser Two questions: 1. what do I do with the broken smoketest? 2. what version is this going into? 1.21.1?

@eric-wieser

eric-wieser commented Jan 5, 2022

Copy link
Copy Markdown
Member
  1. what do I do with the broken smoketest?

Remove it, it's testing the deprecation that you intentionally (and justifiably) removed:

=================================== FAILURES ===================================
__________ TestNonCContiguousViewDeprecation.test_fortran_contiguous ___________

self = <numpy.core.tests.test_deprecations.TestNonCContiguousViewDeprecation object at 0x7fa4cf0247c0>

    def test_fortran_contiguous(self):
>       self.assert_deprecated(np.ones((2,2)).T.view, args=(complex,))
E       ValueError: To change to a dtype of a different size, the last axis must be contiguous

self       = <numpy.core.tests.test_deprecations.TestNonCContiguousViewDeprecation object at 0x7fa4cf0247c0>

../../builds/venv/lib/python3.8/site-packages/numpy-1.23.0.dev0+355.g163c17aed-py3.8-linux-x86_64.egg/numpy/core/tests/test_deprecations.py:270: ValueError

(from the logs)

Comment thread doc/release/upcoming_changes/20722.new_feature.rst Outdated
Comment thread doc/release/upcoming_changes/20722.new_feature.rst Outdated

@mhvk mhvk left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great now! Thanks for tackling this long-standing nuisance - I think this indeed ended up with a nice & clean solution!

@madphysicist

Copy link
Copy Markdown
Contributor Author

I don't think I can reasonably fix the linter error, and the azure pipeline error appears to be unrelated to this PR. I think this PR is ready to go. @mhvk, @eric-wieser, @mattip thank you very much for your help and support in this. Glad we came up with a solution we can all agree on.

Comment thread numpy/core/_add_newdocs.py Outdated
@mattip

mattip commented Jan 5, 2022

Copy link
Copy Markdown
Member

Just ignore the single lint error, it is too hard to fix.

@madphysicist

Copy link
Copy Markdown
Contributor Author

The Azure error is strange. It's a different error from last time, but the only difference is the updated version string. I'm guessing it's not related to this PR. Why do seemingly random errors show up periodically?

Comment thread doc/release/upcoming_changes/20722.expired.rst Outdated
Comment thread doc/release/upcoming_changes/20722.expired.rst Outdated
Comment thread numpy/core/_add_newdocs.py Outdated
Comment thread doc/release/upcoming_changes/20722.expired.rst Outdated
Comment thread doc/release/upcoming_changes/20722.expired.rst Outdated
Comment thread doc/release/upcoming_changes/20722.expired.rst Outdated
Comment thread doc/release/upcoming_changes/20722.expired.rst Outdated
Comment thread numpy/core/_add_newdocs.py
Comment thread doc/release/upcoming_changes/20722.expired.rst Outdated
Comment thread numpy/core/src/multiarray/getset.c Outdated
Comment thread doc/release/upcoming_changes/20722.expired.rst Outdated
Comment thread doc/release/upcoming_changes/20722.new_feature.rst Outdated

@eric-wieser eric-wieser left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll stop bike-shedding the docs now, and let someone more involved in recent handling of deprecations take over. This looks great to me, and I'm surprised no one took this on earlier!

@mattip

mattip commented Jan 5, 2022

Copy link
Copy Markdown
Member

LGTM. We can always tweak the documentation going forward. For other reviewers: ignore the nit-picky linting error.

@madphysicist

Copy link
Copy Markdown
Contributor Author

@eric-wieser. I suspect no one expected it to be this simple. I expected to have to modify both view and as_strided as a minimum, and not just shorten the code on a single set descriptor.

I still want to add offset and dtype arguments to as_strided. It's still affected by this contiguity check because it uses asarray under the hood, but I really don't think it should be. Do you have any thoughts on that?

@madphysicist madphysicist changed the title ENH: Added ability to change dtype of view when last axis has shape 1 ENH: Removed requirement for C-contiguity when changing to dtype of different size Jan 5, 2022
@eric-wieser

Copy link
Copy Markdown
Member

I still want to add offset and dtype arguments to as_strided. It's still affected by this contiguity check because it uses asarray under the hood, but I really don't think it should be. Do you have any thoughts on that?

Let's discuss this in the original PR that motivated this one, #20694.

@mhvk mhvk left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks all great to me. Probably good to squash the commits, though.

@madphysicist

Copy link
Copy Markdown
Contributor Author

@mhvk Done. I probably should have split off the deprecation expiry into a separate commit, but it's too late now.

Comment thread doc/release/upcoming_changes/20722.expired.rst Outdated
Expires deprecated F-contiguous behavior.
Simplifies C code of dtype set descriptor.
Adds tests that verify which error condition is triggered.
Introduces extra long exception message that upsets linter.
@madphysicist

Copy link
Copy Markdown
Contributor Author

Looks like everything besides the linter is passing.

@mattip mattip merged commit 1684a93 into numpy:main Jan 6, 2022
@mattip

mattip commented Jan 6, 2022

Copy link
Copy Markdown
Member

Thanks @madphysicist. Github has a "squash and merge" button, so in the future you don't need to necessarily squash commits down to one, we can do it when we merge.

@madphysicist

Copy link
Copy Markdown
Contributor Author

@mattip. I'll keep that in mind for the future. Which release is this going into by the way? The obspy issue that references this seems to think that it's 1.22.1, while you asked me to set the .. versionchanged:: to 1.23.0

@mattip

mattip commented Jan 6, 2022

Copy link
Copy Markdown
Member

It is up to the release manager. If @charris backports this to 1.22 then we should change the version changed appropriately.

@seberg

seberg commented Jan 6, 2022

Copy link
Copy Markdown
Member

Since this is a (nice!) enhancment and finalized a deprecation, it would normally not be backported. Which doesn't mean that there could not be an exception if it is particularly important.

@charris

charris commented Jan 6, 2022

Copy link
Copy Markdown
Member

liamtoney added a commit to liamtoney/sonify that referenced this pull request Jan 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants