Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Volume shape is not preserved by round-trip conversion to NumPy array using Volume.arraydata() #92

Closed
Tracked by #124
ylep opened this issue Feb 6, 2023 · 13 comments
Assignees
Labels
wontfix This will not be worked on

Comments

@ylep
Copy link
Member

ylep commented Feb 6, 2023

Describe the bug
The shape on a Volume is changed when it is simply converted to a NumPy array, and converted back to a Volume. This looks like a serious regression in BrainVISA 5.1.0.

To Reproduce
Steps to reproduce the behavior:

  1. Run the following Python script:
from __future__ import print_function
import numpy
from soma import aims

def test_numpy_conversion(vol):
    arr = vol.arraydata()
    test_vol = aims.Volume(arr)
    print(vol.header()['volume_dimension'], '-> .arraydata() ->',
          'shape', arr.shape, 'strides', arr.strides,
          '-> aims.Volume() ->', test_vol.header()['volume_dimension'])

    arr = numpy.asarray(vol)
    test_vol = aims.Volume(arr)
    print(vol.header()['volume_dimension'], '-> numpy.asarray() ->',
          'shape', arr.shape, 'strides', arr.strides,
          '-> aims.Volume() ->', test_vol.header()['volume_dimension'])

    arr = numpy.asfortranarray(vol)
    test_vol = aims.Volume(arr)
    print(vol.header()['volume_dimension'], '-> numpy.asfortranarray() ->',
          'shape', arr.shape, 'strides', arr.strides,
          '-> aims.Volume() ->', test_vol.header()['volume_dimension'])

    arr = numpy.ascontiguousarray(vol)
    test_vol = aims.Volume(arr)
    print(vol.header()['volume_dimension'], '-> numpy.ascontiguousarray() ->',
          'shape', arr.shape, 'strides', arr.strides,
          '-> aims.Volume() ->', test_vol.header()['volume_dimension'])

test_numpy_conversion(aims.read("test.nii.gz"))
  1. The shape of the Volume after round-trip to NumPy is incorrect in all cases, except when using numpy.ascontiguousarray:
[217, 290, 290, 1] -> .arraydata() -> shape (1, 290, 290, 217) strides (36499400, 125860, 434, 2) -> aims.Volume() -> [1, 290, 290, 217]
[217, 290, 290, 1] -> numpy.asarray() -> shape (1, 290, 290, 217) strides (36499400, 125860, 434, 2) -> aims.Volume() -> [1, 290, 290, 217]
[217, 290, 290, 1] -> numpy.asfortranarray() -> shape (1, 290, 290, 217) strides (2, 2, 580, 168200) -> aims.Volume() -> [1, 290, 290, 217]
[217, 290, 290, 1] -> numpy.ascontiguousarray() -> shape (217, 290, 290, 1) strides (168200, 580, 2, 2) -> aims.Volume() -> [217, 290, 290, 1]

Expected behavior

Under BrainVISA 5.0.4 the output was correct with the “natural” .arraydata and numpy.asarray converters:

[217, 290, 290, 1] -> .arraydata() -> shape (1, 290, 290, 217) strides (36499400, 125860, 434, 2) -> aims.Volume() -> [217, 290, 290, 1]
[217, 290, 290, 1] -> numpy.asarray() -> shape (1, 290, 290, 217) strides (36499400, 125860, 434, 2) -> aims.Volume() -> [217, 290, 290, 1]
[217, 290, 290, 1] -> numpy.asfortranarray() -> shape (1, 290, 290, 217) strides (2, 2, 580, 168200) -> aims.Volume() -> [1, 290, 290, 217]
[217, 290, 290, 1] -> numpy.ascontiguousarray() -> shape (217, 290, 290, 1) strides (168200, 580, 2, 2) -> aims.Volume() -> [1, 290, 290, 217]

Environment:

  • Engine: Singularity
  • Version of BrainVISA: 5.1.0
@ylep ylep added the bug Something isn't working label Feb 6, 2023
@denisri
Copy link
Contributor

denisri commented Feb 6, 2023

One problem is that the array returned by both vol.arraydata() and numpy.asarray(vol) is shared and stored in the volume structure to avoid deletion of the volume while the array is still living. thus once you have created one, arraydata() and asarray() return the same instance, and it should not since it is not built the same way. Thus first we should fix that. In your example, if you run the test using asarray() first (after a fresh load of the volume), this one will work. So yes we have to store both shapes of the array and return the right one via both functions.

@denisri
Copy link
Contributor

denisri commented Feb 6, 2023

Well, arraydata() and __array__() are supposed to transpose the array strides accordingly, but apparently they do not (not anymore).

denisri added a commit that referenced this issue Feb 7, 2023
@denisri
Copy link
Contributor

denisri commented Feb 7, 2023

There was a mistake in a comparison to determine strides order. This is fixed and it partly fixes this issue, but we can discuss if it solves it completely or not. Now the test prints:

[217, 290, 290, 1] -> .arraydata() -> shape (1, 290, 290, 217) strides (36499400, 125860, 434, 2) -> aims.Volume() -> [1, 290, 290, 217]
[217, 290, 290, 1] -> numpy.asarray() -> shape (217, 290, 290, 1) strides (2, 434, 125860, 36499400) -> aims.Volume() -> [217, 290, 290, 1]
[217, 290, 290, 1] -> numpy.asfortranarray() -> shape (217, 290, 290, 1) strides (2, 434, 125860, 36499400) -> aims.Volume() -> [217, 290, 290, 1]
[217, 290, 290, 1] -> numpy.ascontiguousarray() -> shape (217, 290, 290, 1) strides (168200, 580, 2, 2) -> aims.Volume() -> [217, 290, 290, 1]

thus only the 1st (using arraydata()) transposes the volume when rebuilding. We can discuss whether this behavior is correct or not (in other words: should an array be silently transposed under some strides conditions when building a volume from it ?). My opinion is that it is OK: when building a volume from an array of a given shape, we get a volume with the same shape whatever its strides.
Using 5.0.4 the behavior was not the same, but was wrong also, because 1. it did not respect the rule above (a volume built from an array should respect the array shape), 2. asarray() did return the wong shape when arraydata() had been called before.

So my opinion is that it is solved now. Any comments ?

denisri added a commit that referenced this issue Feb 7, 2023
@denisri
Copy link
Contributor

denisri commented Feb 7, 2023

Well there is still something not correct:

from soma import aims
import numpy as np

vol = aims.read("test.nii.gz"))
arr = np.ascontiguousarray(vol.np)
print('contiguous array shape:', arr.shape, ', strides:', arr.strides)
vol2 = aims.Volume(arr)
print('new vol:', vol2.getSize(), ', array shape:', vol2.shape, ', strides:', vol2.np.strides)

prints:

contiguous array shape: (217, 290, 290, 1) , strides: (168200, 580, 2, 2)
new vol: [217, 290, 290, 1] , array shape: (1, 290, 290, 217) , strides: (2, 2, 580, 168200)

so from the Volume point of view it seems OK, but the array returned by vol2.np is transposed.

@ylep
Copy link
Member Author

ylep commented Feb 7, 2023

from __future__ import print_function
import numpy
from soma import aims

def test_numpy_conversion(filename):
    vol = aims.read(filename)
    arr = vol.arraydata()
    test_vol = aims.Volume(arr)
    print(vol.header()['volume_dimension'], '-> .arraydata() ->',
          'shape', arr.shape, 'strides', arr.strides,
          '-> aims.Volume() ->', test_vol.header()['volume_dimension'])

    vol = aims.read(filename)
    arr = numpy.asarray(vol)
    test_vol = aims.Volume(arr)
    print(vol.header()['volume_dimension'], '-> numpy.asarray() ->',
          'shape', arr.shape, 'strides', arr.strides,
          '-> aims.Volume() ->', test_vol.header()['volume_dimension'])

    vol = aims.read(filename)
    arr = numpy.asfortranarray(vol)
    test_vol = aims.Volume(arr)
    print(vol.header()['volume_dimension'], '-> numpy.asfortranarray() ->',
          'shape', arr.shape, 'strides', arr.strides,
          '-> aims.Volume() ->', test_vol.header()['volume_dimension'])

    vol = aims.read(filename)
    arr = numpy.ascontiguousarray(vol)
    test_vol = aims.Volume(arr)
    print(vol.header()['volume_dimension'], '-> numpy.ascontiguousarray() ->',
          'shape', arr.shape, 'strides', arr.strides,
          '-> aims.Volume() ->', test_vol.header()['volume_dimension'])

vol = aims.Volume(1, 2, 3, 4)
aims.write(vol, "test.nii.gz")
test_numpy_conversion("test.nii.gz")
$ /i2bm/brainvisa/brainvisa-cea-5.0.4/bin/bv python min.py
[1, 2, 3, 4] -> .arraydata() -> shape (4, 3, 2, 1) strides (24, 8, 4, 4) -> aims.Volume() -> [1, 2, 3, 4]
[1, 2, 3, 4] -> numpy.asarray() -> shape (1, 2, 3, 4) strides (4, 4, 8, 24) -> aims.Volume() -> [1, 2, 3, 4]
[1, 2, 3, 4] -> numpy.asfortranarray() -> shape (1, 2, 3, 4) strides (4, 4, 8, 24) -> aims.Volume() -> [1, 2, 3, 4]
[1, 2, 3, 4] -> numpy.ascontiguousarray() -> shape (1, 2, 3, 4) strides (96, 48, 16, 4) -> aims.Volume() -> [4, 3, 2, 1]
$ /i2bm/brainvisa/brainvisa-cea-5.1.0/bin/bv python min.py
[1, 2, 3, 4] -> .arraydata() -> shape (4, 3, 2, 1) strides (24, 8, 4, 4) -> aims.Volume() -> [4, 3, 2, 1]
[1, 2, 3, 4] -> numpy.asarray() -> shape (1, 2, 3, 4) strides (4, 4, 8, 24) -> aims.Volume() -> [1, 2, 3, 4]
[1, 2, 3, 4] -> numpy.asfortranarray() -> shape (1, 2, 3, 4) strides (4, 4, 8, 24) -> aims.Volume() -> [1, 2, 3, 4]
[1, 2, 3, 4] -> numpy.ascontiguousarray() -> shape (1, 2, 3, 4) strides (96, 48, 16, 4) -> aims.Volume() -> [1, 2, 3, 4]

Actually, when we load the volumes from scratch to set aside the cache issue, it seems that the only issue is with convesion of array to Volume following .arraydata()

@denisri
Copy link
Contributor

denisri commented Feb 7, 2023

Well there is still something not correct:

from soma import aims
import numpy as np

vol = aims.read("test.nii.gz"))
arr = np.ascontiguousarray(vol.np)
print('contiguous array shape:', arr.shape, ', strides:', arr.strides)
vol2 = aims.Volume(arr)
print('new vol:', vol2.getSize(), ', array shape:', vol2.shape, ', strides:', vol2.np.strides)

prints:

contiguous array shape: (217, 290, 290, 1) , strides: (168200, 580, 2, 2)
new vol: [217, 290, 290, 1] , array shape: (1, 290, 290, 217) , strides: (2, 2, 580, 168200)

so from the Volume point of view it seems OK, but the array returned by vol2.np is transposed.

This is fixed by a294104.

@ylep
Copy link
Member Author

ylep commented Feb 7, 2023

I am compiling your fix to test its behaviour with my Python script

@denisri
Copy link
Contributor

denisri commented Feb 7, 2023

I am in the process if integrating your test function (slightly modified) in test scripts.

denisri added a commit that referenced this issue Feb 7, 2023
denisri added a commit that referenced this issue Feb 7, 2023
denisri added a commit that referenced this issue Feb 7, 2023
denisri added a commit that referenced this issue Feb 7, 2023
@ylep
Copy link
Member Author

ylep commented Feb 8, 2023

Okay so after your changes, my test script outputs:

$ ./cea-5.1/bin/bv python min.py
[1, 2, 3, 4] -> .arraydata() -> shape (4, 3, 2, 1) strides (24, 8, 4, 4) -> aims.Volume() -> [4, 3, 2, 1]
[1, 2, 3, 4] -> numpy.asarray() -> shape (1, 2, 3, 4) strides (4, 4, 8, 24) -> aims.Volume() -> [1, 2, 3, 4]
[1, 2, 3, 4] -> numpy.asfortranarray() -> shape (1, 2, 3, 4) strides (4, 4, 8, 24) -> aims.Volume() -> [1, 2, 3, 4]
[1, 2, 3, 4] -> numpy.ascontiguousarray() -> shape (1, 2, 3, 4) strides (96, 48, 16, 4) -> aims.Volume() -> [1, 2, 3, 4]

… which is exactly the same as BrainVISA 5.1.0, or am I missing something?

My understanding is that:

  • In new code, we should never use arraydata and prefer the asarray or .np method.
  • In BrainVISA 5.0.4, aims.Volume used to interpret NumPy arrays differently depending on their internal data ordering (strides), whereas now it is always consistent with [x, y, z, t] indexing, which is an intended change and a good thing.
  • However, this breaks existing scripts which used arraydata and back-conversion to a Volume. I wonder if we should deprecate or remove that method so people don't fall into this trap? (arraydata is littered in a lot of urers’ scripts... which now behave inconsistently and may result in subtle data corruption). A DeprecationWarning is ignored by default, so maybe a RuntimeWarning would be more appropriate...

@denisri
Copy link
Contributor

denisri commented Feb 8, 2023

… which is exactly the same as BrainVISA 5.1.0, or am I missing something?

If you are using your second, modified test code, which reallocates the volume between tests. And in my opinion, this is what is expected. My fixes are about the array cache in the volume (which primary goal is to avoid deleting the volume which owns the data block while the array is still living), thus fix your first test.
arraydata() is indeed the older method, and should be avoided now. Its only benefit is to provide the raw array data with memory ordering. Most codes should use np.asarray() or the shortcut volume.np.
The difference between aims 5.0 and 5.1 is that 5.0 did not manage all strides correctly, and did require that the X dimension was contiguous in memory. This restriction is gone, and we can now deal with arbitrary arrays.

@ylep ylep changed the title Volume shape is not preserved by round-trip conversion to NumPy array Volume shape is not preserved by round-trip conversion to NumPy array using Volume.arraydata() Feb 9, 2023
@ylep ylep added wontfix This will not be worked on and removed bug Something isn't working labels Feb 9, 2023
@ylep
Copy link
Member Author

ylep commented Feb 10, 2023

I just had another user run into the exact same issue because they have arraydata() in their script, and convert it back to a Volume. If I undersstand correctly, that idiom used to work by sheer chance, because of the bug in Volume not taking strides into account correctly.

I would propose to drop support for arraydata() entirely, making it raise an exception telling people to fix their code and use numpy.asarray or volume.np instead. If some specific use-cases need its functionality, we can make it available under another name (e.g. raw_internal_array or something similar).

We also need to fix the code that we distribute: I counted 501 uses of arraydata in Python code in the cea distro on the 5.1 branch:

$ ack -ch --python --ignore-dir=master '.arraydata'
501

@denisri
Copy link
Contributor

denisri commented Feb 10, 2023

OK I agree to make arraydata() obsolete.
The method also exists in vectors and textures (1D arrays), and is harmless there and identical to asarray(), so there are certainly less than 501 "dangerous" uses of arraydata() on volumes, but I also agree that we should deprecate it there too.

@ylep
Copy link
Member Author

ylep commented Feb 13, 2023

This issue is essentially solved (as a WONTFIX) so I propose to track the deprecation of Volume.arraydata() in the new issue #93

@ylep ylep closed this as completed Feb 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
wontfix This will not be worked on
Projects
None yet
Development

No branches or pull requests

2 participants