Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature: create roi selection #90

Open
wants to merge 55 commits into
base: main
Choose a base branch
from

Conversation

ajonkman
Copy link

@ajonkman ajonkman commented Sep 1, 2023

Beginnings of ROI selection features

ajonkman and others added 30 commits September 1, 2023 16:44
Tests now include inputs with np.nan values and values that are not 1.
This allows e.g. vertical regions to be split by row, by horizontal regions not be split by column.
@psomhorst psomhorst force-pushed the 087_create_roi_selection_ajonkman branch from 1045834 to 170ceea Compare October 11, 2023 07:06
@psomhorst psomhorst requested a review from DaniBodor October 11, 2023 11:52
@psomhorst psomhorst self-assigned this Oct 11, 2023
@psomhorst psomhorst added the enhancement New feature or request label Oct 11, 2023
@psomhorst psomhorst marked this pull request as ready for review October 11, 2023 11:53
@psomhorst
Copy link
Contributor

psomhorst commented Oct 11, 2023

@DaniBodor @ajonkman This feature is now ready for review. It has become vastly more complicated than I first intended due to these factors:

  • there should be a way to split a row or column between two regions, if otherwise regions would consist of a different number of rows/columns;
  • it was pointed out that splitting rows/columns should be set individually, so you can e.g. split the left/right without splitting a column, but then split ventral/dorsal with splitting rows;
  • in some cases, you want to ignore rows/columns that entirely consists of NaN (not a number) values, while in other you don't want to do that;
  • ignoring NaN values should also be set for rows and columns individually.

These features have now been built in.

I'm not entirely sure this API will work best, yet. I think it might be better to save the matrices to the object itself when calling find_grid(), and have a function to do the actual splitting of data into regions as well. But that concern can be addressed later. Maybe this API will just work.

@psomhorst
Copy link
Contributor

psomhorst commented Oct 11, 2023

TODO

  • Write tests where ignore_nan_columns and/or ignore_nan_rows is/are False.

np.outer(a, b) is the same as a[:, np.newaxis] @ b[np.newaxis, :], when
a and b are 1D arrays.
np.newaxis is an alias of None, but more clear when reading the code.
This class has been moved to its own branch 115_functionallungspace_psomhorst
@psomhorst psomhorst force-pushed the 087_create_roi_selection_ajonkman branch from 8e7a244 to af172d7 Compare October 27, 2023 11:54
Copy link
Member

@DaniBodor DaniBodor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still need to look carefully at the tests, but thought I'd already open these comments/suggestions up anyway, as #119 has higher priority right now.

Comment on lines +16 to +22
GridSelection allows for the creation a list of 2D arrays that can be used to divide a two- or
higher-dimensional array into several regions structured in a grid. An instance of
GridSelection contains information about how to subdivide an input matrix. Calling
`find_grid(data)`, where data is a 2D array, results in a list of arrays with the same
dimension as `data`, each representing a single region. Each resulting 2D array contains the
value 0 for pixels that do not belong to the region, and the value 1 or any number between 0
and 1 for pixels that (partly) belong to the region.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part of this is a rather generic explanation of what an ROI is/how we define it and handle it. Should that part of this docstring perhaps move to the docstring of the parent class?

also, see typo below:

Suggested change
GridSelection allows for the creation a list of 2D arrays that can be used to divide a two- or
higher-dimensional array into several regions structured in a grid. An instance of
GridSelection contains information about how to subdivide an input matrix. Calling
`find_grid(data)`, where data is a 2D array, results in a list of arrays with the same
dimension as `data`, each representing a single region. Each resulting 2D array contains the
value 0 for pixels that do not belong to the region, and the value 1 or any number between 0
and 1 for pixels that (partly) belong to the region.
GridSelection allows for the creation of a list of 2D arrays that can be used to divide a two- or
higher-dimensional array into several regions structured in a grid. An instance of
GridSelection contains information about how to subdivide an input matrix. Calling
`find_grid(data)`, where data is a 2D array, results in a list of arrays with the same
dimension as `data`, each representing a single region. Each resulting 2D array contains the
value 0 for pixels that do not belong to the region, and the value 1 or any number between 0
and 1 for pixels that (partly) belong to the region.

Comment on lines +29 to +41
If the number of rows or columns can not split evenly, a row or column can be split among two
regions. This behaviour is controlled by `split_rows` and `split_columns`.

If `split_rows` is `False` (default), rows will not be split between two groups. A warning will
be shown stating regions don't contain equal numbers of rows. The regions towards the top will
be larger. E.g., when a (5, 2) array is split in two vertical regions, the first region will
contain the first three rows, and the second region the last two rows.

If `split_rows` is `True`, e.g. a (5, 2) array that is split in two vertical regions, the first
region will contain the first two rows and half of each pixel of the third row. The second
region contains half of each pixel in the third row, and the last two rows.

`split_columns` has the same effect on columns as `split_rows` has on rows.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be clearer to add this to "Args" or examples section of the docstring instead? Not sure if that is more or less clear, tbh.

Comment on lines +43 to +44
Regions are ordered according to C indexing order. The `matrix_layout()` method provides a map
showing how the regions are ordered.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unclear what "C indexing order" is. Maybe refer to example below?
I also rephrased slightly to make it clearer that the method refers to the same thing.

Suggested change
Regions are ordered according to C indexing order. The `matrix_layout()` method provides a map
showing how the regions are ordered.
Regions are ordered according to C indexing order, and a map of this ordering can be produced
using the `matrix_layout()` method (see example below).

Comment on lines +66 to +90
>>> gs = GridSelection(3, 1, split_pixels=False)
>>> matrices = gs.find_grid(pixel_map)
>>> matrices[0] * pixel_map
array([[1, 2, 3],
[4, 5, 6],
[0, 0, 0],
[0, 0, 0],
[0, 0, 0],
[0, 0, 0]])
>>> gs.matrix_layout()
array([[0],
[1],
[2]])
>>> gs2 = GridSelection(2, 2, split_pixels=True)
>>> matrices2 = gs.find_grid(pixel_map)
>>> gs2.matrix_layout()
array([[0, 1],
[2, 3]])
>>> matrices2[2]
array([[0. , 0. , 0. ],
[0. , 0. , 0. ],
[0. , 0. , 0. ],
[1. , 0.5, 0. ],
[1. , 0.5, 0. ],
[1. , 0.5, 0. ]])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few suggestions (all collapsed into one below), see also 4f78ebb:

  • Correct typo of split_pixels.
  • rename matrices -> rois
  • move matrix layout above the resulting map
  • add one more example to each gs, for added clarity.
Suggested change
>>> gs = GridSelection(3, 1, split_pixels=False)
>>> matrices = gs.find_grid(pixel_map)
>>> matrices[0] * pixel_map
array([[1, 2, 3],
[4, 5, 6],
[0, 0, 0],
[0, 0, 0],
[0, 0, 0],
[0, 0, 0]])
>>> gs.matrix_layout()
array([[0],
[1],
[2]])
>>> gs2 = GridSelection(2, 2, split_pixels=True)
>>> matrices2 = gs.find_grid(pixel_map)
>>> gs2.matrix_layout()
array([[0, 1],
[2, 3]])
>>> matrices2[2]
array([[0. , 0. , 0. ],
[0. , 0. , 0. ],
[0. , 0. , 0. ],
[1. , 0.5, 0. ],
[1. , 0.5, 0. ],
[1. , 0.5, 0. ]])
>>> gs = GridSelection(3, 1, split_rows=False)
>>> rois = gs.find_grid(pixel_map)
>>> gs.matrix_layout()
array([[0],
[1],
[2]])
>>> rois[0] * pixel_map
array([[1, 2, 3],
[4, 5, 6],
[0, 0, 0],
[0, 0, 0],
[0, 0, 0],
[0, 0, 0]])
>>> rois[1] * pixel_map
array([[0, 0, 0],
[0, 0, 0],
[7, 8, 9],
[10, 11, 12],
[0, 0, 0],
[0, 0, 0]])
>>> gs2 = GridSelection(2, 2, split_columns=True)
>>> rois2 = gs.find_grid(pixel_map)
>>> gs2.matrix_layout()
array([[0, 1],
[2, 3]])
>>> rois2[2]
array([[0. , 0. , 0. ],
[0. , 0. , 0. ],
[0. , 0. , 0. ],
[1. , 0.5, 0. ],
[1. , 0.5, 0. ],
[1. , 0.5, 0. ]])
>>> rois2[3]
array([[0. , 0. , 0. ],
[0. , 0. , 0. ],
[0. , 0. , 0. ],
[0. , 0.5, 1. ],
[0. , 0.5, 1. ],
[0. , 0.5, 1. ]])

Comment on lines +108 to +110
def __post_init__(self):
self._check_attribute_type("v_split", int)
self._check_attribute_type("h_split", int)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we so strict on using the correct types beyond giving type hints?
I don't see the point in raising an error if someone passes 2.0 or something. This could happen if some external package uses floats by default for whatever reason.

If we do want to remain strict, see 62bc8d4 for a suggestion of how to simplify coding this.

split_columns: Allows columns to be split over two regions.

Examples:
>>> pixel_map = array([[ 1, 2, 3],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
>>> pixel_map = array([[ 1, 2, 3],
>>> pixel_map = np.array([[ 1, 2, 3],

Comment on lines +316 to +325
class InvalidDivision(Exception):
"""Raised when the data can't be divided into regions."""


class InvalidHorizontalDivision(InvalidDivision):
"""Raised when the data can't be divided into horizontal regions."""


class InvalidVerticalDivision(InvalidDivision):
"""Raised when the data can't be divided into vertical regions."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the added value of these over a ValueError?

and what's the added value of having separate horizontal and vertical versions, over specifying this in the error message (same question for warnings below)?

Comment on lines +95 to +98
split_rows: bool = False
split_columns: bool = False
ignore_nan_rows: bool = True
ignore_nan_columns: bool = True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nice to have a single argument for each of split and ignore, as I think that more often than not a user would want the same behavior for both.
We can set it so that it accepts either a bool or a 2-part tuple of bools (for horizontal and vertical).

Suggested change
split_rows: bool = False
split_columns: bool = False
ignore_nan_rows: bool = True
ignore_nan_columns: bool = True
split_pixels: bool | tuple[bool, bool] = False
ignore_nan_edges: bool | tuple[bool, bool] = True

Also, what would happen and what do we want to happen if the center row/column of an array is nan?

Comment on lines +309 to +313
def matrix_layout(self) -> NDArray:
"""Returns a 2D array showing the layout of the matrices returned by
`find_grid`."""
n_regions = self.v_split * self.h_split
return np.reshape(np.arange(n_regions), (self.v_split, self.h_split))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be nice to make this a property rather than a function. See 33905ee for suggestion

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants