Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: slicing mixin class #132

Merged

Conversation

DaniBodor
Copy link
Member

@DaniBodor DaniBodor commented Dec 1, 2023

This is the first step of #129. I decided to make separate PRs for each of those so that each can stay small and comprehensible

This PR refactors the current slicing mixin by (in order of commits):

  1. creating a folder structure for all mixin modules and using explicit imports (in the modified modules)
  2. fixing some minor errors, rewording errors, etc (see commit message for details)
    • I did some stupid stuff re labeling here. This is solved in later commit (too complex to rebase properly)
  3. force expected behavior for start_inclusive and end_inclusive in slice_by_index
  4. make labeling part of parent method
  5. create test files for this
    • the proper test fails because __eq__ and __len__ are not yet correctly implemented)
    • there's a temporary notebook that can be used to visually check that what we think is happening is indeed happening
  6. make slice_by_time result in expected behavior

TO DO:

  • add docstrings where needed

also use absolute imports
- Replace `if not any((start, end))` by `if start is None and end is None`,
becase it was picking up 0s as well as `None`s.
- Remove `_check_time_sorted` method, as it was only used once.
- Move its functionality to `select_by_time`, as slicing by index should be
ok for unsorted time stamps.
- Some style changes (white lines, order of lines, eror message phrasing, ...).
- Do labeling in parent method
@DaniBodor DaniBodor changed the base branch from main to 181_refactor_sequence_psomhorst December 1, 2023 11:02
@DaniBodor DaniBodor marked this pull request as ready for review December 1, 2023 11:22
@DaniBodor DaniBodor force-pushed the 129a_slicing_mixin_dbodor branch 2 times, most recently from c0308cf to 68750f6 Compare December 1, 2023 11:44
@DaniBodor DaniBodor requested a review from psomhorst December 1, 2023 11:49
@DaniBodor DaniBodor mentioned this pull request Dec 1, 2023
4 tasks
@DaniBodor DaniBodor changed the title 129a slicing mixin dbodor refactor: slicing mixin class Dec 1, 2023
@psomhorst
Copy link
Contributor

psomhorst commented Dec 1, 2023

@DaniBodor I added some comments. (I see that they're not all aligned with the proper lines, VSCode's comment feature has some bugs, I see).

When I wrote this code I was assuming you'd always need an start and end point, but None will just work as intended when slicing.

You moved the generation of the label to the abstract method. This value is never returns or saves anywhere. Maybe consider writing a new function for label generation?

Removing start_inclusive and end_inclusive makes sense for SelectByIndex. However, for SelectByTime this would require a different implementation for the TimeIndexer when selecting a single time point (rather than a slice), and makes it hard to split, change, and merge sequences (a use case I would like to consider enabling).

You removed start_inclusive and end_inclusive as arguments, but they're still used in de code when calling the select_by_...() methods.

As for testing: does it make sense to create a class specific for testing that is purpose built to test slicing? That makes it easier to rule out bugs. Using a complicated class like DraegerEITData which is not finished yet makes debugging harder. I did the same for testing Variants (see VariantSubA and VariantSubB).

@DaniBodor DaniBodor force-pushed the 129a_slicing_mixin_dbodor branch from 68750f6 to 0842d22 Compare December 1, 2023 13:36
@DaniBodor DaniBodor marked this pull request as draft December 1, 2023 13:48
@DaniBodor DaniBodor force-pushed the 129a_slicing_mixin_dbodor branch 2 times, most recently from 65d8f3f to ca4f005 Compare December 1, 2023 14:09
@DaniBodor
Copy link
Member Author

@DaniBodor I added some comments. (I see that they're not all aligned with the proper lines, VSCode's comment feature has some bugs, I see).

Thanks for the quick review!

When I wrote this code I was assuming you'd always need an start and end point, but None will just work as intended when slicing.

It does, but then the way I now implemented labeling become more complex, so I made it explicit.

You moved the generation of the label to the abstract method. This value is never returns or saves anywhere. Maybe consider writing a new function for label generation?

this should be resolved now

Removing start_inclusive and end_inclusive makes sense for SelectByIndex. However, for SelectByTime this would require a different implementation for the TimeIndexer when selecting a single time point (rather than a slice), and makes it hard to split, change, and merge sequences (a use case I would like to consider enabling).

Agreed. I changed it back so that index slicing does not have the option, but time slicing does. The end behavior used to be modified by -1, but I dont think that is correct and I changed it. Can you look at it and make sure that this is correct now (see also test notebook)

You removed start_inclusive and end_inclusive as arguments, but they're still used in de code when calling the select_by_...() methods.

fixed

As for testing: does it make sense to create a class specific for testing that is purpose built to test slicing? That makes it easier to rule out bugs. Using a complicated class like DraegerEITData which is not finished yet makes debugging harder. I did the same for testing Variants (see VariantSubA and VariantSubB).

For now I added the notebook that I was using myself for checking that things are correct. This will be removed when proper tests are implemented. In think that at least temporarily solves the problem as well, until everything works properly. Not as fancy, but given that this is a temporary problem (or do you mean as a permanent thing?), this quick solution I think should be ok.

@DaniBodor DaniBodor requested a review from psomhorst December 1, 2023 14:29
@DaniBodor DaniBodor marked this pull request as ready for review December 1, 2023 14:29
@DaniBodor
Copy link
Member Author

By the way, while reviewing, if you do your comments as "start a review"/"add review comment" rather than "add single comment", then I only get notified when your review is done. This way, I don't get pinged for each comment individually :)

@psomhorst
Copy link
Contributor

By the way, while reviewing, if you do your comments as "start a review"/"add review comment" rather than "add single comment", then I only get notified when your review is done.

Yes, sorry. I added the comments from VS Code, but didn't figure it would ping you for every single one.

@psomhorst
Copy link
Contributor

@DaniBodor The current implementation does not coincide with the discussion we had on slicing by time. As I understood it, we agreed that when slicing by time, the start and end of the time axis should be present inside the time range that is selected if start_inclusive/end_inclusive are True.

E.g. if we slice from start = 10.1 with start_inclusive = True, the resulting time axis should always include the number 10.1. So if the time is [9.85, 10, 10.15, 10.30], the slicing should return [10, 10.15, 10.30]. With start_inclusive = False it should return [10.15, 10.30]. If we slice with start = 10.15 the result should always be [10.15, 10.30], irrespective of the value of start_inclusive.

It could be I'm misunderstanding what we agreed on previously when it comes to slicing time.

Copy link
Contributor

@psomhorst psomhorst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code seems to largely function as expected. I'm not quite sure all expectations align, though. Maybe we should re-discuss how time slicing should work, and what it means exactly, and write down some test cases and expected results.

eitprocessing/mixins/slicing.py Outdated Show resolved Hide resolved
eitprocessing/mixins/slicing.py Outdated Show resolved Hide resolved
Now that the slicing functions are "hidden", it doesn't make sense to
have `start_inclusive` and `end_inclusive` as options:
`start_inclusive` is is forced `True` and `end_inclusive` is forced `False`,
which aligns with the behavior of slicing lists, etc.
testing still does not work well, because `__eq__` and `__len__` have
not been implemented (properly).
a temporary notebook has been added for visual testing. this will be
deleted when proper testing works.
@DaniBodor DaniBodor force-pushed the 129a_slicing_mixin_dbodor branch from ca4f005 to 3dd8f7b Compare December 6, 2023 11:47
@DaniBodor
Copy link
Member Author

DaniBodor commented Dec 6, 2023

@DaniBodor The current implementation does not coincide with the discussion we had on slicing by time. As I understood it, we agreed that when slicing by time, the start and end of the time axis should be present inside the time range that is selected if start_inclusive/end_inclusive are True.

E.g. if we slice from start = 10.1 with start_inclusive = True, the resulting time axis should always include the number 10.1. So if the time is [9.85, 10, 10.15, 10.30], the slicing should return [10, 10.15, 10.30]. With start_inclusive = False it should return [10.15, 10.30]. If we slice with start = 10.15 the result should always be [10.15, 10.30], irrespective of the value of start_inclusive.

It could be I'm misunderstanding what we agreed on previously when it comes to slicing time.

What you say makes sense. I think it was me who misunderstood the usage of start_inclusive and end_inclusive. Should be fixed now.
What do you think should be the defaults for this?

@DaniBodor DaniBodor force-pushed the 129a_slicing_mixin_dbodor branch from 81ab152 to a4515dd Compare December 6, 2023 12:02
@DaniBodor
Copy link
Member Author

DaniBodor commented Dec 6, 2023

This code seems to largely function as expected. I'm not quite sure all expectations align, though. Maybe we should re-discuss how time slicing should work, and what it means exactly, and write down some test cases and expected results.

I think I was thinking about it in an illogical way previously. Please see if now it makes sense to you and our expectations are aligned. If still no, then we can maybe discuss it on Friday. I anyway wanted to ask you then what the TimeIndexer class does.

@DaniBodor DaniBodor requested a review from psomhorst December 6, 2023 12:48
- add docstring
- rename attributes
- rearrange methods so that code follows
order of implementation
@DaniBodor DaniBodor force-pushed the 129a_slicing_mixin_dbodor branch from e216214 to 6dc006b Compare December 8, 2023 11:11
@DaniBodor DaniBodor merged commit 5ae7437 into 181_refactor_sequence_psomhorst Dec 8, 2023
@DaniBodor DaniBodor deleted the 129a_slicing_mixin_dbodor branch December 8, 2023 11:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants