Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filter mask #597

Merged
merged 18 commits into from
Oct 11, 2023
Merged

Filter mask #597

merged 18 commits into from
Oct 11, 2023

Conversation

Aske-Rosted
Copy link
Collaborator

adds the option of using I3 files FilterMask to implement a filter when converting files

@RasmusOrsoe
Copy link
Collaborator

Hey @Aske-Rosted thanks for this.

Could we add a check before attempting to access the filter quantities that checks it's actually there? Because if its not, the conversion will break. Here is some pseudo-code:

def _filter_mask(
        self, frame: "icetray.I3Frame", I3filters: List[str]
    ) -> bool:
        """Check if specified condition(s) are met.

        Args:
            frame: I3Frame to check.
            I3filters: List of I3Filters to check for pass.
        """
        if "FilterMask" in frame:
              for filter in I3filters:
                  if filter in frame["FilterMask"]:
                          if frame["FilterMask"][filter].condition_passed is False:
                              return True
                  else:
                         self.warning_once(f"Filter '{filter}' not in frame. Filter condition cannot be checked.")
              return False
       else:
             self.warning_once("FilterMask not available in file. Filter conditions cannot be checked.")

@Aske-Rosted Aske-Rosted self-assigned this Sep 25, 2023
@Aske-Rosted
Copy link
Collaborator Author

Does this look good? @RasmusOrsoe

@@ -435,6 +437,8 @@ def _extract_data(self, fileset: FileSet) -> List[OrderedDict]:
continue
if self._skip_frame(frame):
continue
if self._filter_mask(frame, self._I3filters):
Copy link
Collaborator

@RasmusOrsoe RasmusOrsoe Sep 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self._filter_mask will return False by default if "FilterMask" not in frame. That means that everyone will be unable to extract i3frames without this field. That's not good!

You can resolve this by either

A) Make _filter_mask return True by default in these cases
B) change if self._filter_mask(frame, self._I3filters) to if self._filter_mask(frame, self._I3filters) & len(self._I3filters) > 0. This way, the frames are only skipped if the user wants to filter out frames this way.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should very much work such that a True return from the _filter_mask call will skip the frame (through a continue call) while a False return will move on to include the frame. I agree this is a little counter intuitive and should probably be specified in the description.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry. I was fooled. You are right!

Copy link
Collaborator

@RasmusOrsoe RasmusOrsoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey! Almost there. Left two comments

@@ -566,3 +570,27 @@ def _skip_frame(self, frame: "icetray.I3Frame") -> bool:
if frame["I3EventHeader"].sub_event_stream == "NullSplit":
return True
return False

def _filter_mask(
self, frame: "icetray.I3Frame", I3filters: List[str]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can get rid of the codeclimate error if you let _filter_mask access the i3filters inside the function, instead of passing them as argument.

@Aske-Rosted
Copy link
Collaborator Author

We should probably consider what is the more relevant standard behavior.

Passing all of the included filters is required to keep the frame. or passing 1 of the included filters means keeping the frame.
Also adding the possibility to switch between the two might be necessary.

in the current version it requires passing all the included filters.

@RasmusOrsoe
Copy link
Collaborator

RasmusOrsoe commented Sep 25, 2023

We should probably consider what is the more relevant standard behavior.

Passing all of the included filters is required to keep the frame. or passing 1 of the included filters means keeping the frame. Also adding the possibility to switch between the two might be necessary.

in the current version it requires passing all the included filters.

Yes! I think the most elegant solution would be to create a class like so:

from abc import abstractmethod
from graphnet.utilities.logging import Logger
from typing import List

class I3Filter(Logger):
    """A generic Class that defines I3Filter modules."""

    @abstractmethod
    def _skip_frame(self, frame: "icetray.I3Frame") -> bool:
        """A function that defines if the frame should be skipped.
            If the function returns True, the frame is skipped."""
        
    def __call__(self, frame: "icetray.I3Frame") -> bool:
        passing_flag = self._skip_frame(frame)
        try:
            assert isinstance(passing_flag, bool)
        except AssertionError as e:
            self.warning(f"""{self.__class__.__name__} returns an output 
                         of type {type(passing_flag)}. Expected bool.""")
            raise e
        return passing_flag

Our current _skip_frame function in DataConverter (which is essentially also a filter but for NullSplit frames) then becomes a default filter called NullSplitI3Filter:

class NullSplitI3Filter(I3Filter):
    def _skip_frame(self, frame: "icetray.I3Frame") -> bool:
        """Skips all null split frames.

        Args:
            frame: I3Frame to check.

        Returns:
            True if frame is a null split frame, else False.
        """
        if frame["I3EventHeader"].sub_event_stream == "NullSplit":
            return True
        return False

and your new filter based on filtermasks could look like so:


class AskeFilter(I3Filter):
    def __init__(self, filter_masks: List[str]):
        """ A helpful docstring"""
        self._filter_masks = filter_masks
        
    def _skip_frame(self, frame: "icetray.I3Frame") -> bool:
        if "FilterMask" not in frame:
            self.warning_once(
                "FilterMask not found in frame. Skipping filter checks."
            )
            return False
        for filter in self.filter_masks:
            if filter not in frame["FilterMask"]:
                self.warning_once(
                    f"Filter {filter} not found in frame. Skipping."
                )
                continue
            if frame["FilterMask"][filter].condition_passed is False:
                return True
        return False

We can then adjust the arguments to DataConverter like so:

def __init__(
        self,
        extractors: List[I3Extractor],
        outdir: str,
        gcd_rescue: Optional[str] = None,
        *,
        nb_files_to_batch: Optional[int] = None,
        sequential_batch_pattern: Optional[str] = None,
        input_file_batch_pattern: Optional[str] = None,
        workers: int = 1,
        index_column: str = "event_no",
        icetray_verbose: int = 0,
        i3_filters: Union[I3Filter, List[I3Filter]] = [NullSplitI3Filter],
    ):

We can then set the new variable as member of DataConverter:

def __init__(.... i3_filters ..):
      ...
      if isinstance(i3_filters, I3Filter): # If a single filter is passed, turn it into a list
              i3_filters = [i3_filters]
     self._i3_filters = i3_filters

and re-write the _skip_frame in DataConverter to:

def _skip_frame(self, frame: "icetray.I3Frame") -> bool:
        """Evaluates the I3Filters. 
        Events not passing the filters will be skipped

        Args:
            frame: I3Frame to check.

        Returns:
            True if the event does not pass filters, else False.
        """
        skip = False
        for filter in self._i3_filters:
            skip += filter(frame) # False + True = 1
        if skip < len(self._i3_filters): # At least one filter was not passed.
            return True 
        else: 
            return False

This way we have a lot of freedom to define filters as our needs evolve, and we wouldn't have to think to hard about default values as you mention. What do you think @Aske-Rosted?

@Aske-Rosted
Copy link
Collaborator Author

The IceTray unit test is failing due to missing due to [Errno 28] No space left on device which I don't see how is related to these changes. @RasmusOrsoe would you mind giving it a look over?

@RasmusOrsoe
Copy link
Collaborator

@Aske-Rosted I re-ran the workflow and now the disk space issue is gone. However, now a new error about missing files is there. Could you double check that you did not by accident cause this?

@Aske-Rosted
Copy link
Collaborator Author

@Aske-Rosted I re-ran the workflow and now the disk space issue is gone. However, now a new error about missing files is there. Could you double check that you did not by accident cause this?

Fixed the unit test issue.

Copy link
Collaborator

@RasmusOrsoe RasmusOrsoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great now. Just a comment on snake-case argument naming. Well done @Aske-Rosted !

@@ -107,6 +108,7 @@ def __init__(
workers: int = 1,
index_column: str = "event_no",
icetray_verbose: int = 0,
I3_Filters: List[I3Filter] = [],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use lower-case "snake-case" by convention. So the argument should be i3_filters: List[I3Filter] = []

# I3Filters (NullSplitI3Filter is always included)
self._I3Filters = [NullSplitI3Filter()] + I3_Filters

for filter in self._I3Filters:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self._i3filters

@RasmusOrsoe
Copy link
Collaborator

@Aske-Rosted The IceTray disk space error is now fixed in main. Please try to update the branch and see if this doesn't fix the error.

@Aske-Rosted
Copy link
Collaborator Author

Aske-Rosted commented Oct 5, 2023

@Aske-Rosted The IceTray disk space error is now fixed in main. Please try to update the branch and see if this doesn't fix the error.

hmm new unit test errors and I do not really see how they are related to this change... any thoughts @RasmusOrsoe?

Copy link
Collaborator

@RasmusOrsoe RasmusOrsoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@Aske-Rosted Aske-Rosted merged commit 5801d62 into graphnet-team:main Oct 11, 2023
12 checks passed
@Aske-Rosted Aske-Rosted deleted the filter_mask branch October 11, 2023 00:23
RasmusOrsoe pushed a commit to RasmusOrsoe/graphnet that referenced this pull request Oct 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants