Global Filtered Topic (Suppressed Alarms) #10
Comments
We may actually want a separate global topic for each control room, since each control room may want to filter things out differently. For example, we may want a mask topic for the following control rooms: CEBAF. Alternatively, each control room could get a separate instance of the alarm system (but with lots of overlapping/duplicate alarms).
In progress: a Streams app (https://github.com/JeffersonLab/alarms-filter) that provides the ability to configure a set of output topics, each with custom filters applied to the active-alarms topic. This allows any consumer to share the filtered topic (as opposed to a consumer-local filter, which isn't shared).
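A minimal sketch of the "set of output topics with custom filters" idea, in plain Python rather than Kafka Streams so it runs on its own. The topic names, the alarm fields, and the `route` helper are all hypothetical and are not taken from the alarms-filter project.

```python
# Each output topic is paired with a predicate over an active-alarm record.
# Topic names and record fields below are illustrative assumptions.
def route(alarm, filters):
    """Return the output topics whose filter accepts this alarm."""
    return sorted(topic for topic, accept in filters.items() if accept(alarm))

filters = {
    "filtered-alarms-all": lambda a: True,
    "filtered-alarms-no-hall-a": lambda a: a["location"] != "HallA",
}

hall_a_alarm = {"name": "hallA_magnet_temp", "location": "HallA"}
injector_alarm = {"name": "injector_vacuum", "location": "Injector"}

print(route(hall_a_alarm, filters))
print(route(injector_alarm, filters))
```

Every consumer reading a given output topic sees the same filtered view, which is the point of doing the filtering once upstream instead of in each client.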
We might want to consider having an isMaskable attribute on alarms. A scenario I'm thinking of is something like the radcon tritium alarms that were in Hall A a while back. Michele had to put them into "SITE" so that they wouldn't be disabled when Hall A wasn't running, because they had to be armed all the time, not just when Hall A was taking beam. Maybe we can come up with other options, but the ones I see now are: 1) put things "out-of-place" to avoid them being masked, or 2) allow items to be excluded from the facility mask.
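To make option 2 concrete, here is a small sketch of a facility mask that skips alarms flagged as not maskable. The `Alarm` class, its field names, and the alarm names are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Alarm:
    name: str
    location: str
    maskable: bool = True  # hypothetical isMaskable attribute

def visible_alarms(alarms, masked_locations):
    """Hide an alarm only if its location is masked AND it is maskable."""
    return [a for a in alarms
            if not (a.location in masked_locations and a.maskable)]

alarms = [
    Alarm("hallA_magnet_temp", "HallA"),
    Alarm("hallA_tritium", "HallA", maskable=False),  # must stay armed
    Alarm("injector_vacuum", "Injector"),
]

shown = visible_alarms(alarms, masked_locations={"HallA"})
print([a.name for a in shown])
```

With this approach the tritium alarm can stay registered under Hall A instead of being moved "out-of-place" into SITE.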
Terminology note: ANSI/ISA 18.2-2016 defines three alarm states in which an alarm is "turned off": shelved, out-of-service, and suppressed by design.
I believe we use the term shelve and out-of-service as intended by the standard, but we casually use many names for "suppressed by design" including:
This is likely because the word "suppressed" is fairly generic and in fact occurs in the definitions of both shelved and out-of-service (the distinction there is that shelved is temporary / time limited whereas out-of-service is indefinite). The third category, "suppressed by design", is a sort of catch-all that we're using for the scenario where a portion of the machine is turned off because we're not using it, so we don't want to see alarms from that part of the machine. We may need to clarify which terms we are going to use. I believe the distinction on who does the "turning off" isn't very useful, as it could always be an operator. Both out-of-service and our "turn off a portion of the machine" use case are indefinite, so they could actually be one and the same, and we could consolidate to just two distinct off states (indefinite vs temporary/timed/shelved). However, it might be useful to make a distinction between out-of-service (for maintenance) and "turned off" (not needed for program). The distinction is being able to see what is broken vs what is just not needed, though "not needed" is a superset of broken as it could mean anything: too many broken items, not enough money in the budget, not compatible with current machine configuration, undergoing an upgrade, etc. Another factor that could be used to make a distinction is whether you can easily turn off an entire group of alarms all at once vs one at a time. It's tempting to say that is another difference between out-of-service and "turn off", but it is possible that in the future users would like to use wildcard expressions / grouping filters with any suppression action.
Note: it is possible for an alarm to be in two states at once: (1) flagged as out-of-service (broken) or shelved, and (2) filtered out of view because it is located in a portion of the machine that isn't part of the program. It is fine that these combinations are possible, even desirable, as what is part of the program probably should be tracked separately from what is broken (and definitely separately from what is shelved). Depending on how we handle out-of-service, an alarm could actually be in all three states at once (which is fine). Separating what is broken from what is shelved is not as critical, but not a problem either (unless ops abuses it in lieu of creating temporary shelved items). Since an alarm should rarely be out for maintenance, we could add a boolean to the registered-alarms topic. Alternatively, we currently have an indefinite option on shelving, which could be used as "out-of-service". The GUI calls this disabled now. If we take the definition of shelved to mean only timed suppressions, we could rename the shelved-alarms topic to suppressed-alarms to clarify that both indefinite and timed suppressed alarms are captured there. Or just document clearly that we use indefinite shelving to mean out-of-service.
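A sketch of the "indefinite shelving means out-of-service" alternative: one record type with an optional expiration, where no expiration is read as out-of-service. The `ShelvedAlarm` fields and the returned labels are assumptions, not the actual shelved-alarms schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ShelvedAlarm:
    name: str
    reason: str
    expiration: Optional[float] = None  # epoch seconds; None = indefinite

def classify(entry: ShelvedAlarm, now: float) -> str:
    """Interpret a shelving record at time `now` (hypothetical labels)."""
    if entry.expiration is None:
        return "OUT_OF_SERVICE"   # indefinite shelving
    if entry.expiration > now:
        return "SHELVED"          # timed suppression still in effect
    return "EXPIRED"              # timed suppression has lapsed

print(classify(ShelvedAlarm("pump1", "broken sensor"), now=100.0))
print(classify(ShelvedAlarm("pump2", "nuisance", expiration=200.0), now=100.0))
print(classify(ShelvedAlarm("pump3", "nuisance", expiration=50.0), now=100.0))
```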
I forgot to mention "suppressed by", which is a reference to a parent alarm that suppresses a given alarm in a hierarchy. Also on-delays. Given these five suppression modes, how about we use the following definitions: Alarm Suppression States
We need to determine whether we must independently track each of these alarm suppression states (should they be mutually exclusive?), because they may overlap in all sorts of ways, and at transitions the correct effective suppression state must be determined, ideally without coupling the various mechanisms that suppress alarms. There is a calculated "effective" suppression state, but it might be best to calculate it by combining the independent states, to avoid confusion and provide a clear audit trail. If any item on the list changes, then all suppression rules must be re-applied in order of precedence (starting at 1). For example, if an alarm is removed from a filter, all the other suppression states must be re-evaluated, meaning we need to store state information about what other rules would have been in effect if the filter hadn't been in effect, such as continuous shelving. It is possible to create a "suppressed-alarms" topic with a message key of both alarm name and suppression reason, so that we can store all of the possible suppression states in one topic. We could create a new topic, alarm_state, that stores the calculated effective state (it could even compute the final effective state considering acknowledgements and active-alarms too). Or clients can do the calculation. If we have a "suppressed-alarms" topic, our current filter-app prototype will need to change, as it writes to a separate topic.
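A rough sketch of tracking each suppression reason independently, keyed by (alarm name, reason), and deriving the effective state by precedence. The reason names and their order in `PRECEDENCE` are placeholders chosen for illustration, and the in-memory set only stands in for a compacted "suppressed-alarms" topic.

```python
# Placeholder precedence order, highest first. This ordering is an assumption,
# not the one proposed in this thread.
PRECEDENCE = ["OUT_OF_SERVICE", "FILTERED", "MASKED_BY_PARENT", "ON_DELAY", "SHELVED"]

class SuppressionStore:
    """In-memory stand-in for a topic keyed by (alarm name, reason)."""

    def __init__(self):
        self._records = set()

    def put(self, alarm, reason):
        if reason not in PRECEDENCE:
            raise ValueError(f"unknown suppression reason: {reason}")
        self._records.add((alarm, reason))

    def remove(self, alarm, reason):
        # Analogous to writing a tombstone for that key.
        self._records.discard((alarm, reason))

    def effective(self, alarm):
        """Highest-precedence reason currently recorded, or None."""
        active = {r for (a, r) in self._records if a == alarm}
        for reason in PRECEDENCE:
            if reason in active:
                return reason
        return None

store = SuppressionStore()
store.put("alarm1", "SHELVED")
store.put("alarm1", "FILTERED")
print(store.effective("alarm1"))   # the filter outranks shelving here
store.remove("alarm1", "FILTERED")
print(store.effective("alarm1"))   # shelving was retained and takes over
```

Because every reason is stored on its own, removing the filter needs no knowledge of shelving: the effective state is simply recomputed from whatever records remain.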
Filter flow might look like:
Each suppression app would update suppressed-alarms topic
States: NORMAL <-> ABNORMAL (active); ACKNOWLEDGED <-> UNACKNOWLEDGED; SUPPRESSED <-> UNSUPPRESSED
One app might be able to do the whole process?
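The three state pairs above can be modeled as independent booleans, as in the sketch below. The `AlarmState` class and the `needs_attention` rule are assumptions for illustration, not something agreed in this thread.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AlarmState:
    active: bool = False        # False = NORMAL, True = ABNORMAL
    acknowledged: bool = False
    suppressed: bool = False

    def label(self) -> str:
        """Human-readable combination of the three independent axes."""
        return "/".join([
            "ABNORMAL" if self.active else "NORMAL",
            "ACKNOWLEDGED" if self.acknowledged else "UNACKNOWLEDGED",
            "SUPPRESSED" if self.suppressed else "UNSUPPRESSED",
        ])

    def needs_attention(self) -> bool:
        # Assumed rule: annunciate only if active, unacknowledged, unsuppressed.
        return self.active and not self.acknowledged and not self.suppressed

state = AlarmState(active=True, suppressed=True)
print(state.label(), state.needs_attention())
```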
I think you're on the right track: layering the various "suppressions". That way, if any other type comes up, it would be easier to add. Can this also apply to "calc/rules based alarms"? Isn't it just another form of alarm/don't alarm? I think your definitions are good.
Yeah, CALC alarms would likely go in that flow as well, probably before the disabled-app. The delay-app would also likely handle off-delays (not just on-delays).
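For reference, a minimal sketch of what on-delay and off-delay handling could look like: the raw condition must persist for the on-delay before the alarm is raised, and must stay clear for the off-delay before it is dropped. The `apply_delays` function and its sample format are hypothetical, and it only re-evaluates at sample times rather than on a timer.

```python
def apply_delays(samples, on_delay, off_delay):
    """samples: list of (time, raw_active) sorted by time.

    Returns a list of (time, annunciated) evaluated at each sample time.
    """
    annunciated = False
    since = None  # time the raw state first disagreed with the annunciated state
    out = []
    for t, raw in samples:
        if raw == annunciated:
            since = None          # disagreement ended; reset the timer
        else:
            if since is None:
                since = t
            delay = on_delay if raw else off_delay
            if t - since >= delay:
                annunciated = raw
                since = None
        out.append((t, annunciated))
    return out

samples = [(0, True), (1, False), (2, True), (3, True), (5, True),
           (6, False), (7, False), (9, False)]
result = apply_delays(samples, on_delay=2, off_delay=3)
print([state for _, state in result])
```

In this run the one-sample blip at t=0 never raises the alarm, the alarm is raised at t=5 once the condition has held for at least 2 time units, and it clears at t=9 after 3 time units of normal readings.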
Layering these things makes more and more sense...
Especially since we're not only talking about global suppression, but also individual alarms and groups of alarms...
If we create a bunch of separate apps it certainly is flexible, but it'll be costly and unwieldy as well (lots of moving parts and lots of duplicate work). It might make more sense to divide suppression in general into two pieces:
We should probably consolidate all of this into the alarms-filter project and rename it alarm-suppressors instead. See: JeffersonLab/alarms-filter#1
It would be useful to have a global mask topic that is a filtered version of the active-alarms topic, but filtered by global filter rules. A Kafka Streams app could do the filtering. A command topic would also likely be needed to instruct the Streams app what to filter. Currently each consumer (client) can do local filtering based on criteria such as category or location, but this applies only to the local client (not all clients).
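A rough sketch of the command-topic idea, written in plain Python instead of Kafka Streams so it is self-contained. The `GlobalFilter` class, the rule format (field/value pairs to suppress), and the record fields are assumptions for illustration.

```python
class GlobalFilter:
    """Applies globally shared filter rules to active-alarm records."""

    def __init__(self):
        self.rules = {}  # rule name -> {field: value to suppress}

    def on_command(self, key, value):
        """Handle a command-topic message; value None removes the rule."""
        if value is None:
            self.rules.pop(key, None)
        else:
            self.rules[key] = value

    def on_active_alarm(self, alarm):
        """Return the alarm if it passes all rules, else None (filtered out)."""
        for rule in self.rules.values():
            # An empty rule matches nothing, so it never suppresses alarms.
            if rule and all(alarm.get(f) == v for f, v in rule.items()):
                return None
        return alarm

app = GlobalFilter()
alarm = {"name": "hallA_magnet_temp", "location": "HallA", "category": "Magnet"}

print(app.on_active_alarm(alarm) is not None)      # no rules yet
app.on_command("hall-a-off", {"location": "HallA"})
print(app.on_active_alarm(alarm) is not None)      # now filtered globally
app.on_command("hall-a-off", None)
print(app.on_active_alarm(alarm) is not None)      # rule removed
```

Unlike client-local filtering, the rules here live in one place, so every consumer of the filtered output sees the same result.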