Global Filtered Topic (Suppressed Alarms) #10

slominskir · 2021-02-18T21:13:36Z

It would be useful to have a global mask topic that is a filtered version of the active-alarms topic, but filtered by global filter rules. A Kafka Streams app could do the filtering. A command topic would also likely be needed to instruct the Streams app what to filter. Currently each consumer (client) can do local filtering based on criteria such as category or location, but this applies only to the local client (not all clients).
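
A rough sketch of the idea in Python, with no Kafka involved: a shared filter whose rules are changed by commands and then applied to the active alarms. The command format (`action`, `location`), the `GlobalFilter` class, and the alarm fields are all assumptions made for illustration, not an existing schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class GlobalFilter:
    """Holds the shared filter rules that every client would see."""
    filtered_locations: Set[str] = field(default_factory=set)

    def apply_command(self, command: Dict[str, str]) -> None:
        # Hypothetical command format: {"action": "set" | "unset", "location": ...}
        if command["action"] == "set":
            self.filtered_locations.add(command["location"])
        elif command["action"] == "unset":
            self.filtered_locations.discard(command["location"])
        else:
            raise ValueError(f"unknown action: {command['action']}")

    def filter(self, active_alarms: List[Dict[str, str]]) -> List[Dict[str, str]]:
        # Keep only alarms whose location is not globally filtered out.
        return [a for a in active_alarms if a["location"] not in self.filtered_locations]


# Illustrative alarms; names, categories and locations are made up.
active_alarms = [
    {"name": "alarm1", "category": "RF", "location": "LERF"},
    {"name": "alarm2", "category": "Vacuum", "location": "CEBAF"},
]

global_filter = GlobalFilter()
global_filter.apply_command({"action": "set", "location": "LERF"})
visible_names = [a["name"] for a in global_filter.filter(active_alarms)]
```

In a real deployment the commands would arrive on the command topic and the result would be written to the filtered topic, so every client sees the same view.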

slominskir · 2021-02-18T21:22:53Z

We may actually want a separate global topic for each control room (as each control room may want to filter things out differently). For example, we may want a mask topic for the following control rooms:

CEBAF
LERF
CHL
UITF

Alternatively, each control room could get a separate instance of the alarm system (but with lots of overlapping/duplicate alarms).

slominskir · 2021-02-18T23:16:53Z

In progress: https://github.com/JeffersonLab/alarms-filter

A Streams app that lets you configure a set of output topics, each with custom filters applied to the active-alarms topic. Any consumer can then share a filtered topic (as opposed to a consumer-local filter, which isn't shared).
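
To illustrate the "set of output topics with custom filters" idea without Kafka, here is a small Python sketch that routes active alarms into per-topic lists. The topic names (`cebaf-alarms`, `lerf-alarms`), the alarm fields, and `route_to_topics` are hypothetical; the actual app may be configured quite differently.

```python
from typing import Callable, Dict, List

Alarm = Dict[str, str]
AlarmFilter = Callable[[Alarm], bool]


def route_to_topics(active_alarms: List[Alarm],
                    topic_filters: Dict[str, AlarmFilter]) -> Dict[str, List[str]]:
    """Return, per output topic, the names of the alarms that pass its filter."""
    routed: Dict[str, List[str]] = {topic: [] for topic in topic_filters}
    for alarm in active_alarms:
        for topic, keep in topic_filters.items():
            if keep(alarm):
                routed[topic].append(alarm["name"])
    return routed


# Hypothetical output topics, one per control room.
topic_filters: Dict[str, AlarmFilter] = {
    "cebaf-alarms": lambda a: a["location"] != "LERF",
    "lerf-alarms": lambda a: a["location"] == "LERF",
}

routed = route_to_topics(
    [{"name": "alarm1", "location": "LERF"}, {"name": "alarm2", "location": "HallA"}],
    topic_filters,
)
```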

theojlab · 2021-02-19T17:40:49Z

We might want to consider having an isMaskable attribute on alarms. A scenario I'm thinking of is something like the radcon tritium alarms that were in Hall A a while back. Michele had to put them into "SITE" so that they wouldn't be disabled when Hall A wasn't running, because they had to be armed all the time, not just when Hall A was taking beam. Maybe we can come up with other options, but the ones I see now are: 1) put things "out-of-place" to avoid them being masked, or 2) allow items to be excluded from the facility mask.
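
Option 2 could look something like the Python sketch below. The `maskable` field (defaulting to true), the alarm names, and `apply_facility_mask` are assumptions for illustration only.

```python
from typing import Dict, List, Set, Union

Alarm = Dict[str, Union[str, bool]]


def apply_facility_mask(alarms: List[Alarm], masked_locations: Set[str]) -> List[Alarm]:
    """Drop alarms in masked locations unless they are flagged as not maskable."""
    return [
        a for a in alarms
        if a["location"] not in masked_locations or not a.get("maskable", True)
    ]


# Made-up alarms: the tritium alarm stays in Hall A but opts out of the mask.
alarms: List[Alarm] = [
    {"name": "halla-magnet", "location": "HallA", "maskable": True},
    {"name": "halla-tritium", "location": "HallA", "maskable": False},
    {"name": "site-power", "location": "SITE"},
]

visible = [a["name"] for a in apply_facility_mask(alarms, {"HallA"})]
```

With this approach the tritium alarm keeps its real location and still survives the Hall A mask.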

slominskir · 2021-03-15T11:46:14Z

Terminology note:

ANSI/ISA 18.2-2016 defines three alarm states in which an alarm is "turned off":

Suppressed (by design): "prevent the annunciation of the alarm to the operator when the alarm is active"; "EXAMPLE: shelve, suppress by design, remove from service"
Shelved: "temporarily suppress an alarm, initiated by the operator, with engineering controls (e.g., time-limited) that unsuppress the alarm"
Out-of-Service: "state of an alarm during which the alarm indication is indefinitely suppressed, typically manually, for reasons such as maintenance"

I believe we use the terms shelved and out-of-service as intended by the standard, but we casually use many names for "suppressed by design" including:

Filtered
Masked
Disabled
Off

This is likely because the word "suppressed" is fairly generic and in fact occurs in the definitions of shelved and out-of-service (the distinction there is that shelved is temporary / time-limited whereas out-of-service is indefinite). The third category, "suppressed by design", is a sort of catch-all that we're using for the scenario where a portion of the machine is turned off because we're not using it, so we don't want to see alarms from that part of the machine. We may need to clarify which terms we are going to use.

I believe the distinction on who does the "turning off" isn't very useful as it could always be an operator.

Both Out-of-Service and our "turn off a portion of the machine" use case are indefinite, so they could actually be one and the same, and we could consolidate to just two distinct off states (indefinite vs temporary/timed/shelved). However, it might be useful to make a distinction between out-of-service (for maintenance) and "turned-off" (not needed for program). The distinction is being able to see what is broken vs what is just not needed, though "not needed" is a superset of "broken" as it could mean anything: too many broken items, not enough money in the budget, not compatible with current machine configuration, undergoing an upgrade, etc.

Another factor that could be used to make a distinction is whether you can easily turn off an entire group of alarms all-at-once vs one-at-a-time. It's tempting to say that is another difference between out-of-service and "Turn off", but it is possible in the future users would like to use wildcard expressions / grouping filters to select any suppression action.
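
As a small illustration of selecting a group of alarms with a wildcard expression, here is a Python sketch using shell-style patterns from the standard library. The alarm naming scheme and `select_alarms` are made up; nothing here says the alarm system would use this pattern syntax.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def select_alarms(names: Iterable[str], pattern: str) -> List[str]:
    """Return the alarm names matching a shell-style wildcard pattern."""
    return [n for n in names if fnmatchcase(n, pattern)]


# Hypothetical alarm names.
names = ["halla-magnet-1", "halla-magnet-2", "hallb-magnet-1", "lerf-vacuum-1"]
selected = select_alarms(names, "halla-*")
```

The same selection could then feed any suppression action (shelve, disable, filter), which is why group selection alone may not be a good way to tell the states apart.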

slominskir · 2021-03-15T13:27:52Z

Note: it is possible for an alarm to be in two states at once: (1) flagged as out-of-service (broken) or shelved, and (2) filtered out of view because it is located in a portion of the machine that isn't part of the program.

It is fine that these combinations are possible, even desirable as what is part of the program probably should be tracked separately from what is broken (and definitely separately from what is shelved).

Depending on how we handle out-of-service we could actually be in all three states at once (which is fine). Separating what is broken from what is shelved is not as critical, but not a problem either (unless ops abuses it in lieu of creating temporary shelved items). Since an alarm should rarely be out for maintenance we could add a boolean to the registered-alarms topic. Alternatively, we currently have an indefinite option on shelving, which could be used as "out-of-service". The GUI calls this disabled now. We could rename the shelved-alarms topic to suppressed-alarms to clarify that both indefinite and timed suppressed alarms are captured there if we take the definition of shelved to mean only timed suppressions. Or just document clearly that we use an indefinite shelving to mean out-of-service.
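
If we go with "indefinite shelving means out-of-service", the interpretation could be as simple as the Python sketch below. It assumes a shelving record carries an optional expiration timestamp, with no expiration meaning indefinite; `classify_shelving` and the returned labels are illustrative, not the actual schema.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def classify_shelving(expiration: Optional[datetime], now: datetime) -> str:
    """Classify a shelving record by its (assumed optional) expiration."""
    if expiration is None:
        return "out-of-service"  # indefinite shelving
    if expiration > now:
        return "shelved"         # timed suppression still in effect
    return "expired"             # timed suppression has lapsed


now = datetime(2021, 3, 15, 12, 0, tzinfo=timezone.utc)
indefinite = classify_shelving(None, now)
timed = classify_shelving(now + timedelta(hours=1), now)
lapsed = classify_shelving(now - timedelta(hours=1), now)
```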

slominskir · 2021-03-15T15:53:40Z

I forgot to mention "suppressed by", which is a reference to a parent alarm that suppresses a given alarm in a hierarchy. I also forgot on-delays. Given these five suppression modes, how about we use the following definitions:

Alarm Suppression States

| Precedence | Name | Duration | Definition |
|---|---|---|---|
| 1 | Disabled | Indefinite | A broken alarm can be flagged as out-of-service |
| 2 | Filtered | Indefinite | An alarm can be "suppressed by design" - generally a group of alarms are filtered out when not needed for the current machine program |
| 3 | Masked | Only while parent alarm is active | An alarm can be suppressed by a parent alarm to minimize confusion during an alarm flood and build an alarm hierarchy |
| 4 | Delayed | Short with expiration | An alarm with an on-delay is temporarily suppressed to minimize fleeting/chattering |
| 5 | Shelved | Short with expiration | A nuisance alarm can be temporarily shelved with a short expiration date |
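
One way to read the precedence column, sketched in Python: when several suppression reasons apply at once, the effective one is the reason with the lowest precedence number. The `Suppression` enum and `effective_suppression` function are hypothetical names, and this reading of "precedence" is an assumption.

```python
from enum import IntEnum
from typing import Iterable, Optional


class Suppression(IntEnum):
    # Values mirror the precedence column of the table above.
    DISABLED = 1
    FILTERED = 2
    MASKED = 3
    DELAYED = 4
    SHELVED = 5


def effective_suppression(active_reasons: Iterable[Suppression]) -> Optional[Suppression]:
    """Return the highest-precedence reason (lowest number), or None if unsuppressed."""
    return min(active_reasons, default=None)


effective = effective_suppression({Suppression.SHELVED, Suppression.FILTERED})
```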

slominskir · 2021-03-15T17:02:25Z

We need to determine whether we must track each of these alarm suppression states independently (should they be mutually exclusive?). They may overlap in all sorts of ways, and at transitions the correct effective suppression state must be determined, ideally without coupling the various mechanisms that suppress alarms. There is a calculated "effective" suppression state, but it might be best to calculate it by combining the independent states to avoid confusion and provide a clear audit trail.

If any item on the list changes then all suppression rules must be re-applied in order of precedence (starting at 1). For example if an alarm is removed from a filter all the other suppression states must be re-evaluated, meaning we need to store state information about what other rules would have been in effect if the filter hadn't been in effect - such as continuous shelving.

It is possible to create a "suppressed-alarms" topic with a message key of both alarm name and suppression reason, such that we can store all of the possible suppression states in one topic. We could create a new topic, alarm_state, that stores the calculated effective state (could even compute the final effective state considering acknowledgements and active-alarms too). Or clients can do the calculation. If we have a "suppressed-alarms" topic, our current filter-app prototype will need to change, as it writes to a separate topic.
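
A toy Python model of the composite-key idea, using a dict as a stand-in for a compacted topic. The `SuppressedAlarms` class, the record values, and the use of a `None` value as a tombstone are assumptions for illustration. The point is that each reason is stored independently, so removing one (the filter) leaves the others (the shelving) intact.

```python
from typing import Dict, Optional, Set, Tuple

Key = Tuple[str, str]  # (alarm name, suppression reason)


class SuppressedAlarms:
    """In-memory stand-in for a compacted topic keyed by (name, reason)."""

    def __init__(self) -> None:
        self._records: Dict[Key, dict] = {}

    def publish(self, key: Key, value: Optional[dict]) -> None:
        # A None value acts like a tombstone and removes the record.
        if value is None:
            self._records.pop(key, None)
        else:
            self._records[key] = value

    def reasons_for(self, alarm_name: str) -> Set[str]:
        # All suppression reasons currently recorded for one alarm.
        return {reason for (name, reason) in self._records if name == alarm_name}


topic = SuppressedAlarms()
topic.publish(("alarm1", "Filtered"), {"filter": "LERF off"})
topic.publish(("alarm1", "Shelved"), {"expiration": "2021-03-16T00:00:00Z"})
both = topic.reasons_for("alarm1")
topic.publish(("alarm1", "Filtered"), None)  # filter removed; shelving survives
remaining = topic.reasons_for("alarm1")
```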

slominskir · 2021-03-15T21:44:43Z

Filter flow might look like:

Source -> [active-alarms] -> disabled-app -> [non-disabled-alarms] -> filter-app -> [non-filtered-alarms] -> mask-app -> [non-masked-alarms] -> delay-app -> [non-delayed-alarms] -> state-calculator -> [operator-alarms]

Each suppression app would update suppressed-alarms topic
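
The chain can be mimicked in plain Python to see how it behaves. Each stage below stands in for one app: it forwards the alarms it does not suppress and records the ones it does. The stage predicates, alarm fields, and `run_pipeline` are hypothetical. Note that in this arrangement an alarm suppressed by an early stage never reaches the later ones, so only one reason is recorded per alarm.

```python
from typing import Any, Callable, Dict, List, Tuple

Alarm = Dict[str, Any]
Stage = Tuple[str, Callable[[Alarm], bool]]  # (reason, "should suppress?" predicate)


def run_pipeline(active: List[Alarm],
                 stages: List[Stage]) -> Tuple[List[Alarm], List[Tuple[str, str]]]:
    """Pass alarms through each stage in order; collect (name, reason) records."""
    suppressed: List[Tuple[str, str]] = []
    remaining = active
    for reason, should_suppress in stages:
        passed = []
        for alarm in remaining:
            if should_suppress(alarm):
                suppressed.append((alarm["name"], reason))
            else:
                passed.append(alarm)
        remaining = passed
    return remaining, suppressed


# Made-up alarms and rules for two of the stages.
alarms = [
    {"name": "alarm1", "disabled": True, "location": "CEBAF"},
    {"name": "alarm2", "location": "LERF"},
    {"name": "alarm3", "location": "CEBAF"},
]
stages: List[Stage] = [
    ("Disabled", lambda a: bool(a.get("disabled", False))),
    ("Filtered", lambda a: a["location"] == "LERF"),
]

operator_alarms, suppressed = run_pipeline(alarms, stages)
operator_names = [a["name"] for a in operator_alarms]
```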

state-calculator computes effective state
- honors acknowledgements
- honors suppression precedence
- honors active state
- honors registered alarms (optional) - do you want to see state "Normal" alarms listed?

States:

NORMAL <-> ABNORMAL (active)

ACKNOWLEDGED <-> UNACKNOWLEDGED

SUPPRESSED <-> UNSUPPRESSED

plus combinations and variants of suppressed

One app might be able to do the whole process?
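
As a sketch of what the state-calculator output might carry, the three state pairs can be kept as independent fields and combined only for display. Everything here is an assumption: the `AlarmState` class, the label format, and the rule that an alarm is annunciated only when it is active and not suppressed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AlarmState:
    active: bool
    acknowledged: bool
    suppressed_by: Optional[str] = None  # e.g. "Filtered"; None means unsuppressed

    def label(self) -> str:
        # Combine the three independent dimensions into one display string.
        parts = [
            "ABNORMAL" if self.active else "NORMAL",
            "ACKNOWLEDGED" if self.acknowledged else "UNACKNOWLEDGED",
            f"SUPPRESSED ({self.suppressed_by})" if self.suppressed_by else "UNSUPPRESSED",
        ]
        return " / ".join(parts)

    def annunciated(self) -> bool:
        # Assumed rule: show the alarm only if active and not suppressed.
        return self.active and self.suppressed_by is None


state = AlarmState(active=True, acknowledged=False, suppressed_by="Filtered")
state_label = state.label()
```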

michelejoyce · 2021-03-16T15:15:09Z

I think you're on the right track... layering the various "suppressions". That way, if any other type comes up, it would be easier to add.

Can this also apply to "calc/rules based alarms?" Isn't it just another form of alarm/don't alarm?

I think your definitions are good.

slominskir · 2021-03-16T15:46:40Z

Yeah, CALC alarms would likely go in that flow as well, probably before the disabled-app. The delay-app would likely handle off-delays too (not just on-delays).
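
For reference, a minimal Python sketch of on-delay and off-delay behavior: the output only follows the raw state once the raw state has stayed changed for the relevant delay. It is a simplification that evaluates only at the sample times given (a real app would need timers), and `apply_delays` and its parameters are hypothetical.

```python
from typing import List, Optional, Tuple


def apply_delays(samples: List[Tuple[float, bool]],
                 on_delay: float, off_delay: float) -> List[bool]:
    """Return the delayed (annunciated) state at each (time, raw_active) sample."""
    output: List[bool] = []
    annunciated = False
    changed_at: Optional[float] = None  # when raw began to differ from the output
    for time, raw in samples:
        if raw == annunciated:
            changed_at = None  # raw agrees with output; cancel any pending change
        else:
            if changed_at is None:
                changed_at = time
            delay = on_delay if raw else off_delay
            if time - changed_at >= delay:
                annunciated = raw
                changed_at = None
        output.append(annunciated)
    return output


# A brief blip at t=0 is ignored; the alarm turns on at t=5 and off at t=9.
samples = [(0, True), (1, False), (2, True), (3, True), (5, True),
           (6, False), (7, False), (9, False)]
delayed = apply_delays(samples, on_delay=2, off_delay=2)
```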

michelejoyce · 2021-03-16T17:26:02Z

Layering these things makes more and more sense...
Will it be onerous?

michelejoyce · 2021-03-16T17:26:49Z

Especially since we're not only talking about global suppression, but instead individual and groups of alarms...

slominskir · 2021-03-17T15:17:39Z

If we create a bunch of separate apps it certainly is flexible, but it'll be costly and unwieldy as well (lots of moving parts and lots of duplicate work). It might make more sense to divide suppression in general into two pieces:

Suppression Rule Processor - App responsible for keeping the suppressed-alarms topic up-to-date based on registered-alarms and active-alarms and filter-commands topics
Active Alarm Suppressor - App responsible for actually creating a new output topic with suppressed active-alarms

Probably consolidate all of this into the alarms-filter project and rename it alarm-suppressors. See: JeffersonLab/alarms-filter#1

slominskir mentioned this issue Feb 18, 2021

Shelve Group of Alarms all-at-once #7

Closed

slominskir changed the title ~~Global Mask Topic~~ Global Mask Topic (Disabled Alarms) Feb 23, 2021

slominskir mentioned this issue Feb 23, 2021

Support Alarming Delay #14

Open

slominskir changed the title ~~Global Mask Topic (Disabled Alarms)~~ Global Mask Topic (Suppressed Alarms) Mar 15, 2021

slominskir changed the title ~~Global Mask Topic (Suppressed Alarms)~~ Global Filtered Topic (Suppressed Alarms) Mar 15, 2021

michelejoyce closed this as completed Mar 16, 2021
