Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PoC] First implementation Worker Availability #597

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

GPaoloni
Copy link
Collaborator

@GPaoloni GPaoloni commented Jan 12, 2022

Description

This PR is a proof of concept of the Option 1 for worker's availability, as described in https://benetech.app.box.com/notes/902771185700.

This PR is related to techmatters/serverless#181.

The code that handles the status changes is implemented as an invisible Redux connected React component, because each time the worker attributes or activity changes, Flex updates it's Redux state, giving to us the ability to react to this changes without attaching new event listeners (other than just the props differing, which will trigger a component update and useEffect re-run if corresponds). We can however try with other approaches, like using worker event listeners.

A new boolean flag keepAvailable is used in this code to determine if the worker should be set to Offline, as there are cases where we want to keep it Available until some process completes (like accepting a 2nd task when the worker clicks on "add another task" button). Some more thinking around this is required, as we may face cases where two (or more) processes are taking place simultaneously and can affect this. A possible option would be "don't start a procedure that depends on keepAvailable if it's already set, but this might make impossible scenarios like task transfer at the same time as a new task is being pulled.

Note that this implementation relies on the existance of Available and Offline activity statuses. I think this is OK because anyway this are not the status we'll show to the workers, but some others of our own implementation, but we could preserve the actual statuses that each account uses as long as we keep track of them in a map like
{ Available: <the sid of Available activity in whatever language it is written>, Offline: <...> }

Checklist

Verification steps

Keep an eye to the worker status activity at every time while doing this. On top of that, you can open the browser console and find the logs prefixed with >>>> that are intentionally included here, to check when the calculation is run and what the values are.

  • Run this branch locally with npm run dev.
  • Try chat based contacts and manual pull for extra tasks.

Required work on top of this:

  • Wrap the behavior under a feature flag.

  • A new boolean flag keepAvailable is used in this code to determine if the worker should be set to Offline, as there are cases where we want to keep it Available until some process completes (like accepting a 2nd task when the worker clicks on "add another task" button). Some more thinking around this is required, as we may face cases where two (or more) processes are taking place simultaneously and can affect this. A possible option would be "don't start a procedure that depends on keepAvailable if it's already set, but this might make impossible scenarios like task transfer at the same time as a new task is being pulled.

    The above statement needs to be resolved, and some design challenges arise on this topic. Do we want to go with the easist option of "blocking concurrent tasks that depends on this flag", or design a more advanced system that tracks which processes are making use of this flag? The second one might involve more effort, and I'm not sure that task attributes will suffice, so we'll may need to rely on Sync documents for this. Happy to explain in more detail what are my concerns here.

  • Proper handling for transferred tasks to a particular worker (direct transfers). This should be handled in the corresponding Serverless function (set keepAvailable) when the transferred task is created and in Flex UI (unset keepAvailable) when the transferred task is accepted or rejected.

  • Proper handling for offline contacts. This should be handled in the corresponding Serverless function (set & unset keepAvailable). Already done in [PoC] First implementation Worker Availability serverless#181.

  • Define a new flag to indicate when a worker is "ready" to accept tasks vs when it's not, probably using task attributes (something like isReady). This flag should be taken into consideration to set Available to allow tasks come in, but it's probably not required to be part of the task assignment rules. This item is bound to re-implementing the below UI parts where a worker can set it's status.

  • Define if we want more statuses other than "Ready" and "Not Ready" (analogous to Available and Offline).

  • UI (this will partially depend on above item):

    • Top right corner of worker view.

      Screenshot from 2022-01-12 15-34-01

    • No tasks view of worker's view.

      Screenshot from 2022-01-12 15-34-54

    • Workers statuses of supervisor's view.

      Screenshot from 2022-01-12 15-36-27

    • Workers directory: we need to re-do the list of the workers available to transfer a task. The reason is that the transfer button is disabled when the worker is offline. We'll need to filter only the "Ready" ones, and give the direct transfer a custom behavior: set the keepAvailable flag to true for the target worker, then execute the transfer logic. We can however leave this for later and implement a first version of this PoC with direct transfers disabled.

  • How can we deal with the lost information of worker's reports in Insights? Custom timestamps? Fullstory?

@stephenhand
Copy link
Collaborator

We should put some thought into how we robustly & deterministically test this if we implement it. It seems like there's lots of corner cases that could trip things up if we aren't careful. Would unit tests be sufficient or do we need to test some scenarios running against a real Twilio backend?

@GPaoloni
Copy link
Collaborator Author

@stephenhand Unit tests will give us confidence on the behavior of this assuming that everything is working fine with the Twilio system. I plan to do that if this moves towards a final implementation. It would also be awesome to have some automated tests that performs some real interactions with the Twilio APIs, but I imagine that to be part of a much larger project than just this piece.

Copy link
Contributor

@murilovmachado murilovmachado left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I've tried locally and I've seen the availability being handled automatically.

One question: Right now, in some scenarios, the user cannot set herself as unavailable. In the final implementation, I suppose the user will be able to set herself as unavailable, right?

@GPaoloni
Copy link
Collaborator Author

@murilovmachado Yes, absolutely. But for that to happen, we need to change the "Availability dropdown" and handle the status changes a bit differently. The worker will then set it's status as "Ready" or "Not Ready", and the internals will be automatically handled.

@dee-luo
Copy link
Contributor

dee-luo commented Jan 21, 2022

Some answers to the open questions above:

  • We will want to allow other statuses from the counselor's perspective (similar to how Twilio allows you to set different states at the moment). Each status will map to either "Ready" or "Not Ready" on the backend.
  • Transfers will be important to have, so I do think it will be important to do the worker directory list.
  • For Insights: I don't think FullStory will work unfortunately, since Full Story does not really do much time tracking. I think the best approach is for us to log timestamps of activity changes for now, so we have the data if we need to generate a report.
  • I'm not sure I understand the issue with concurrent independent processes that depend on the keepAvailable flag. Does it matter if the flag is already set? I do think we want to still allow the concurrent processes to happen instead of blocking them, but want to understand what the implications are.

@GPaoloni
Copy link
Collaborator Author

GPaoloni commented Jan 21, 2022

I'm not sure I understand the issue with concurrent independent processes that depend on the keepAvailable flag. Does it matter if the flag is already set? I do think we want to still allow the concurrent processes to happen instead of blocking them, but want to understand what the implications are.

@dee-luo
Based on this related piece https://github.com/techmatters/serverless/pull/181/files.
Imagine the following scenario:

  • Someone starts submitting an offline contact in my behalf while I'm working (A).
  • In parallel, I'm submitting an offline contact in my own behalf (B).
  • The person submits this report. Here the flag is set to keepAvailable in my worker attributes, and saves my previousAttributes(A) (that is, the same but without keepAvailable and waitingOfflineContact).
  • Almost at the same time, I submit mine. keepAvailable is already set for me, so the process continues, but here previousAttributes(B) will contain keepAvailable set to true (that was just set above).
  • A completes it's flow, setting my worker attributes to previousAttributes(A).
  • B completes it's flow, setting my worker attributes to previousAttributes(B). Now my worker is stuck with keepAvailable set to true.
    Also there is a window of time between A and B where this flags are not set, when they should, because of B has started.
    Now that I'm thinking, this potential bug already exists for the waitingOfflineContact flag, something we need to address I think.
    But same kind of issues may arise if two actions using keepAvailable takes place at almost the same time, which is imo more likely than the same for waitingOfflineContact.

A possible solution to this two issues: keep track of the entire list of processes using this flags, and only clean them when all of this are done. For this, as I've said, we may need to use Sync. What do the rest of the eng team think? @murilovmachado @stephenhand @nickhurlburt

Copy link
Collaborator

@stephenhand stephenhand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants