Prevent duplicate sibling contact capture #9601
Comments
Overview
First, I want to map out the problem space so that it is clear exactly what this issue should address. I think there are three separate, but related, problems to solve under the heading of "Duplicate prevention":
Details
With that summary in mind, we can dig into the details of how to specifically prevent a (potentially offline) user from creating a duplicate contact. The prototype PR provides a great starting point for this conversation. My goal is to synthesize/generalize the details from that prototype into a design summary that is easier for folks to understand and discuss (also including some of my own suggestions and editorializing). Once we coalesce on a particular design approach, we can return to the actual code and make that happen.
Configuration
Since different types of contacts contain different levels of data that might need to be checked for duplication, it makes sense to configure the rules for dupe checking individually for each contact type. I think it would be logical to include this config in the app-settings.
For each contact type, we need to define the rules for what constitutes a duplicate. The most flexible way to do this would be to allow a custom function to be included in the config that accepts two contacts and returns a boolean indicating whether the contacts are duplicates. The Levenshtein library could be in-context for this function's logic to make use of. This would allow the dupe logic to be as complex or as simple as necessary.
The main downside of this function-based approach is that it would probably be almost impossible to re-use this configuration for any kind of server-side dupe checking (in any future solution for #6363). It is not feasible from a performance perspective to run a function like this against every contact in the db. (Honestly, though, the more I consider this, the less I think we should try to optimize this config for any kind of server-side reuse... It seems like the most likely possibilities for server-side dupe checking functionality in the near future are not going to happen with Couch data, but via an external data store, e.g. DOT or Databricks.)
The prototype PR presents an alternative approach where, instead of a function, for each contact type we define which fields should be compared from the contact docs and which algorithm (e.g.
I think more discussion about the best form of configuration would be valuable here! One thing I would love to see is a simple way to enable some "default" dupe checks (e.g. something that just validates the
Functionality
Moving on to how this should actually work in webapp. I think the example from the PR is a solid approach where, when opening a new contact form, it also triggers a lookup of the existing sibling contacts of the new contact (via
I am not sure if this is covered in the current PR functionality or not, but I think it will be important to do the same duplicate checking when editing a contact.
UX
As demonstrated in the screenshot above, the current UX in the PR is to present the user with a list of the found duplicates (along with links to go to their profile page). One incremental enhancement that might make sense here would be to present the duplicates more as a proper list of contacts (or even "contact cards") with various important identifying information displayed (instead of highlighting specifically which data is being matched for the duplicate check). If a user clicks into one of the other contacts, they will lose all the data they have entered into the contact form, so we want to give them as much info as is feasible about the contacts before they navigate away from the form. Instead of including the list of contacts inline in the form page, it might be better to pop a modal containing the list. 🤔 (Either way, we should be able to use a
Another consideration is what the default behavior should be if duplicate contacts are found. Should we warn the user, but still let them submit the new contact? Or should we disable the submit button entirely and prevent the new contact from being added? Ultimately, this is something that we could make configurable, but it would be good to have a simple approach in the MVP and add configuration later...
@ChinHairSaintClair @fardarter Please weigh in here with anything that I have missed or gotten wrong, or with any additional thoughts or considerations that you have! @garethbowen let me know what you think about this proposed approach. What other stakeholders should we pull into this conversation to make sure we can maintain momentum on this feature?
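To make the function-based configuration idea more concrete, here is a rough sketch of what a per-contact-type matcher could look like. Everything in it is an assumption for discussion purposes: the `Contact` shape, the `duplicateMatchers` map, the `findSiblingDuplicates` helper, and the distance threshold of 2 are all hypothetical and are not taken from the prototype PR. The Levenshtein function is written inline so the example is self-contained; a real implementation would more likely use an existing library.

```typescript
// Hypothetical, minimal contact shape (only the fields this sketch needs).
interface Contact {
  _id: string;
  type: string;
  name: string;
}

// Standard dynamic-programming Levenshtein edit distance (two-row variant).
function levenshtein(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Ignore case and surrounding whitespace before comparing names.
function normalize(value: string): string {
  return value.trim().toLowerCase();
}

// A matcher accepts two contacts and returns true if they look like duplicates.
type DuplicateMatcher = (candidate: Contact, existing: Contact) => boolean;

// Hypothetical config: one matcher per contact type. The threshold is arbitrary.
const duplicateMatchers: Record<string, DuplicateMatcher> = {
  person: (candidate, existing) =>
    levenshtein(normalize(candidate.name), normalize(existing.name)) <= 2,
};

// Run the configured matcher against the siblings that were already looked up.
// Skipping the contact's own _id lets the same check run when editing a contact.
function findSiblingDuplicates(candidate: Contact, siblings: Contact[]): Contact[] {
  const matcher = duplicateMatchers[candidate.type];
  if (!matcher) {
    return []; // no rules configured for this contact type
  }
  return siblings.filter(
    (sibling) => sibling._id !== candidate._id && matcher(candidate, sibling)
  );
}
```

Because the matcher only runs against the (usually small) set of siblings already available on the device, the performance concern about running it against every contact in the db would not apply to this client-side use.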
I think this is a must. We can't assume anything about naming conventions, and it's quite possible for two people in the same family to have the same name. The point of this feature is to stop a CHW from doing the wrong thing accidentally, not to prevent them from doing something on purpose.
I'm really interested in seeing if we can make this workflow generic enough that it works for all report types, not just contact creation. We have so many examples of duplicates being created accidentally across all types that this would be powerful. I worry that some of the thoughts here (like Levenshtein distance, using multiple fields, etc) are over and above what's actually needed. Can we just check the exact name and family? For reports, can we just check report code, reported date, and subject? If we can simplify it enough, can it work out-of-the-box without configuration?
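As a point of comparison, the "just check the exact name and family" idea could be as small as the sketch below. The `ContactDoc` shape and the `isExactSiblingDuplicate` name are assumptions made for illustration. Treating names as equal after trimming whitespace and ignoring case is also an assumption that goes slightly beyond a strict exact match.

```typescript
// Hypothetical contact shape: a name plus a reference to the parent (family).
interface ContactDoc {
  _id: string;
  name: string;
  parent?: { _id: string };
}

// Zero-config check: same parent and same name (ignoring case and outer whitespace).
function isExactSiblingDuplicate(a: ContactDoc, b: ContactDoc): boolean {
  const parentA = a.parent?._id;
  const parentB = b.parent?._id;
  const sameParent = parentA !== undefined && parentA === parentB;
  const sameName = a.name.trim().toLowerCase() === b.name.trim().toLowerCase();
  // A doc is never a duplicate of itself (relevant when editing).
  return a._id !== b._id && sameParent && sameName;
}
```

A check like this needs no configuration, at the cost of missing slightly mistyped names.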
The ideal solution would warn about duplicates as early as possible. Some forms are very long and forcing a user to enter all the details before telling them about dupes we could have found after the first input field was complete would be a very frustrating UX. In my head this looks like a validation error on the name input with a checkbox to bypass the check, but implementing it as an enketo validation would be difficult I think? But however it's done, notifying as early as possible would be a huge win.
I think the eCHIS Kenya team would be interested in this too.
These were my initial thoughts as well, but then I was convinced by your comment on the other issue that the "duplicate report" workflow is quite different from duplicate contacts and that perhaps there is not much overlap in the configs/logic. Specifically, detecting a "duplicate report" is probably much less about the contents of the report than, just as you said, the type of report, who it is for, and when it is submitted. Basically, for reports we would be looking for other reports of the same type that were created for the same contact in a particular timeframe. These checks can happen up-front, before even loading the form. All of this is pretty different from the "duplicate contact" flow, where the most important thing is the content that the user enters for the new contact. So we cannot do an upfront check for a duplicate contact. Also, it is likely that more config is necessary to allow the contact dupe checking to be really useful. (Maybe we can find some sensible presets, but it seems like lots of tuning may be needed for some cases...) Because of this, I am skeptical of a "one-size-fits-all" solution for dupe-doc checking that covers both reports and contacts. (And even if we do decide to go that route, we would not need to support dupe-checking reports in this MVP PR.) It seems like the most important thing to decide at this point is whether we think report dupe checking will need to allow for the same level of flexible configuration as contacts (e.g. specifying which fields should be dupe checked). If so, then yeah, it probably makes sense to at least design the contact dupe checking so it can be extended later to also check reports. If not, then I think we should probably just leave the report dupe checking to its own issue and not worry about it here.
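For contrast with the contact flow, the up-front report check described above (same report type, same contact, within a timeframe) might look roughly like this. The `ReportDoc` shape, the `subjectId` field, and the `findRecentDuplicateReports` name are hypothetical, and the time window is passed in by the caller rather than assumed.

```typescript
// Hypothetical report shape: form code, the contact it is about, and a timestamp (ms).
interface ReportDoc {
  _id: string;
  form: string;
  subjectId: string;
  reported_date: number;
}

// Find existing reports of the same form for the same subject that were
// reported within `windowMs` milliseconds before `now`.
function findRecentDuplicateReports(
  form: string,
  subjectId: string,
  existing: ReportDoc[],
  now: number,
  windowMs: number
): ReportDoc[] {
  return existing.filter((report) => {
    const age = now - report.reported_date;
    return (
      report.form === form &&
      report.subjectId === subjectId &&
      age >= 0 &&
      age <= windowMs
    );
  });
}
```

Nothing here depends on what the user types into the form, which is why this check could run before the form is even loaded.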
Okay, this got me thinking that maybe I have been coming at this from the wrong direction! What if, instead of configuring the dupe-checking in the app-settings, we did it in the actual form
The main downside I see to configuring things in the form is that, for contact forms, it would be important that only the fields that map directly to the contact doc are eligible for dupe checking. Fields in the
(Tell me if I am wrong here, though! 😅 I feel like I am seeing Enketo widget + custom xlsx column as the solution to all problems lately....)
@eljhkrr just putting this on your radar! Please jump in to the discussion here if you have any specific concerns, requirements, or ideas!
Is your feature request related to a problem? Please describe.
Currently, there appears to be no built-in deterrent against creating records with names similar to existing siblings.
Describe the solution you'd like
Prevent duplicate place/person creation and display possible duplicates for consideration. On record submission (through create or edit flow), we want to show the possible duplicate items to our user. They can then navigate to the possible duplicate item via a link, and proceed with record changes there, or circumvent the duplicate check & proceed with record submission.
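A minimal sketch of the submission decision described above, assuming the duplicate lookup has already produced a list of ids. The `SubmitDecision` type and the `decideSubmit` function are hypothetical names, and the "warn, then allow the user to proceed" behavior is just one of the options under discussion.

```typescript
// Either save the record, or pause and show the possible duplicates first.
type SubmitDecision =
  | { action: "save" }
  | { action: "warn"; duplicateIds: string[] };

// `acknowledged` is true once the user has reviewed the duplicates and
// chosen to proceed with the submission anyway.
function decideSubmit(duplicateIds: string[], acknowledged: boolean): SubmitDecision {
  if (duplicateIds.length === 0 || acknowledged) {
    return { action: "save" };
  }
  return { action: "warn", duplicateIds };
}
```

The same function would serve both the create and edit flows, since it only looks at the outcome of the duplicate check.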
Describe alternatives you've considered
Despite improving our search functionality, and training the CHWs to use it, usage of the search feature before creating new records remains low, leading to frequent duplicates.
Additional context
We noticed that our CHWs either forget that they have previously captured an item or miss a previously captured item because it was slightly mistyped. This has resulted in quite a few duplicate records being created at all user-created levels of our hierarchy. We want to fix this at the source, before tasks are rolled out, to make sure no unnecessary/incorrect tasks fill up our CHWs' worklists. This will naturally also improve the accuracy of our data for reporting purposes.
We have a working prototype that we will soon upstream, which can produce the following:
Please see the following related discussions for more info:
https://forum.communityhealthtoolkit.org/t/mitigate-duplicate-data-capture/3313
#6363
As a "damage control" step, as discussed with the Medic team, we plan to use Databricks (the tool primarily responsible for pulling CouchDB data into our community database) to also push "flags" onto potential duplicate items. That will, in turn, cause tasks to trigger in the app. CHWs are then expected to confirm/deny the possible duplicate and determine what should happen to the record (delete, merge, other).