[KYC Match] Scoring #85

ToshiWakayama-KDDI · 2024-05-22T07:28:56Z

Problem description

Consider a scoring feature for KYC Match.
(Spin off from Issue #65, item No.1, as per Action Item #13.03)

KevScarr · 2024-05-22T10:01:20Z

Hi @ToshiWakayama-KDDI
Linking out to a thread / good discussion around the concepts for 'score': [#46] .

I would summarise and propose the below, where 'attribute' below is a field in the existing KYC specification:-

When a response is "attributeMatch: 'false'" we include an extra response field "attributeScore: 70".
Example:
- when "familyNameAtBirthMatch: 'false'" is returned, a new response field of "familyNameAtBirthScore: 70" is included
Rules:
- Numeric attributes are not checked: ie birthdate (distance scores wouldn't make sense)
- The response "attributeMatch" must be 'false'
- The Score value is a whole number (%): 0 to 100 (0 = no match, 100 = exact match)
- For consistency: Recommend using Jaro-Winkler distance algorithm as per other operators that are live today (after normalisation has been applied).
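
A minimal sketch of how the rules above could be turned into a 0 to 100 score. The function names (`normalise`, `jaro`, `jaro_winkler`, `match_score`) are hypothetical, the normalisation step (trim plus case-folding) is only a placeholder for whatever normalisation the specification defines, and rounding down is an assumption made here so that 100 is only ever returned for an exact match after normalisation:

```python
import math


def normalise(value: str) -> str:
    # Placeholder normalisation (assumption): trim and case-fold only.
    return value.strip().casefold()


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and within this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro-Winkler similarity: Jaro boosted by a common prefix (max 4 chars)."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


def match_score(requested: str, stored: str) -> int:
    """Whole-number score, 0 (no similarity) to 100 (exact match)."""
    similarity = jaro_winkler(normalise(requested), normalise(stored))
    return math.floor(similarity * 100)


print(match_score("MARTHA", "MARHTA"))
```

With the conventional Winkler parameters used here (prefix scale 0.1, prefix capped at 4 characters), the textbook pair MARTHA / MARHTA has a similarity of about 0.961, which this sketch reports as 96.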

HuubAppelboom · 2024-05-22T11:29:44Z

Hi @KevScarr
Why not also provide the Score value when the "attributeMatch" is "true" but there is a small difference (probably a spelling mistake on either side)? Or do you propose to return "true" only when the Score is 100%?

KevScarr · 2024-05-22T11:59:12Z

@HuubAppelboom I would suggest that 'true' equates to an exact match, i.e. a score of 100. For close matches, i.e. when you return a score, let the consuming service judge whether the match is close enough to proceed (their use cases will drive their error tolerance).

GillesInnov35 · 2024-05-22T13:14:13Z

hi @HuubAppelboom , @KevScarr, I understand that an (optional) score result might be added alongside the boolean attribute (True/False/Not-available), which is mandatory if the attribute is provided in the request.
In this case, I wonder whether the boolean attribute is useful.
At Orange the response contains only a score match result. Consumer has to decide.
Gilles

KevScarr · 2024-05-22T15:11:47Z

@GillesInnov35 @HuubAppelboom Fair point; purely thinking about when a customer of the service migrates from the previous version to this version so backward compatibility would be important. I'd say the score is only provided when a boolean: false is returned; outside of that condition it offers little value.
For Orange: Do you still respond with a not-available indicator? and can you share which algorithm you're using (JW?)

GillesInnov35 · 2024-05-22T15:22:12Z

yes sure Kevin, backward compatibility will be an important point, but as KYC Match version 1.0.0 has not been published, I wonder whether it is a problem. But maybe it is.
To answer your question:

The matching algorithm implemented by the French MNOs is based on the Jaro–Winkler distance.
The score is a value between 0 and 100, the higher the score, the more similar the strings, the value 100 means an exact match and the value 0 means there is no similarity.
The score « -1 » is a special value, it indicates that the requested value was not found by the MNO.
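
As a rough illustration of how a score-only response like the one described above could be mapped back onto the existing true / false / not_available values, here is a small sketch. The function name `to_match_result` and the constant `NOT_FOUND` are hypothetical, and treating only 100 as "true" is an assumption taken from the exact-match suggestion earlier in this thread:

```python
NOT_FOUND = -1  # special value: the MNO did not find the requested value


def to_match_result(score: int) -> str:
    """Map a score-only result to 'true', 'false' or 'not_available'."""
    if score == NOT_FOUND:
        return "not_available"
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    # Assumption: only an exact match (100) counts as 'true'.
    return "true" if score == 100 else "false"


for value in (100, 70, -1):
    print(value, to_match_result(value))
```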

Thanks a lot for your active contribution
Regards
Gilles

KevScarr · 2024-05-22T15:32:37Z

Makes sense. So you would return a '-1' when the attribute wasn't available for checking, hence no requirement to have the boolean field in your current response.

If no MNO has implemented the current version then it's a fair shout to move towards a score only approach.

HuubAppelboom · 2024-05-23T06:12:18Z

@KevScarr @GillesInnov35 We may need to think of an approach that can be extended further. For example, I think it may be a good idea to provide feedback on whether the data is unverified or has been verified by the MNO. That way we can provide a larger market reach by also including unverified attributes, and the CSP can then decide whether to use that attribute or not.

ToshiWakayama-KDDI · 2024-05-23T07:14:24Z

Hi @GillesInnov35 , @HuubAppelboom , @KevScarr , all,

Thank you for your prompt comments/discussion, which I did not expect actually.

I should have informed you that there is a KYC Match scoring enhancement proposal in the API Backlog WG, so once we have received the proposal, we should proceed with our scoring discussion taking it into account. We should wait for it, but I don't think it will take long.

I will update the status.

Best regards,
Toshi

ToshiWakayama-KDDI · 2024-05-23T07:16:50Z

Hi @GillesInnov35 , @HuubAppelboom , @KevScarr, all,

Our implementation is based on v0.1.0, and actually we do not need the scoring feature, so we would insist that the KYC Match API should work without scoring. That is the original OGW scope, as I understand it, and it is also important for an OGW global API. In addition, as we all know, we have already put our efforts into v0.1.0, so I believe we should use our initial design and consider backward compatibility as much as possible.

Thanks,
Toshi

HuubAppelboom · 2024-05-23T12:19:36Z

As a suggestion for how to add score and other information to the API response, maintain backwards compatibility, and have something that can be expanded: we could add an extra string (when applicable) in the response for attributes where a score is relevant.

For example, attributeMatch will have the values "true", "false", "not_available" (like today).
And we add an extra answer "attributeMatchInfo" that contains items like "score=89 unverified" to signal that the Jaro-Winkler score is 89, but that the source data has not been verified by the MNO. When we have additional metadata, it can be added in future.

So for example you will get:

givenNameMatch : false
givenNameMatchInfo : score=95 verified
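
To make the idea concrete, a consumer could read such an info string roughly as follows. This is only a sketch: the space-separated `score=NN verified|unverified` layout is taken from the example above, the function name `parse_match_info` is hypothetical, and unknown tokens are skipped on the assumption that further metadata may be added later:

```python
def parse_match_info(info: str) -> dict:
    """Parse a string such as 'score=95 verified' into a small dict."""
    result = {"score": None, "verified": None}
    for token in info.split():
        if token.startswith("score="):
            result["score"] = int(token[len("score="):])
        elif token == "verified":
            result["verified"] = True
        elif token == "unverified":
            result["verified"] = False
        # Any other token is ignored so that future metadata does not break us.
    return result


print(parse_match_info("score=95 verified"))
```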

GillesInnov35 · 2024-05-24T09:12:07Z

hi @ToshiWakayama-KDDI, all, thanks for your comment.
I had a look at the API Backlog issue/PR opened by @jgarciahospital on the API Enhancement Proposal KYC-Match Scoring. It is in line with our current discussion on how to add match score information, and so it is interesting.
I'm afraid it will be difficult to offer backward compatibility if we have to replace a simple attribute with an object structure after version 0.1.0.
This is just my point of view to be discussed.
For example:

BR
Gilles

claraserranosolsona · 2024-06-06T15:09:04Z

Hi all,

As mentioned in last week's meeting:

Telefonica has implemented v0.1.0, therefore we would need backwards compatibility in v0.2.0.
This would be in line with the proposal of maintaining current true/false/not_available response and in the case of false, adding a score. For example:

• Keep current attributes-> "attributeMatch": true/false/not_available
• If false, add additional parameters -> "attributeScore": X%

From a technical perspective, this should preserve backwards compatibility: in OAS3, the “additionalProperties” keyword indicates whether an object (our response, in this case) may carry additional properties that are not documented. Its default value is true, so in CAMARA we assume it is true, and the customer should be ready to receive additional parameters. This would be worth checking.
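
A small sketch of what "ready to receive additional parameters" could look like on the consumer side, assuming a v0.1.0-style client that only knows the boolean fields. The names `V010_FIELDS` and `read_known_fields` are hypothetical, and the field list is an illustrative subset rather than the full response:

```python
import json

# Illustrative subset of the fields a v0.1.0 client already understands.
V010_FIELDS = ("givenNameMatch", "familyNameMatch")


def read_known_fields(payload: str) -> dict:
    """Keep only the fields this client knows; silently skip anything new."""
    data = json.loads(payload)
    return {name: data[name] for name in V010_FIELDS if name in data}


# A newer response that carries an extra score next to a 'false' result.
newer_response = json.dumps({
    "givenNameMatch": "true",
    "familyNameMatch": "false",
    "familyNameScore": 70,
})
print(read_known_fields(newer_response))
```

A client written this way keeps working when new optional score fields appear, which is the behaviour the compatibility argument relies on.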

However, the proposal to change a simple attribute into an object structure would not be backwards compatible, and is therefore not possible for us.
Ok to proceed with the following rules proposed for the score:

• Numeric attributes are not checked: ie birthdate
• The response "attributeMatch" must be 'false'
• The Score value is a whole number (%): 0 to 100 (0 = no match, 100 = exact match)
• Using Jaro-Winkler distance algorithm (after normalisation has been applied).

Regards,
Clara

GillesInnov35 · 2024-06-07T12:08:52Z

hi all, thanks Clara for this detailed summary.
If we must address backward compatibility because of v0.1.0 already deployed, I agree with you that we should add new optional score attributes.
Do you think we've time to imagine a design based on OAS3 specifications in order to avoid a long list of attributes ?
BR
Gilles

KevScarr · 2024-06-12T13:39:44Z

Building on Issue #96 / we should follow the same design convention (define once, use many):-

ScoreMatchResult:
    type: integer
    description: Attribute comparison score as a percentage for string comparisons
    example: 85
    minimum: 0
    maximum: 100

KYC_MatchResponse:
    type: object
    properties:

        idDocumentMatch:
            $ref: '#/components/schemas/MatchResult'

        nameMatch:
            $ref: '#/components/schemas/MatchResult'
            $ref: '#/components/schemas/ScoreMatchResult'

        givenNameMatch:
            $ref: '#/components/schemas/MatchResult'
            $ref: '#/components/schemas/ScoreMatchResult'

ScoreMatchResult to appear for all attribute fields, excluding the following fields as they are numeric/enum/ID based:-

idDocumentMatch
streetNumberMatch
birthdateMatch
genderMatch

When a field is numeric only in a particular country, as per the above summary, the score wouldn't be returned.

KevScarr · 2024-06-17T16:35:30Z

I've taken the attributes from the current version of the specification and, following the rules, given an initial view of which attributes can fully support a 'score' concept. It would be good to reach a common view across as many countries as possible; that will make updating the yaml spec straightforward.

| Attribute | Optional Score Available | Comment |
|---|---|---|
| idDocumentMatch | No | It’s an ID number. |
| nameMatch | YES | |
| givenNameMatch | YES | |
| familyNameMatch | YES | |
| nameKanaHankakuMatch | ??? | Are these fields in next release? |
| nameKanaZenkakuMatch | ??? | Are these fields in next release? |
| middleNamesMatch | YES | |
| familyNameAtBirthMatch | YES | |
| addressMatch | YES | |
| streetNameMatch | YES | |
| streetNumberMatch | YES | Is this houseName in some countries / assumption yes |
| postalCodeMatch | No | Being out by one letter can be a different place. |
| regionMatch | YES | |
| localityMatch | YES | |
| countryMatch | YES | |
| houseNumberExtensionMatch | No | It’s numeric, not relevant. |
| birthdateMatch | No | It’s numeric, not relevant. |
| emailMatch | YES | |
| genderMatch | No | It’s an enum type. |

Some fields will be all numeric in some countries and a mixture in others.
The table above captures which match attributes in the “KYC_MatchResponse” can support a ScoreMatch.

@ToshiWakayama-KDDI Should the nameKana*Match attributes also have scores in this next version of the specification (ie will these attributes remain here or be in an extension)?

fernandopradocabrillo · 2024-06-18T07:59:19Z

Hi @KevScarr
I agree with the proposal of creating a common schema for the response objects, but I don't fully understand what the final result is here. As far as I know, in OAS3 we cannot use two $ref objects at the same level.

From TEF, our proposal is mainly focused on not losing backward compatibility, as we are already integrated with clients, so the design could be simpler:

     idDocumentMatch:
         $ref: '#/components/schemas/MatchResult'
     idDocumentScoreMatch:
         $ref: '#/components/schemas/ScoreMatchResult'

We can document that the ScoreMatch properties will only be returned if the related property is false.
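
The "sibling property, only when false" rule could be sketched on the provider side as below. The helper name `add_scores` is hypothetical, match results are assumed to be the strings "true" / "false" / "not_available", and the score field name is derived as `<attribute>Score` (following the familyNameAtBirthScore example earlier in the thread); the final naming is still open in this discussion:

```python
def add_scores(match_results: dict, scores: dict) -> dict:
    """Return the match results plus a score field for each 'false' result."""
    response = dict(match_results)
    for field, result in match_results.items():
        # A score is only attached when the match is 'false' and a score exists.
        if result == "false" and field in scores:
            score_field = field.removesuffix("Match") + "Score"
            response[score_field] = scores[field]
    return response


print(add_scores(
    {"givenNameMatch": "true", "familyNameMatch": "false"},
    {"givenNameMatch": 100, "familyNameMatch": 70},
))
```

Because the existing `...Match` fields are left untouched, a client that ignores unknown properties sees the same response as before.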

GillesInnov35 · 2024-06-18T09:14:25Z

hi @fernandopradocabrillo, I think it works well with the allOf keyword.

allOf:
        - $ref: '#/components/schemas/MatchResult'
        - $ref: '#/components/schemas/ScoreMatchResult'

to be confirmed I suppose
BR
Gilles

GillesInnov35 · 2024-06-19T08:39:54Z

hi @fernandopradocabrillo, you're right. My proposal below can't be applied.

allOf:
        - $ref: '#/components/schemas/MatchResult'
        - $ref: '#/components/schemas/ScoreMatchResult'

I agree with yours regarding backward compatibility which is expected.
Gilles

ToshiWakayama-KDDI · 2024-06-23T06:18:21Z

Hi @KevScarr , all,

"@ToshiWakayama-KDDI Should the nameKana*Match attributes also have scores in this next version of the specification (ie will these attributes remain here or be in an extension)?"

Thank you for asking me about this. We would prefer to have scores for the nameKanaHankakuMatch and the nameKanaZenkakuMatch attributes in this next version.

Sorry for the late reply, as I needed to discuss this internally.

BR
Toshi

ToshiWakayama-KDDI · 2024-06-24T07:15:51Z

Hi @KevScarr , @fernandopradocabrillo , @GillesInnov35 , @claraserranosolsona , all

I have a question to clarify how scoring is calculated.

It seems that the Jaro-Winkler distance algorithm will be used for scoring string-type attributes (after normalisation has been applied); however, I think it should be up to each operator to choose how to calculate the score.

The reason is that, even though the Jaro-Winkler distance algorithm could be used as the common approach in Europe, it is unclear whether it can be used for other languages, and, if it can, whether it is the best-suited algorithm for them. That is my concern, and we ourselves are actually not sure about using the Jaro-Winkler distance algorithm for the Japanese language.

So, is it OK for each operator to choose how to calculate the score, or are there any other thoughts?

Thanks,
Toshi
KDDI

GillesInnov35 · 2024-06-24T12:19:41Z

hi @ToshiWakayama-KDDI , all, I don't really know whether this algorithm works for all languages, but it should (to be confirmed).
I think we should validate a unique algo to have the same specifications and the same rules for all KYC Match API providers and avoid specific implementation.

BR
Gilles

ToshiWakayama-KDDI · 2024-06-25T06:16:14Z

Hi @gilles, Thanks for your comments.

"I think we should validate a unique algo to have the same specifications and the same rules for all KYC Match API providers and avoid specific implementation."

I agree with this sentence in principle; however, as the Jaro-Winkler algorithm has not been proven effective for languages other than European ones, it would be better not to specify it as the mandatory algorithm. If specific algorithms are needed in the KYC Match API spec, Jaro-Winkler could, for example, be the recommendation for European languages, while the algorithm for other languages should be TBD.

Would this be a possible way forward?

BR
Toshi

claraserranosolsona · 2024-07-01T15:01:21Z

Hi @ToshiWakayama-KDDI ,

As discussed in last week's meeting, in order to have a score that is as standard as possible, would it be OK to proceed with the Jaro-Winkler algorithm, indicating the following?

"Unless otherwise captured in the specification, score will use the JaroWinkler distance algorithm for all countries."

So far, Jaro-Winkler has proven to be the most effective algorithm for comparing two strings, but if at some point another algorithm works better for a specific language, this wording would give the option to change it.

Many thanks,
Clara

ToshiWakayama-KDDI · 2024-07-02T08:23:43Z

Hi @claraserranosolsona ,

Thanks for reminding me. Sorry for the delay, due to my sickness (Covid-19 still exists) and so on. I think I can reply by tomorrow.

Thank you for your understanding.

Regards,
Toshi

ToshiWakayama-KDDI · 2024-07-03T07:24:10Z

Hi @claraserranosolsona ,

It seems the Jaro-Winkler algorithm itself can be used for the Japanese language; however, KDDI does not provide a Match Scoring function at all at present, so I am afraid we are not sure whether values calculated by the Jaro-Winkler algorithm are meaningful for the KYC Match service.

If you want to use the Jaro-Winkler algorithm commonly for Match Scoring, that is fine with us, provided the proposed sentence "Unless otherwise captured in the specification, score will use the JaroWinkler distance algorithm for all countries" is added to the API description. When KDDI implements the Match Scoring function, we could add something to the description if we have a problem with the Jaro-Winkler algorithm.

Just to reiterate our thoughts: we understand that in Europe the Jaro-Winkler algorithm has been used and has been proven effective, so there should be no problem there, but we think that this algorithm should not be a barrier for operators in other language areas to implement this API, and that this API should be suitable for global use.

Many thanks,
Toshi
KDDI

ToshiWakayama-KDDI added the enhancement New feature or request label May 22, 2024

jgarciahospital mentioned this issue May 23, 2024

Scope change for KYC-Match API - Scoring Logic camaraproject/APIBacklog#45

Closed

KevScarr mentioned this issue Jun 11, 2024

Select API for CAMARA Meta-Release September #75

Closed

fernandopradocabrillo mentioned this issue Jun 20, 2024

Include score functionality proposal #104

Merged

ToshiWakayama-KDDI closed this as completed in #104 Jul 10, 2024

jgarciahospital mentioned this issue Aug 1, 2024

New API Enhancement Proposal KYC-Match Scoring camaraproject/APIBacklog#46

Merged

ToshiWakayama-KDDI mentioned this issue Nov 10, 2024

KYC-Match: Clarification on handling of unsupported attributes? #168

Closed
