add withOptional flag that will trigger region when needed #42

Joxit · 2021-02-05T10:27:32Z

I added a commit above #41 that implement my comment #41 (comment)

What do you think @missinglink ? The PR for the API is in progress 😉

missinglink · 2021-02-10T22:38:36Z

Heya, sorry I didn't reply earlier, I wanted to think about this a bit more before commenting.

I think that generally this is a great approach, namely:

have multiple label generators per country
let the API layer decide if it should re-generate the labels using a different generator

One change I would suggest is around code organisation.
The withOptional and meta.optional syntax don't feel very natural to me.

Just throwing out an idea, but could we change that so that each 'generator' (such as fr for instance) could have multiple sub-generators, so fr.verbose or fr.succinct for example.

I think that has a few advantages:

Ability to add more than one 'optional' generator
Cleaner syntax / module API
Easier to understand the logic and reason about the outcome of each generator
Easier to test each generator independently of each other
Potentially configurable which sub-generator is selected via pelias.json
Extensible in the future if we decided that in some cases we needed to re-label more than once

Not sure, does that make sense?

Joxit · 2021-02-11T23:22:26Z

If I understand correctly, the schema should look something like this ?

  'FRA': {
    'valueFunctions': {
      'local': getFirstProperty(['locality', 'localadmin']),
      'country': getFRACountryValue()
    },
    'verbose': {
      'valueFunctions': {
        'local': getFirstProperty(['locality', 'localadmin']),
        'regional': getFRARegionValue(),
        'country': getFRACountryValue()
      },
      'meta': {
        'separator': ',' // This is useless for FRA
      }
    },
    'meta': {
      'separator': ',' // This is useless for FRA
    }
  }

I like the term verbose, but I don't think it will be enough for what I have in mind.

I want to expand the best part of the label based on the results. That means:

For Bagneux, France, I will expand the regional part

Bagneux, Hauts-de-Seine, France
Bagneux, Deux-Sèvres, France
...

For Starbucks, New York, NY, USA, I will expand the street part

Starbucks, 5th Avenue, New York, NY, USA
Starbucks, 6th Avenue, New York, NY, USA
...

For Police nationale, Marseille, France, I will expand the borough (maybe with street & only in France)

Police nationale, 12e Arr., Marseille, France
Police nationale, 16e Arr., Marseille, France
...

With an option like withOptional and verbose, the results for Starbucks an Police nationale will include the regional part which is not relevant because it is not discriminatory.

Maybe I will need metadata from the API to identify similar parts... Or pelias-label should take the result list...

⚠️ MIDNIGHT DRAFT

Metadata example for Bagneux, France:

{
        "country": 2, // same name and same ID
        "macroregion": 0, // no match
        "region": 0, // no match
        "macrocounty": 0, // no match
        "county": 0, // no match
        "localadmin": 1, // same name
        "locality": 1, // same name
        "continent": 2, //same name and same ID
}

Metadata example for Starbucks, New York, NY, USA:

{
        "housenumber": 0, // no match
        "street": 0, // no match
        "postalcode": 0, // no match
        "country": 2,  //same name and same ID
        "region": 2,  //same name and same ID
        "county": 2,  //same name and same ID
        "locality": 2,  //same name and same ID
        "borough": 2,  //same name and same ID
        "neighbourhood": 0, // no match
        "continent": 2,  //same name and same ID
}

missinglink · 2021-02-12T04:46:26Z

hmm yeah sorry if I'm actually making this more complicated 😆
I just look at this code (even prior to this PR) and think there must be a better way to do it 🤔

Just throwing out an idea here but if we returned an array of objects from the generator, we could then pass it up to the API layer and let it make the decision about how to .join() it.

So off the top of my head, something like this:

var doc = thePeliasDocForBagneux()
var parts = generator.fr(doc)

/**
  parts would be equal to:
  
  [
    { label: 'Bagneux', layer: 'locality', role: 'required' },
    { label: 'Hauts-de-Seine', layer: 'region', role: 'optional' },
    { label: 'France', layer: 'country', role: 'required' },
  ]
  
**/

// Get the existing 'simple' label
var simple = (p) => ['required'].includes(p.role)
var str = parts.filter(simple).join(', ')

// Get the label with 'optional' parts
var extended = (p) => ['required','optional'].includes(p.role)
var str = parts.filter(extended).join(', ')

The generator would just be responsible for specifying the order of the array items, what is included or discarded from the array and what roles are assigned based on the document layer.

The stringification could be performed either within this repo or in the pelias/api codebase, whichever makes the most sense.

🤷‍♂️

Joxit · 2021-02-12T06:37:29Z

Hum... Yes and in this case, if we return an array to the API:

We do the simple join
We check duplicates
On duplicates, we looking for the left most discriminatory optional label on the first two documents
We recreate the label with required ones and the discriminatory label

We should not forget to return the separator character (space in Korea)

This is of course a breaking change

Joxit · 2021-02-15T21:30:42Z

Hi there, I updated this branch with pelias/api#1507 to apply your idea @missinglink.

This is backward compatible, I added a new function partsGenerator that will return the new object:

{
  parts:   [
    { label: 'Bagneux', layer: 'locality', role: 'required' },
    { label: 'Hauts-de-Seine', layer: 'region', role: 'optional' },
    { label: 'France', layer: 'country', role: 'required' },
  ],
  separator: ', '
}

role has only two values, I don't know if one day we will use it with other values there or if we could simplify with a boolean required: true 🤷

Anyway, the result looks very good !

➡️ Police nationale, Marseille, France

Police nationale, 12th Arr., Marseille, France
Police nationale, 16th Arr., Marseille, France
Police nationale, 2nd Arr., Marseille, France
Police nationale, 3rd Arr., Marseille, France
Police nationale, 14th Arr., Marseille, France
Police nationale, 13th Arr., Marseille, France
Police nationale, 8th Arr., Marseille, France
Police nationale, 11th Arr., Marseille, France
Police nationale, 1st Arr., Marseille, France
Police nationale, 10th Arr., Marseille, France

➡️ Bagneux, France

Bagneux, Hauts-de-Seine, France
Bagneux, Deux-Sèvres, France
Bagneux, Indre, France
Bagneux, Marne, France
Bagneux, Allier, France
Bagneux, Meurthe-et-Moselle, France
Bagneux, Aisne, France

➡️ Starbucks, New York

Starbucks, 5th Avenue, New York, NY, USA
Starbucks, 6th Avenue, New York, NY, USA
Starbucks, 3rd Avenue, New York, NY, USA
Starbucks, West 181st Street, New York, NY, USA
Starbucks, 7th Avenue, New York, NY, USA
Starbucks, 1st Avenue, New York, NY, USA
Starbucks, East 93rd Street, New York, NY, USA
Starbucks, East 90th Street, New York, NY, USA
Starbucks, West 145th Street, New York, NY, USA
Starbucks, West 23rd Street, New York, NY, USA

Joxit · 2022-05-16T12:36:13Z

Hello there, I did an update of this PR

I added some tests (173 passed => 310 passed)
Some layer values were missing
I've updated the output of partsGenerator now it's { labelParts: [], separator: '' } instead of { parts: [], separator: '' }
It's now sync with master

Now I have two questions.
Do you think this PR is still interesting for the project? Otherwise I will stop updating it and close it.

I saw that there were spaces in Japanese labels. Maybe this PR is a perfect use case to handle this?

labels/builders/JPN-JPN.js

Lines 73 to 75 in 73ef845

    
           // combine components, inserting a space between the postalcode, admin, and name sections 
        
           // when written on an envelope these would correspond to new lines, a space should help with some clarity 
        
           let labelParts = _.concat(postalcode, ' ', admin, distric, block, ' ', venue);

let labelParts = _.concat(postalcode, { label: ' ', role: 'newline' }, admin, distric, block, { label: ' ', role: 'newline' }, venue); 
// OR
let labelParts = _.concat(postalcode, { label: ' ', role: 'separator' }, admin, distric, block, { label: ' ', role: 'separator' }, venue);

Also, the space seems to be ignored when the label is built:

labels/test/labelGenerator_JPN_JPN.js

Lines 137 to 142 in 73ef845

    
           const expected = '' + // no postalcode for this record 
        
             '北海' + // prefecture (region), Hokkaido Prefecture 
        
             '札幌' + // city (county), Sapporo 
        
             '札幌市' + // designated city (locality), Sapporo-shi 
        
             '北二十四条西四丁目' + // chome/ward (street) 
        
             '2番12号'; // "bango" (block and building number)

This will help for label deduplication in the API. Sample of the response ```json { labelParts: [ { label: 'Bagneux', layer: 'name', role: 'required' }, { label: 'Hauts-de-Seine', layer: 'region', role: 'optional' }, { label: 'France', layer: 'country', role: 'required' }, ], separator: ', ' } ```

Joxit · 2022-05-19T19:10:51Z

I added some minor fixes, now this is ready to review/merge

orangejulius · 2022-05-19T19:28:34Z

We really need to test this out, you've done a lot of work here. Stand by :)

Joxit added the enhancement label Feb 5, 2021

Joxit mentioned this pull request Feb 5, 2021

assignLabels: improve duplicate label handling pelias/api#1507

Draft

Joxit force-pushed the joxit/feat/with-optional branch from 05e1a9c to bbf316a Compare February 5, 2021 21:36

Joxit force-pushed the joxit/feat/with-optional branch from bbf316a to 5374b64 Compare February 15, 2021 17:07

Joxit force-pushed the joxit/feat/with-optional branch from 5374b64 to 5936adf Compare October 8, 2021 09:23

Joxit force-pushed the joxit/feat/with-optional branch 2 times, most recently from 12ab8d0 to 73ef845 Compare May 16, 2022 12:11

Joxit requested review from missinglink and orangejulius May 16, 2022 12:36

Joxit force-pushed the joxit/feat/with-optional branch 3 times, most recently from f5b56ed to 37056b8 Compare May 17, 2022 10:13

Joxit force-pushed the joxit/feat/with-optional branch from 37056b8 to 762f22b Compare May 17, 2022 11:11

Joxit marked this pull request as draft January 17, 2023 13:59

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add withOptional flag that will trigger region when needed #42

add withOptional flag that will trigger region when needed #42

Joxit commented Feb 5, 2021

missinglink commented Feb 10, 2021 •

edited

Loading

Joxit commented Feb 11, 2021

missinglink commented Feb 12, 2021 •

edited

Loading

Joxit commented Feb 12, 2021

Joxit commented Feb 15, 2021

Joxit commented May 16, 2022 •

edited

Loading

Joxit commented May 19, 2022

orangejulius commented May 19, 2022

add withOptional flag that will trigger region when needed #42

Are you sure you want to change the base?

add withOptional flag that will trigger region when needed #42

Conversation

Joxit commented Feb 5, 2021

missinglink commented Feb 10, 2021 • edited Loading

Joxit commented Feb 11, 2021

missinglink commented Feb 12, 2021 • edited Loading

Joxit commented Feb 12, 2021

Joxit commented Feb 15, 2021

Joxit commented May 16, 2022 • edited Loading

Joxit commented May 19, 2022

orangejulius commented May 19, 2022

missinglink commented Feb 10, 2021 •

edited

Loading

missinglink commented Feb 12, 2021 •

edited

Loading

Joxit commented May 16, 2022 •

edited

Loading