Duplicates while iterating all locations in the API #78

Open
mrthomassen opened this issue Nov 3, 2017 · 3 comments

Comments

@mrthomassen

Hi from NRK.

We use your API regularly, but we get duplicate locations when paging with a small limit. We originally used a limit (page size) of 100, like this: http://api.npolar.no/placename/?q=&filter-status=official&limit=100&start=0&sort=location+asc,ident+asc&format=json

Starting from this URL and following the next-links the API returns, we end up with many duplicate locations. If we change the batch size to 5000, there are no duplicates:
http://api.npolar.no/placename/?q=&filter-status=official&limit=5000&start=0&sort=location+asc,ident+asc&format=json

Best regards,
Andreas Thomassen
NRK

@cnrdh
Member

cnrdh commented Nov 10, 2017

(Sorry for the delayed response.)
The sort parameter is wrong: try removing "+asc" and use just the field name to sort A-Z.
Also use "area" instead of "location".

@cnrdh
Member

cnrdh commented Nov 10, 2017

A little more explanation: since you sorted on a non-existent field ("location asc"), no proper sorting was applied, and this could lead to duplicate entries.

Try using https://api.npolar.no/placename/?q=&filter-status=official&limit=500&start=0&format=json&fields=name,area,id,latitude,longitude,updated&sort=-updated

Comments:

  • 500 is a reasonable batch-size
  • I also showed how you can set the fields you want in case you don't use everything
  • I sorted on last updated; you can also sort by area
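
A minimal sketch of how a client might build the page URLs for the query above. It uses only the parameter names that appear in that URL; the helper name `page_url` is hypothetical, and the script only prints URLs without contacting the API.

```python
from urllib.parse import urlencode

BASE = "https://api.npolar.no/placename/"

def page_url(start, limit=500, sort="-updated"):
    """Build the URL for one page (hypothetical helper, not part of the API)."""
    params = {
        "q": "",
        "filter-status": "official",
        "limit": limit,
        "start": start,
        "format": "json",
        "fields": "name,area,id,latitude,longitude,updated",
        "sort": sort,
    }
    # safe="," keeps the commas in the field list readable instead of %2C
    return BASE + "?" + urlencode(params, safe=",")

# Print the URLs of the first three pages (start = 0, 500, 1000)
for start in range(0, 1500, 500):
    print(page_url(start))
```

Only `start` changes between pages; passing `sort="area"` would switch to sorting by area instead.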

@mrthomassen
Author

Thanks for your answer.

We ended up using id as the sort parameter; that was a safe way to avoid duplicates.
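
A toy illustration of why sorting on a unique field such as id helps with start/limit paging. `ToyServer` is a made-up stand-in, not the real api.npolar.no backend: it assumes that rows with equal sort keys may come back in a different order on each request, which is modelled here by reversing the input on every other request.

```python
from collections import Counter

# Five made-up records; "location" is deliberately absent, as in the thread
RECORDS = [{"id": i, "area": "Svalbard"} for i in range(1, 6)]

class ToyServer:
    """Toy paged API whose tie order differs between requests (an assumption)."""

    def __init__(self, records):
        self.records = records
        self.requests = 0

    def page(self, start, limit, sort_field):
        # Flip the input order on every other request; Python's sort is
        # stable, so rows with equal keys keep that flipped order.
        rows = self.records if self.requests % 2 == 0 else self.records[::-1]
        self.requests += 1
        ordered = sorted(rows, key=lambda r: r.get(sort_field, 0))
        return ordered[start:start + limit]

def fetch_all(server, sort_field, limit=2):
    """Walk all pages with start/limit and collect the ids seen."""
    ids, start = [], 0
    while True:
        batch = server.page(start, limit, sort_field)
        if not batch:
            return ids
        ids.extend(r["id"] for r in batch)
        start += limit

def duplicates(ids):
    return [k for k, n in Counter(ids).items() if n > 1]

bad = fetch_all(ToyServer(RECORDS), "location")   # non-existent field: all keys tie
good = fetch_all(ToyServer(RECORDS), "id")        # unique key: order is fixed

print("sort=location:", bad, "duplicates:", duplicates(bad))
print("sort=id:      ", good, "duplicates:", duplicates(good))
```

In this model, sorting on the missing field returns one record twice and skips another, while sorting on id returns every record exactly once. A single large batch avoids the problem for the same reason: there is only one request, so the order cannot change between pages.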

You should look into the links in the API. After switching to https, the next-links and so on still use http.
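
Until that is fixed, a client could rewrite the scheme of each next-link before following it. This is only a sketch of a client-side workaround; `force_https` is a hypothetical helper, and the example link is made up to resemble the URLs above.

```python
from urllib.parse import urlsplit, urlunsplit

def force_https(url):
    """Return the URL with an http scheme upgraded to https (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)

# Made-up next-link in the style of the URLs in this thread
next_link = "http://api.npolar.no/placename/?q=&limit=100&start=100"
print(force_https(next_link))
```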
