Get changes from a specific date only for pages with geo location #70
Comments
For updated pages, you can try Then you can use However, I'm not sure what will happen if a page has been updated multiple times during your specified time range, when using |
Thanks, I'm not worried about multiple updates since I only need the pages' Ids. from there I send a request to get a page with all the fields I need in order to mirror it. |
OK, so basically I need to choose which one to run first, i.e.:
As far as I understand, both options require two steps: getting the data and then filtering it. |
This surely would be slow, as there are a lot of changes taking place on WP every minute (I didn't check the actual number).
I think this approach is better, too. You just need to keep track of the coordinates of the pages since your last visit, so you can discover whether any pages have been moved out of your BBox. |
True, I'll need to track deleted pages and pages that moved out of the BBox. |
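The bookkeeping described above could look roughly like the following. This is only a sketch in plain C# (top-level statements, C# 9 or later): the two dictionaries, the page IDs, and the coordinates are made-up stand-ins for the mirrored database and for the latest bounding-box search, not anything provided by the library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical mirror state from the previous run: page ID -> (latitude, longitude).
var stored = new Dictionary<int, (double Lat, double Lon)>
{
    [101] = (37.7890, -122.4030),
    [102] = (37.7900, -122.4040),
    [103] = (37.7880, -122.4020),
};

// Hypothetical page IDs and coordinates returned by the latest bounding-box search.
var current = new Dictionary<int, (double Lat, double Lon)>
{
    [101] = (37.7890, -122.4030), // unchanged
    [103] = (37.7885, -122.4020), // moved, but still inside the box
    [104] = (37.7895, -122.4035), // newly inside the box
};

// Stored pages missing from the latest search were either deleted or moved
// out of the box; telling the two apart needs a follow-up query per page.
var missing = stored.Keys.Except(current.Keys).OrderBy(id => id).ToList();

// Pages seen for the first time.
var added = current.Keys.Except(stored.Keys).OrderBy(id => id).ToList();

// Pages present in both runs whose coordinates changed.
var moved = current
    .Where(kv => stored.TryGetValue(kv.Key, out var old) && old != kv.Value)
    .Select(kv => kv.Key)
    .OrderBy(id => id)
    .ToList();

Console.WriteLine($"missing: {string.Join(",", missing)}");
Console.WriteLine($"added:   {string.Join(",", added)}");
Console.WriteLine($"moved:   {string.Join(",", moved)}");
```

Note that a page in `missing` is only known to be absent from the box; whether it was deleted or relocated still has to be checked separately.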
The intention of It's up to you to decide whether to leverage this method. Sometimes it may be worthwhile to use |
If you used Generally, you can pass in a |
The plot thickens :-)
When using almost the same code but with See here:
var geoSearchGenerator = new GeoSearchGenerator(_wikiSites[language])
{
    BoundingRectangle = GeoCoordinateRectangle.FromBoundingCoordinates(southWest.X, southWest.Y, northEast.X, northEast.Y),
    PaginationSize = 500,
};
var results = await geoSearchGenerator//.EnumItemsAsync().ToListAsync();
    .EnumPagesAsync(new WikiPageQueryProvider
    {
        Properties =
        {
            new ExtractsPropertyProvider {AsPlainText = true, IntroductionOnly = true, MaxSentences = 1},
            new PageImagesPropertyProvider {QueryOriginalImage = true},
            new GeoCoordinatesPropertyProvider {QueryPrimaryCoordinate = true},
            new RevisionsPropertyProvider { FetchContent = false }
        }
    }).ToListAsync();
OK, I went a step further and checked the requests using Fiddler. |
It seems that coordinates are added to only 10 pages of the query. When the query page size is 10 (the default) it works as expected, which is how the sandbox API shows the results, but when it is set to 500 it doesn't :-( |
{
    "continue": {
        "excontinue": 20,
        "picontinue": 128475,
        "cocontinue": "8670|13334822",
        "continue": "||revisions"
    },
    "query": {
        "pages": {
            "1225": {
As I've mentioned in #69, there is some basic logic in |
Yea, I figured it out yesterday after digging into your code and seeing in fiddler that the number of pages that are scrolled is about 5 when doing a Thanks again for all the explanations and great library! |
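For anyone reading along: the `excontinue`, `picontinue` and `cocontinue` values in the response quoted earlier suggest that each property module pages on its own, so a single response need not carry every property for every page. The documented MediaWiki way to continue is to resend the original parameters merged with every value from `continue`. Below is a minimal sketch of just that merge step, using `System.Text.Json`; `NextRequest` is a hypothetical helper written for this example and is not how the library does it internally.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper: returns the parameters for the follow-up request, or null
// when the response has no "continue" member (nothing is left to fetch).
Dictionary<string, string>? NextRequest(Dictionary<string, string> previous, string responseJson)
{
    using var doc = JsonDocument.Parse(responseJson);
    if (!doc.RootElement.TryGetProperty("continue", out var cont))
        return null;

    var merged = new Dictionary<string, string>(previous);
    foreach (var member in cont.EnumerateObject())
        merged[member.Name] = member.Value.ToString(); // numbers become their raw text
    return merged;
}

var request = new Dictionary<string, string>
{
    ["action"] = "query",
    ["format"] = "json",
    ["prop"] = "coordinates|pageimages|revisions",
    ["generator"] = "geosearch",
    ["ggscoord"] = "37.7891838|-122.4033522",
};

// The "continue" object quoted in the thread; the rest of the response is left empty.
var response = @"{
  ""continue"": {
    ""excontinue"": 20,
    ""picontinue"": 128475,
    ""cocontinue"": ""8670|13334822"",
    ""continue"": ""||revisions""
  },
  ""query"": { ""pages"": {} }
}";

var followUp = NextRequest(request, response);
if (followUp != null)
{
    foreach (var kv in followUp)
        Console.WriteLine($"{kv.Key}={kv.Value}");
}
```

A real client would repeat this until `NextRequest` returns null, merging the partial page data from each response as it goes.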
My use case:
I'm using geosearch to get all the points in a certain area.
For each point I get the extended data I need and store it in the database (A mirroring of some sort only with the data I truly need - pages with geo location).
Later on I would like to know which items were updated or added since a specific point in time.
I'm not sure whether there is an easy API to find out what was added or updated since a specific date (after which I would need to check which of those pages have a geo location), or whether I should instead get the revisions list of the geosearch results.
In any case, I need to do a database incremental update given a specific date.
Any advice would be welcome :-)
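To make the incremental update concrete, here is a rough sketch of the filtering step I have in mind. It assumes the latest revision timestamp of each geosearch result is already available (for example from the revisions property); the page IDs, timestamps, and the `lastSync` value are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Time of the previous synchronisation (hypothetical value).
var lastSync = new DateTimeOffset(2019, 1, 1, 0, 0, 0, TimeSpan.Zero);

// Hypothetical geosearch results: page ID and timestamp of the latest revision.
var latestRevisions = new List<(int PageId, DateTimeOffset Timestamp)>
{
    (101, new DateTimeOffset(2018, 12, 24, 8, 30, 0, TimeSpan.Zero)),
    (102, new DateTimeOffset(2019, 1, 3, 17, 5, 0, TimeSpan.Zero)),
    (103, new DateTimeOffset(2019, 2, 11, 9, 0, 0, TimeSpan.Zero)),
};

// Only pages revised after the last sync need to be fetched again.
var toRefresh = latestRevisions
    .Where(r => r.Timestamp > lastSync)
    .Select(r => r.PageId)
    .ToList();

Console.WriteLine(string.Join(",", toRefresh));
```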
I haven't found an option to add more props to geosearch generator.
Here's an example of a query:
/w/api.php?action=query&format=json&prop=coordinates%7Cpageimages%7Crevisions&generator=geosearch&ggscoord=37.7891838%7C-122.4033522
This page too:
https://en.wikipedia.org/wiki/Special:ApiSandbox#action=query&format=json&prop=coordinates%7Cpageimages%7Crevisions&generator=geosearch&ggscoord=37.7891838%7C-122.4033522
I'm not sure this is the right solution though...
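For reference, the query string above can be rebuilt from its individual parameters as in the sketch below, which makes it easier to see that `%7C` is just the encoded `|` separator. `BuildQuery` is a hypothetical helper for this example only; the parameter names and values are the ones from the URL above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: percent-encodes each key and value and joins them with '&'.
string BuildQuery(IEnumerable<KeyValuePair<string, string>> parameters) =>
    "/w/api.php?" + string.Join("&", parameters.Select(p =>
        Uri.EscapeDataString(p.Key) + "=" + Uri.EscapeDataString(p.Value)));

var query = BuildQuery(new[]
{
    new KeyValuePair<string, string>("action", "query"),
    new KeyValuePair<string, string>("format", "json"),
    // Several property modules in one request, separated by '|'.
    new KeyValuePair<string, string>("prop", "coordinates|pageimages|revisions"),
    // The generator feeds the geosearch results into the property modules.
    new KeyValuePair<string, string>("generator", "geosearch"),
    // Generator parameters take a "g" prefix: gscoord becomes ggscoord.
    new KeyValuePair<string, string>("ggscoord", "37.7891838|-122.4033522"),
});

Console.WriteLine(query);
```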