Switch to osm2pgsql expiry and osm2pgsql-replication #987

Closed
pnorman opened this issue Nov 3, 2023 · 5 comments
Labels
service:tiles The raster map on tile.openstreetmap.org

Comments

@pnorman
Collaborator

pnorman commented Nov 3, 2023

One of the common rendering complaints is caused by not dirtying tiles when relations change. Switching to osm2pgsql expiry will fix this, as well as allow us to use osm2pgsql-replication, significantly simplifying tile replication scripts. I was setting up a client machine and was astonished at how simple it is these days.

osm2pgsql expiry for the pgsql backend runs in hybrid mode, which expires all tiles covered by a multipolygon below full_area_limit and only the tiles along its boundary above that limit. The default value for full_area_limit is 20000, but this can be changed with --expire-bbox-size. This means that if someone makes a tag edit to the US boundary, all the tiles along the edge will be expired, but not the whole of the US.
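The hybrid rule can be sketched in shell to show how the limit behaves. This is a hedged illustration of the behavior documented for --expire-bbox-size (expire the whole polygon if its bounding box is no wider and no taller than the limit, otherwise only its boundary), not osm2pgsql's own code. The helpers expire_mode and tiles_per_side are hypothetical, and the tile estimate ASSUMES the limit is measured in Web Mercator metres.

```sh
#!/bin/sh
# Hypothetical helpers illustrating the documented --expire-bbox-size rule.

# expire_mode WIDTH HEIGHT [LIMIT]
# Prints "full-area" when the polygon's bounding box is no wider and no taller
# than LIMIT, otherwise "boundary-only". Integer inputs only.
expire_mode() {
    limit=${3:-20000}    # documented default for --expire-bbox-size
    if [ "$1" -gt "$limit" ] || [ "$2" -gt "$limit" ]; then
        echo "boundary-only"
    else
        echo "full-area"
    fi
}

# tiles_per_side SIZE ZOOM
# Rough count of tiles spanned by SIZE at ZOOM, ASSUMING SIZE is in Web
# Mercator metres (circumference 40075016.686 m split into 2^ZOOM tiles).
tiles_per_side() {
    awk -v s="$1" -v z="$2" 'BEGIN {
        w = 40075016.686 / 2 ^ z     # tile width in metres at this zoom
        n = int(s / w)
        if (n * w < s) n++           # round up to whole tiles
        print n
    }'
}
```

For example, `tiles_per_side 20000 19` prints 262, so under that assumption a square polygon right at the default limit dirties roughly 262 × 262 (about 68,600) zoom-19 tiles, while anything wider or taller than the limit falls back to boundary-only expiry.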

osm2pgsql comes with osm2pgsql-replication, which stores all state in the database.

The setup command is just osm2pgsql-replication init -d gis and it will determine the date from the data in the database. On a new import it would be osm2pgsql-replication init -d gis --osm-file planet-latest.osm.pbf.

We need a script that takes the tile list and touches the relevant files to indicate they need re-rendering. Switch2OSM documents this. Adapting their example slightly, the following would be /usr/local/bin/expire-tiles:

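#!/bin/sh
# Hand the tile list written by osm2pgsql to renderd, then remove it.
set -e
# Tiles at zoom 13 and above are only touched (marked stale), not re-rendered now
render_expired --map=default --touch-from=13 --min-zoom=13 --max-zoom=19 -s /var/run/renderd/renderd.sock < /var/lib/replicate/dirty_tiles.txt
# With set -e this only runs if render_expired succeeded, so a failed run leaves the list in place
rm /var/lib/replicate/dirty_tiles.txt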

The key is --touch-from. Tiles at or above that zoom are not re-rendered immediately; render_expired touches them to mark them as stale, so they are re-rendered when next requested.
We should consider adjusting these zoom settings later.
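To make the zoom flags concrete, here is a hedged per-tile sketch of the decision the script above asks render_expired to make, based on its option descriptions: --min-zoom and --max-zoom filter the input, and tiles at or above --touch-from are marked expired rather than re-rendered. tile_action is a hypothetical helper and ignores everything else render_expired does internally.

```sh
#!/bin/sh
# tile_action LINE [MIN_ZOOM] [MAX_ZOOM] [TOUCH_FROM]
# LINE is one "z/x/y" entry from dirty_tiles.txt. Defaults mirror the script
# above (13, 19, 13). Prints "skip", "touch" or "render".
tile_action() {
    z=${1%%/*}            # zoom is everything before the first "/"
    min=${2:-13}
    max=${3:-19}
    touch_from=${4:-13}
    if [ "$z" -lt "$min" ] || [ "$z" -gt "$max" ]; then
        echo "skip"       # outside --min-zoom/--max-zoom
    elif [ "$z" -ge "$touch_from" ]; then
        echo "touch"      # only marked stale; re-rendered on demand
    else
        echo "render"     # re-rendered via renderd straight away
    fi
}
```

With the defaults, every tile from zoom 13 to 19 comes out as "touch", so nothing is re-rendered eagerly. `tile_action 12/1310/3165 12 19 13` prints "render", which is what lowering --min-zoom to 12 while keeping --touch-from=13 would do for zoom-12 tiles.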

Ignoring error handling and logging, the command that needs to be run is osm2pgsql-replication update -d gis --post-processing /usr/local/bin/expire-tiles -- --log-progress=false --number-processes=1 --expire-tiles=13-19 --expire-output=/var/lib/replicate/dirty_tiles.txt. Settings like hstore, multi-geometry, flat nodes, and style are all stored in the DB by osm2pgsql and are not required when running with --append.

This will download up to 500MB of changes (or a different amount if --max-diff-size is set), apply them to the DB, write the expired tile list, and run the post-processing script.

To run this on a regular schedule, Switch2OSM recommends a cron job, but osm2pgsql recommends a systemd service. A systemd service is better for this.

For that we'd create /etc/systemd/system/osm2pgsql-update.service:

[Unit]
Description=Keep osm2pgsql database up-to-date

[Service]
WorkingDirectory=/tmp
ExecStart=osm2pgsql-replication update -d gis --post-processing /usr/local/bin/expire-tiles -- --log-progress=false --number-processes=1 --expire-tiles=13-19 --expire-output=/var/lib/replicate/dirty_tiles.txt
StandardOutput=append:/var/log/osm2pgsql-updates.log
User=_renderd
Type=simple
Restart=on-failure
RestartSec=5min

And a timer in /etc/systemd/system/osm2pgsql-update.timer:

[Unit]
Description=Trigger an osm2pgsql database update

[Timer]
OnBootSec=10
OnUnitActiveSec=30s

[Install]
WantedBy=timers.target

Note for anyone doing this themselves: they'd also have to enable the timer and start it the first time; see the osm2pgsql docs.
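A hedged sketch of that first-time activation, assuming the two unit files above are installed under those names; enable_replication_timer is a hypothetical wrapper around standard systemctl commands and needs to run as root.

```sh
#!/bin/sh
# Hypothetical wrapper for the one-time activation step (run as root).
enable_replication_timer() {
    # Reload so systemd sees the new .service and .timer files
    systemctl daemon-reload || return 1
    # Enable the timer for future boots and start it now; the timer then
    # starts osm2pgsql-update.service on its schedule
    systemctl enable --now osm2pgsql-update.timer
}
```

Afterwards, `systemctl list-timers osm2pgsql-update.timer` shows when the next run is due. Only the timer is enabled; the service has no [Install] section because the timer is what starts it.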

This will eliminate replicate.erb, expire-tiles.erb, expire.rb, and expire-tiles-single.

Possible issues and notes

  • osm2pgsql could run out of memory building the expire list. Because osm2pgsql deduplicates the list, it has to hold it in memory. This is not an issue for a 500MB update on a machine with 256GB of RAM.
  • Our replication monitoring scripts for Prometheus will need to establish a DB connection every time they run. I don't see an issue with that when they run 15 seconds apart.
  • We should make the osm2pgsql timer depend on the postgres service with systemd. I'm not sure off-hand how to do this.
  • osm2pgsql dirties more tiles than our existing algorithm. This is good, since users have noticed that relation-only updates currently don't get re-rendered. If the dirty tiles prove to be too many, we can decrease --expire-bbox-size.
  • We should revisit our low-zoom vs on-demand rendering split and maybe try --touch-from=13 --min-zoom=12, but let's get the basics going first.
  • To set this up we need to disable replication, make the changes, init osm2pgsql-replication, and then start the replication service.
  • We could probably make a systemd oneshot timer that would download the planet, import the planet, set up replication, touch the import complete file with the planet date, and start the replication service. This isn't necessary to start but would make new server setup easier.
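For the systemd dependency question above, one possible approach is a drop-in for the service rather than the timer, since the timer only triggers the service and ordering applies when the service starts. This sketch ASSUMES PostgreSQL runs as postgresql.service (the Debian/Ubuntu name); write_postgres_dropin and the drop-in file name postgresql.conf are hypothetical.

```sh
#!/bin/sh
# write_postgres_dropin [UNIT_DIR]   (UNIT_DIR defaults to /etc/systemd/system)
# Hypothetical helper: adds a drop-in ordering the update service after PostgreSQL.
write_postgres_dropin() {
    dir="${1:-/etc/systemd/system}/osm2pgsql-update.service.d"
    mkdir -p "$dir" || return 1
    # After= only orders the units; Requires= also starts PostgreSQL if needed,
    # skips the update if PostgreSQL fails to start, and stops a running
    # update if PostgreSQL is stopped.
    cat > "$dir/postgresql.conf" <<'EOF'
[Unit]
Requires=postgresql.service
After=postgresql.service
EOF
}
```

Run it as root and follow it with `systemctl daemon-reload`. If stopping the update along with PostgreSQL is unwanted, Wants= is the weaker alternative to Requires=.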
@pnorman pnorman added the service:tiles The raster map on tile.openstreetmap.org label Nov 3, 2023
@tomhughes
Member

I think we also need to include --multi-geometry, --hstore and --tag-transform-script=/srv/tile.openstreetmap.org/styles/default/openstreetmap-carto.lua in the osm2pgsql options as none of those seem to be preserved in the properties table in the database?

@lonvia

lonvia commented Nov 17, 2023

That's correct. All arguments that are only for the pgsql output still need to be given manually to osm2pgsql-replication.

@tomhughes
Member

This has now been deployed on balerion, and comparison with bowser shows that it has increased the amount of rendering, which is expected now that we're dirtying more tiles.

@pnorman
Collaborator Author

pnorman commented Nov 24, 2023

balerion's CPU usage is about 2x that of bowser, with more high peaks when some relations get touched. There is not a huge impact on disk utilization: IO pressure is approximately double, but it is <2% in both cases, so the disks are nowhere near maxed out.

I think all the servers have sufficient capacity to take the increased load.

@tomhughes
Member

This has now been rolled out across all eight servers.

3 participants