I'm going to implement this on my custom subclass of MVTManager but I thought it was worth writing up the plan (and my eventual findings) here in case it's worth implementing upstream.
Problem:
I have a table with a few million rows and it's both slow and fairly useless to return a full result set when zoomed out. You probably only need to return a random subset of the results when zoomed out as most items will end up taking up less than a single pixel.
The slowness comes from 2 sources:

1. the cost of actually querying that many rows
2. the amount of data you have to send down the wire
Solving (2) would be easy if it wasn't for (1). You still pay a performance penalty for any normal approach to filtering the results.
However Postgres supports a very fast way to return a subset of rows and this is done before any WHERE clause is evaluated:
```sql
SELECT * FROM tablename TABLESAMPLE SYSTEM(0.1) WHERE ...;
```
I'm going to experiment with dynamically calculating the sample % based on zoom level. I have a hunch there's a simple linear formula based on the total rows in the result set and the zoom level that will return a visually similar set of tiles much quicker. It's just a case of tweaking the slope of that formula.
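To make the idea concrete, here's one possible shape for that formula. This is only a sketch: `target_rows` and `full_detail_zoom` are knobs I'm inventing for illustration, and the slope is exactly the thing that would need tweaking.

```python
def sample_percent(zoom, approx_row_count, target_rows=20_000, full_detail_zoom=14):
    """Linear ramp from a floor percentage at zoom 0 up to 100% at
    full_detail_zoom. The floor is chosen so zoom 0 returns roughly
    target_rows features."""
    if approx_row_count <= target_rows:
        return 100.0  # small table: no point sampling at all
    floor = 100.0 * target_rows / approx_row_count  # percent needed at zoom 0
    t = min(zoom / full_detail_zoom, 1.0)  # 0..1 progress toward full detail
    return floor + (100.0 - floor) * t
```

With 2 million rows this gives 1% at zoom 0 and 100% from zoom 14 upward, which matches the intuition that sub-pixel features can be dropped when zoomed out.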
One more point worth noting: because COUNT itself is slow on large tables, there's a trick that gets a fast approximate COUNT:

```sql
SELECT reltuples AS ct FROM pg_class WHERE oid = 'tablename'::regclass;
```
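Putting the pieces together, the tile SQL could be assembled along these lines. This is a sketch, not MVTManager's actual query builder; the function name and the skip-the-clause-at-100% rule are my own.

```python
def tile_query(table, percent, where_clause):
    """Build the sampled query. TABLESAMPLE SYSTEM picks whole disk
    pages before the WHERE clause is evaluated, which is what keeps
    it cheap. Skip the clause entirely when we want every row."""
    sample = "" if percent >= 100.0 else f" TABLESAMPLE SYSTEM({percent:g})"
    return f"SELECT * FROM {table}{sample} WHERE {where_clause}"
```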
andybak changed the title from *Support for TABLESAMPLE* to *Possible optimization for very large tables* on Jul 13, 2020.
This is probably a good place to mention another optimization I'm using in my custom MVTManager:
```python
def _get_non_geom_columns(self):
    columns = []
    for field in self.model._meta.get_fields():
        if hasattr(field, "get_attname_column"):
            column_name = field.get_attname_column()[1]
            if column_name != self.geo_col and column_name in self.include_columns:
                columns.append(column_name)
    return columns
```
`self.include_columns` is set to a list of the columns I want in addition to the geom column. For tables with a large number of columns this can considerably reduce the amount of data that needs to be transferred.
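The whitelist logic is easy to check standalone, stripped of the Django plumbing. This is a simplified restatement, not the method above verbatim, and it also prepends the geometry column to show the final SELECT list.

```python
def select_columns(all_columns, geo_col, include_columns):
    # Keep the geometry column plus only the whitelisted extras;
    # everything else stays out of the SELECT list entirely.
    extras = [c for c in all_columns if c != geo_col and c in include_columns]
    return [geo_col] + extras
```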