I'm going to implement this on my custom subclass of MVTManager but I thought it was worth writing up the plan (and my eventual findings) here in case it's worth implementing upstream.
Problem:
I have a table with a few million rows and it's both slow and fairly useless to return a full result set when zoomed out. You probably only need to return a random subset of the results when zoomed out as most items will end up taking up less than a single pixel.
The slowness comes from 2 sources:

1. the cost of actually querying that many rows
2. the amount of data you have to send down the wire
Solving (2) would be easy if it wasn't for (1). You still pay a performance penalty for any normal approach to filtering the results.
However Postgres supports a very fast way to return a subset of rows and this is done before any WHERE clause is evaluated:
```sql
SELECT * FROM tablename TABLESAMPLE SYSTEM(0.1) WHERE ...;
```
I'm going to experiment with dynamically calculating the sample % based on zoom level. I have a hunch there's a simple linear formula based on the total rows in the result set and the zoom level that will return a visually similar set of tiles much quicker. It's just a case of tweaking the slope of that formula.
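To make the idea concrete, here's one possible shape for that formula. This is only a sketch: `target_rows` and `full_detail_zoom` are knobs I'm inventing for illustration, and the slope is exactly the thing that would need tweaking.

```python
def sample_percent(zoom, approx_row_count, target_rows=20_000, full_detail_zoom=14):
    """Linear ramp from a floor percentage at zoom 0 up to 100% at
    full_detail_zoom. The floor is chosen so zoom 0 returns roughly
    target_rows features."""
    if approx_row_count <= target_rows:
        return 100.0  # small table: no point sampling at all
    floor = 100.0 * target_rows / approx_row_count  # percent needed at zoom 0
    t = min(zoom / full_detail_zoom, 1.0)  # 0..1 progress toward full detail
    return floor + (100.0 - floor) * t
```

With 2 million rows this gives 1% at zoom 0 and 100% from zoom 14 upward, which matches the intuition that sub-pixel features can be dropped when zoomed out.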
One more point worth noting: because COUNT itself is slow on large tables, there's a trick that gets a fast approximate COUNT:

```sql
SELECT reltuples AS ct FROM pg_class WHERE oid = 'tablename'::regclass;
```
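Putting the pieces together, the tile SQL could be assembled along these lines. This is a sketch, not MVTManager's actual query builder; the function name and the skip-the-clause-at-100% rule are my own.

```python
def tile_query(table, percent, where_clause):
    """Build the sampled query. TABLESAMPLE SYSTEM picks whole disk
    pages before the WHERE clause is evaluated, which is what keeps
    it cheap. Skip the clause entirely when we want every row."""
    sample = "" if percent >= 100.0 else f" TABLESAMPLE SYSTEM({percent:g})"
    return f"SELECT * FROM {table}{sample} WHERE {where_clause}"
```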
andybak changed the title from *Support for TABLESAMPLE* to *Possible optimization for very large tables* on Jul 13, 2020.
This is probably a good place to mention another optimization I'm using in my custom MVTManager:
```python
def _get_non_geom_columns(self):
    columns = []
    for field in self.model._meta.get_fields():
        if hasattr(field, "get_attname_column"):
            column_name = field.get_attname_column()[1]
            if column_name != self.geo_col and column_name in self.include_columns:
                columns.append(column_name)
    return columns
```
`self.include_columns` is set to a list of the columns I want in addition to the geom column. For tables with a large number of columns this can considerably reduce the amount of data that needs to be transferred.
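The whitelist logic is easy to check standalone, stripped of the Django plumbing. This is a simplified restatement, not the method above verbatim, and it also prepends the geometry column to show the final SELECT list.

```python
def select_columns(all_columns, geo_col, include_columns):
    # Keep the geometry column plus only the whitelisted extras;
    # everything else stays out of the SELECT list entirely.
    extras = [c for c in all_columns if c != geo_col and c in include_columns]
    return [geo_col] + extras
```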