Possible optimization for very large tables #9

Open
andybak opened this issue Jul 13, 2020 · 1 comment

andybak commented Jul 13, 2020

I'm going to implement this on my custom subclass of MVTManager, but I thought it was worth writing up the plan (and my eventual findings) here in case it's worth implementing upstream.

Problem:

I have a table with a few million rows, and returning a full result set when zoomed out is both slow and fairly useless. You probably only need to return a random subset of the results at low zoom levels, as most items will end up occupying less than a single pixel anyway.

The slowness comes from two sources:

  1. the cost of actually querying that many rows
  2. the amount of data you have to send down the wire

Solving (2) would be easy if it weren't for (1): any conventional way of picking a random subset (ORDER BY random() LIMIT n, for example) still has to read every row first, so you pay the query cost regardless.

However, Postgres supports a very fast way to return a subset of rows via TABLESAMPLE, and the sampling happens before any WHERE clause is evaluated. The SYSTEM method picks whole storage pages rather than individual rows, which is why it's so fast (at the cost of a less uniform sample than BERNOULLI):

SELECT * FROM tablename TABLESAMPLE SYSTEM(0.1) WHERE ...;

I'm going to experiment with dynamically calculating the sample percentage based on zoom level. I have a hunch there's a simple linear formula based on the total rows in the result set and the zoom level that will return a visually similar set of tiles much more quickly. It's just a case of tweaking the slope of that formula.
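Here's a rough sketch of the kind of formula I have in mind (Python, untested; the function name, target row count, and clamp bounds are placeholders of mine, and note this version grows geometrically with zoom rather than strictly linearly):

def sample_percent(zoom, approx_total_rows, target_rows=10000):
    # Aim for roughly target_rows features surviving the sample. Each zoom
    # level splits a tile into four, so a tile covers about a quarter of the
    # rows of the level above it and can afford a 4x higher sample rate.
    if approx_total_rows <= 0:
        return 100.0
    fraction = (target_rows / approx_total_rows) * (4 ** zoom)
    # TABLESAMPLE SYSTEM takes a percentage between 0 and 100.
    return max(0.01, min(fraction * 100.0, 100.0))

The result can then be passed into the query as a parameter, e.g. TABLESAMPLE SYSTEM(%s), since the sampling percentage accepts an expression.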

One more point worth noting: because COUNT(*) itself is slow on large tables, there's a trick that gets a fast approximate count from the planner's statistics:

SELECT reltuples AS ct FROM pg_class WHERE oid = 'tablename'::regclass;
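From Django that might look something like this; a minimal sketch assuming standard cursor and parameter handling (keep in mind reltuples is only refreshed by VACUUM/ANALYZE, so it's an estimate):

from django.db import connection

def approx_count(model):
    # reltuples is a float maintained by the planner; cast it for convenience.
    with connection.cursor() as cursor:
        cursor.execute(
            "SELECT reltuples::bigint FROM pg_class WHERE oid = %s::regclass",
            [model._meta.db_table],
        )
        return cursor.fetchone()[0]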
andybak changed the title from "Support for TABLESAMPLE" to "Possible optimization for very large tables" on Jul 13, 2020

andybak commented Jul 13, 2020

This is probably a good place to mention another optimization I'm using in my custom MVTManager:

def _get_non_geom_columns(self):
    """Return only the columns named in include_columns, minus the geometry."""
    columns = []
    for field in self.model._meta.get_fields():
        # Reverse relations and the like have no concrete column; skip them.
        if hasattr(field, "get_attname_column"):
            column_name = field.get_attname_column()[1]
            if column_name != self.geo_col and column_name in self.include_columns:
                columns.append(column_name)
    return columns

self.include_columns is set to a list of the columns I want in addition to the geom column. For tables with large numbers of columns this can considerably reduce the amount of data that needs to be transferred.
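For completeness, wiring it together looks roughly like this. This is a sketch only: the MVTManager import and constructor are whatever the library actually provides, and include_columns is my own attribute, not part of upstream:

from django.contrib.gis.db import models
# from ... import MVTManager  # import path depends on the library

class TrimmedMVTManager(MVTManager):
    # My own attribute: the non-geometry columns to include in each tile.
    include_columns = ["id", "name"]

    # _get_non_geom_columns() from the snippet above is defined here.

class Place(models.Model):
    name = models.CharField(max_length=255)
    geom = models.PointField()
    tiles = TrimmedMVTManager()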
