
Post-upgrade things we should include #10

Open · justinclift opened this issue Aug 10, 2023 · 12 comments

@justinclift
Member

The Release Notes for today's PostgreSQL point release say that people using BRIN indexes may need to reindex them:

If you use BRIN indexes to look up `NULL` values, you will need to reindex
them after upgrading to this release. On PostgreSQL 12 and above, you can use
`REINDEX CONCURRENTLY` to avoid blocking writes to the affected index and table,
for example:

REINDEX INDEX CONCURRENTLY your_index_name;

Previous PostgreSQL releases (both major and minor) have probably had similar post-upgrade tasks as well.

It would make sense for us to try to detect situations where these post-upgrade tasks are needed. At the very least we could alert the database admin that their database needs attention, though preferably we'd automate the task itself so the upgrade stays fully "automatic".

@justinclift
Member Author

And yep, this is likely opening a can of worms. 😉

@justinclift
Member Author

Checking back over the older PG 15.x announcements:

So it looks like for upgrades within the PG 15.x series, only a reindex of BRIN indexes is needed, and only if they're indexing NULL values.

@justinclift
Member Author

justinclift commented Aug 17, 2023

PostgreSQL SQL that returns a list of the BRIN indexes in a given database:

SELECT class.relname
FROM pg_class AS class
  JOIN pg_index idx ON idx.indexrelid = class.oid
  JOIN pg_am access_method ON access_method.oid = class.relam
WHERE access_method.amname = 'brin';

Not (yet) sure if there's a way to figure out whether a BRIN index is indexing NULLs or not. Probably need to investigate that, although BRIN indexes should be pretty fast & cheap to recreate anyway.


So we could probably run that once per database, then regenerate the BRIN indexes for all of the databases on the system, along the lines of the sketch below.
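
A hypothetical sketch (untested) of that, generating a REINDEX statement for every BRIN index in the current database:

-- Sketch only: emit a REINDEX statement for each BRIN index, schema-qualified
SELECT format('REINDEX INDEX CONCURRENTLY %I.%I;', ns.nspname, class.relname)
FROM pg_class AS class
  JOIN pg_index idx ON idx.indexrelid = class.oid
  JOIN pg_am access_method ON access_method.oid = class.relam
  JOIN pg_namespace ns ON ns.oid = class.relnamespace
WHERE access_method.amname = 'brin';

The statements that query emits would then be fed back through psql against the same database.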

We also need to keep track of which version of the PostgreSQL binaries the data directory was last run with, so we know to only do this once, upon running PG 15.4 for the first time.

I'm not (yet) aware of any existing place that stores this info. The PG_VERSION file only contains the major PG version, not the minor ones.

We might need to start storing this data ourselves, either on the filesystem (DATA_DIR/pgautoupgrade/something maybe?), or perhaps in a new pgautoupgrade database or schema.

If we go with the pgautoupgrade database or schema, that would give a bit more flexibility, as it would also easily allow storing things like upgrade history, etc.
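
A minimal sketch of what that schema approach might look like (all names here are illustrative, nothing is decided):

-- Illustrative sketch only; schema, table, and column names are placeholders
CREATE SCHEMA IF NOT EXISTS pgautoupgrade;
CREATE TABLE IF NOT EXISTS pgautoupgrade.upgrade_history (
    upgraded_at  timestamptz NOT NULL DEFAULT now(),
    old_version  text NOT NULL,
    new_version  text NOT NULL
);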

But adding that could also have some downsides. Some software is likely very tightly bound to its database structure, so seeing a new schema appear could muck it up.

Probably safer (for now at least) to just create a "pgautoupgrade" directory inside the data directory, and store any persistent info we need there. We could put upgrade history, error logs, etc. there too.

@justinclift
Member Author

As a reasonable starting point, we've merged #10, which automatically runs VACUUM ANALYZE and reindexes the databases after each upgrade.

@andyundso pointed out in that PR that extensions can also need upgrading:

I also read through crunchydata.com/blog/examining-postgres-upgrades-with-pg_upgrade and there they suggested that extensions sometimes also require their own upgrade.

So we should investigate doing that automatically too, as it might turn out to be possible as well. 😄

@justinclift
Member Author

justinclift commented Oct 4, 2024

From that article, the pg_catalog.pg_available_extensions view can be used to retrieve the list of extensions in a database that need updating:

SELECT name FROM pg_catalog.pg_available_extensions WHERE default_version <> installed_version;

This is a "per database" thing, so we'll need to loop through the extensions installed in each database and update them all.

Running the update for a given extension seems straightforward:

ALTER EXTENSION foobar UPDATE;
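
Putting those two pieces together, a rough sketch (untested) that updates every outdated extension in the current database might look like:

-- Sketch only: update each outdated extension in the current database.
-- This would still need to be run once per database.
DO $$
DECLARE
    ext record;
BEGIN
    FOR ext IN
        SELECT name
        FROM pg_catalog.pg_available_extensions
        WHERE installed_version IS NOT NULL
          AND default_version <> installed_version
    LOOP
        EXECUTE format('ALTER EXTENSION %I UPDATE;', ext.name);
    END LOOP;
END;
$$;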

The PG docs explaining extension packaging and extension updates are here:

https://www.postgresql.org/docs/current/extend-extensions.html#EXTEND-EXTENSIONS-UPDATES


For our purposes we'll probably need to add some sample database (that uses extensions) to our git repo and use that for testing.

As we test across a bunch of PostgreSQL versions, we'll probably want to use extensions that have been around since the 9.5 release. Probably some of the built in ones, and maybe one or two of the more popular 3rd party ones (PostGIS?).

@justinclift
Member Author

Started putting together a PR for the extension upgrade bit: #53

It's just stub code for now, and will likely need a bunch of work to get the CI testing for it in place.

@justinclift
Member Author

Hmmm, pg_upgrade can apparently also generate a file called update_extensions.sql for upgrading extensions. It probably wouldn't hurt to look that over and consider whether using it might be the better approach.

@andyundso
Member

For our purposes we'll probably need to add some sample database (that uses extensions) to our git repo and use that for testing.

I also had a similar thought recently: from what I see right now, Redash likely creates an empty database. What would make the testing a bit more complete would be having a simple SQL file that contains columns with different data types, a couple of indexes, and a few rows of data (see the sketch below). We would read that SQL file in when starting up the original PG version (e.g. v11), so pg_upgrade actually has some work to do. We could potentially drop redash.
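
As a very rough illustration of such a fixture (the table and column names are made up):

-- Hypothetical fixture sketch: mixed column types, an index, and a little data
CREATE TABLE sample_data (
    id         serial PRIMARY KEY,
    label      text,
    amount     numeric(10,2),
    created_at timestamptz DEFAULT now(),
    payload    jsonb
);
CREATE INDEX idx_sample_label ON sample_data (label);
INSERT INTO sample_data (label, amount, payload)
VALUES ('alpha', 12.50, '{"k": 1}'),
       ('beta', NULL, '{"k": 2}');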

I am not sure if this goes too far: right now we just check whether the upgrade worked by reading the PG_VERSION file. The actual upgrade process, i.e. whatever bits and bytes have to be changed to turn a v11 database into a v17 database, is an implementation detail of pg_upgrade. But if we add a SQL file with various different columns and data in it, we kind of tap into territory where we partially test that internal implementation. Which is not necessarily a bad thing, but just something to consider.

As we test across a bunch of PostgreSQL versions, we'll probably want to use extensions that have been around since the 9.5 release. Probably some of the built in ones, and maybe one or two of the more popular 3rd party ones (PostGIS?).

Agreed on the PG extensions. Testing 3rd party extensions manually would be good prior to merging the PR, but I would not test them in an automated way. It would require us to build a different pgautoupgrade image, containing PostGIS for example, but we would not push this image. The beauty of the current automated testing is that the artifact that gets pushed to Docker Hub is also the one that gets tested on CI.

Hmmm, pg_upgrade can apparently also generate a file called update_extensions.sql for upgrading extensions. It probably wouldn't hurt to look that over and consider whether using it might be the better approach.

Would definitely make our lives easier 😀

@justinclift
Member Author

From what I see right now, Redash likely creates an empty database.

Yeah. Currently it creates the initial (empty) tables and associated structures needed for a brand new Redash install.

We'll probably be better off using some kind of pre-populated database that (like you mention) we can do more in-depth automated testing on after an upgrade.

Testing 3rd party extensions manually would be good prior to merging the PR, but I would not test them in an automated way. It would require us to build a different pgautoupgrade image ...

That's a good point. Hadn't thought of that. 😄

We'll probably need to keep the extension testing limited to the bundled PG extensions then. There's a bunch of those, so it'll likely just mean looking through the list of possibilities (see the query below) and picking something reasonable (along with test data).
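
For reference, the extensions available to a given server build, installed or not, can be listed with:

-- Lists every extension this server build ships with, installed or not
SELECT name, default_version, comment
FROM pg_available_extensions
ORDER BY name;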

I'll probably take a look at the update_extensions.sql thing in the next few days if no-one else gets to it first.

While we're at it, we should probably also look over the delete_old_cluster.sh script that pg_upgrade generates, and see if running that would be useful too.

@andyundso
Member

sounds good, looking forward to it :)

@andyundso
Member

@justinclift I modified the test script locally to load in a "pg-ified" version of the AdventureWorks example database by Microsoft (credits to https://github.com/lorint/AdventureWorks-for-Postgres).

I initially wanted to write an example database myself, but it was too much work. I looked at other example Postgres databases, but it seemed to me that only AdventureWorks actually contains some good data. The downside, however, is that the SQL file is 85 MB uncompressed (14 MB compressed). I am just wondering whether it is a smart idea to add such a large file to a Git repository. I personally think "probably yes", since I see a benefit for our testing in having a decently sized example database.

@justinclift
Copy link
Member Author

justinclift commented Oct 17, 2024

An alternative approach would be to fork that repo into our organisation (so we can make sure it doesn't disappear one day), then use curl or similar to grab the file at the start of test runs.

That being said, 14 MB compressed is not exactly a large file. But if we need to modify it (i.e. to add new extensions) a few times, each revision will start multiplying its size in the repo.

Maybe let's try the fork-and-curl approach first, or something along those lines, and we can see how well that works in practice?

Also, thanks for taking the time to investigate this stuff. 😁 I'm currently buried in tasks that, although I'm getting through them, are taking longer than I'd like. 😬
