
add canary migration for spike [MRXN23-453] #1508

Conversation

hotzevzl
Member

not to be merged - this is only a spike to validate the use of an ad-hoc DataSource to drive apidb alongside geodb from within a TypeORM migration for the geoprocessing service, since we have never used this setup before.

This will be needed if porting the proposed Python data migration script for existing projects and scenario-specific cost surfaces (#1501).

What this spike wants to achieve is:

  • validate that the use of an ad-hoc DataSource for apidb works as expected at runtime (this was validated in our Compose-based development setup)
  • validate that the use of this ad-hoc DataSource also works correctly in CI (because I got lost in the multitude of ways we may run migrations: as part of standing up services, as steps of a script, etc., as well as the multiple ormconfig instances and flavours); a minimal sketch of the kind of migration under test follows below
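For reference, this is roughly the shape of canary migration the spike exercises: a geoprocessing migration that opens a second, ad-hoc connection to apidb alongside the geodb connection the migration runs on. This is only a sketch assuming TypeORM 0.3-style DataSource; the environment variable names and the class name are illustrative, not the project's actual ormconfig keys.

```ts
import { MigrationInterface, QueryRunner, DataSource } from 'typeorm';

export class CanarySpikeApidbDataSource1695100000000 implements MigrationInterface {
  public async up(queryRunner: QueryRunner): Promise<void> {
    // Ad-hoc DataSource for apidb, configured from environment (assumed names).
    const apiDataSource = new DataSource({
      type: 'postgres',
      host: process.env.API_POSTGRES_HOST,
      port: Number(process.env.API_POSTGRES_PORT ?? 5432),
      username: process.env.API_POSTGRES_USERNAME,
      password: process.env.API_POSTGRES_PASSWORD,
      database: process.env.API_POSTGRES_DATABASE,
    });

    await apiDataSource.initialize();
    try {
      // Canary queries: if both succeed at runtime and in CI, the setup works.
      // The geodb connection remains available through the migration's queryRunner.
      await apiDataSource.query('select 1');
      await queryRunner.query('select 1');
    } finally {
      await apiDataSource.destroy();
    }
  }

  public async down(_queryRunner: QueryRunner): Promise<void> {
    // Nothing to roll back: the canary makes no schema changes.
  }
}
```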


@hotzevzl
Member Author

@alexeh this is just a stub migration to validate that we can reliably work within the boundaries of TypeORM magic and "just" use an apidb DataSource from within geoprocessing migrations.

Asking you to review to make sure I'm not forgetting any use case where we run migrations through other means and end up without the apidb connection configuration available...

@hotzevzl hotzevzl requested a review from alexeh September 19, 2023 09:44
@alexeh
Collaborator

alexeh commented Sep 20, 2023

Thanks @hotzevzl!

Magic seems feasible. However, I'd like to clarify the operational side of this. When we do the scary data migration, I see two options:

  1. Have scenarios.cost_surface_id in place (and nullable) first, which means adding an api-side migration; then run the data migration from the geo side; then add a last migration to make the relation not nullable, plus any other additional magic we want to add to the FK. We could potentially make the relation not nullable from the geo side once the data is migrated.

  2. Follow the overall script idea and create all the relations within geo migrations, and just use IF NOT EXISTS from the api side, as I think we must leave the changes made to api_db reflected in the api migrations. Along the same line, we should also use IF NOT EXISTS guards in the geo migrations affecting api_db, if in practice the migrations can run in parallel at some point. We should take into account that the data migration could take a little longer to commit while the api has already performed the changes over the same table. (A rough sketch of this option follows below.)
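To make option 2 concrete, here is a rough, hypothetical sketch of a geo-side migration that touches apidb defensively through an ad-hoc DataSource, guarding its DDL with IF NOT EXISTS so it tolerates the equivalent api-side migration having already run. The table and column names come from the discussion above (scenarios.cost_surface_id); everything else (env var names, class name) is an assumption.

```ts
import { MigrationInterface, QueryRunner, DataSource } from 'typeorm';

export class LinkScenariosToCostSurfaces1695200000000 implements MigrationInterface {
  public async up(queryRunner: QueryRunner): Promise<void> {
    // Ad-hoc apidb DataSource, built as in the canary sketch above (assumed env var names).
    const apiDataSource = new DataSource({
      type: 'postgres',
      host: process.env.API_POSTGRES_HOST,
      port: Number(process.env.API_POSTGRES_PORT ?? 5432),
      username: process.env.API_POSTGRES_USERNAME,
      password: process.env.API_POSTGRES_PASSWORD,
      database: process.env.API_POSTGRES_DATABASE,
    });
    await apiDataSource.initialize();
    try {
      // Guarded, idempotent change on the api side: a no-op if the api-side
      // migration has already added the column.
      await apiDataSource.query(
        `ALTER TABLE scenarios ADD COLUMN IF NOT EXISTS cost_surface_id uuid`,
      );
      // ... the actual data migration on geodb (via queryRunner) would run here;
      // a later migration would make the column NOT NULL and add the FK.
    } finally {
      await apiDataSource.destroy();
    }
  }

  public async down(_queryRunner: QueryRunner): Promise<void> {
    // Intentionally left as a no-op in this sketch.
  }
}
```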


@alexeh alexeh left a comment


my 2 cents:
#1508 (comment)

@hotzevzl
Member Author

@alexeh thanks, and yes: even after discussing with Kevin yesterday these concerns you raised about "syncing" migrations from both services and on both databases, I now realise that I hadn't considered the full picture, namely that we don't (currently) have any control over the order in which the api and geoprocessing services finish running their migrations.

So the idea of doing all the changes in one single migration in the geoprocessing service only goes so far, if it still relies on earlier schema changes that were part of migrations in the api service.

IF NOT EXISTS and company, as you suggest, will help, though I'm generally quite wary of them as they risk papering over accidental inconsistencies between dev environments, or even between staging and prod in some cases. That said, I'm not altogether opposed: if used judiciously and sparingly, I don't see a big risk, and they could simplify things that would otherwise require more coordination.

Alternatives I can see could be:

  • from the geoprocessing service, check via an apidb connection that a specific migration has run in apidb (see the sketch after this list)
  • enforce that the api service is fully healthy (both when running under docker compose and when running in a k8s environment), which would imply it has finished running all its migrations, before starting the geoprocessing service (though in general we may actually want to enforce the opposite, since api also works as a poor person's API gateway/reverse proxy in front of geoprocessing, so it should be fully healthy first 🤷🏼)
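A minimal sketch of the first alternative, assuming the api service records its migrations in TypeORM's default `migrations` table and that the usual Postgres env vars are available to geoprocessing (both are unverified assumptions, as is the migration name in the usage example):

```ts
import { DataSource } from 'typeorm';

// Ask apidb whether a given api-side migration has already been recorded,
// so a geo-side data migration can bail out early if its prerequisite is missing.
export async function apiMigrationHasRun(migrationName: string): Promise<boolean> {
  const apiDataSource = new DataSource({
    type: 'postgres',
    host: process.env.API_POSTGRES_HOST,
    port: Number(process.env.API_POSTGRES_PORT ?? 5432),
    username: process.env.API_POSTGRES_USERNAME,
    password: process.env.API_POSTGRES_PASSWORD,
    database: process.env.API_POSTGRES_DATABASE,
  });
  await apiDataSource.initialize();
  try {
    const rows: { count: string }[] = await apiDataSource.query(
      `SELECT count(*) AS count FROM migrations WHERE name = $1`,
      [migrationName],
    );
    return Number(rows[0]?.count ?? 0) > 0;
  } finally {
    await apiDataSource.destroy();
  }
}

// Usage (hypothetical migration name):
// if (!(await apiMigrationHasRun('AddCostSurfaceIdToScenarios1695000000000'))) {
//   throw new Error('apidb prerequisite migration has not run yet, aborting');
// }
```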

thoughts? cc @KevSanchez

@hotzevzl
Member Author

Closing this: the purpose was to validate the use of different connections, but in general this setup seems like a Very Bad Idea (tm), because we'd now need to coordinate migrations between different services, across environments as different as staging/prod, local dev and CI, which seems like a recipe for disaster.

We'll adjust the approach separately (for example, only run data migrations in controlled environments where we can ensure a specific sequencing of api and geoprocessing migrations).

@hotzevzl hotzevzl closed this Sep 25, 2023