
add canary migration for spike [MRXN23-453] #1508

Conversation

hotzevzl
Member

not to be merged - this is only a spike to validate the use of an ad-hoc DataSource to drive apidb alongside geodb from within a TypeORM migration for the geoprocessing service, since we have never used this setup before.

This will be needed if porting the proposed Python data migration script for existing projects and scenario-specific cost surfaces (#1501).

What this spike wants to achieve is:

  • validate that the use of an ad-hoc DataSource for apidb works as expected at runtime (this was validated in our Compose-based development setup)
  • validate that the use of this ad-hoc DataSource also works correctly in CI (because I got lost in the multitude of ways we may run migrations: as part of standing up services, as steps of a script, etc., as well as the multiple ormconfig instances and flavours); a minimal sketch of the kind of migration under test follows below
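For reference, this is roughly the shape of canary migration the spike exercises: a geoprocessing migration that opens a second, ad-hoc connection to apidb alongside the geodb connection the migration runs on. This is only a sketch assuming TypeORM 0.3-style DataSource; the environment variable names and the class name are illustrative, not the project's actual ormconfig keys.

```ts
import { MigrationInterface, QueryRunner, DataSource } from 'typeorm';

export class CanarySpikeApidbDataSource1695100000000 implements MigrationInterface {
  public async up(queryRunner: QueryRunner): Promise<void> {
    // Ad-hoc DataSource for apidb, configured from environment (assumed names).
    const apiDataSource = new DataSource({
      type: 'postgres',
      host: process.env.API_POSTGRES_HOST,
      port: Number(process.env.API_POSTGRES_PORT ?? 5432),
      username: process.env.API_POSTGRES_USERNAME,
      password: process.env.API_POSTGRES_PASSWORD,
      database: process.env.API_POSTGRES_DATABASE,
    });

    await apiDataSource.initialize();
    try {
      // Canary queries: if both succeed at runtime and in CI, the setup works.
      // The geodb connection remains available through the migration's queryRunner.
      await apiDataSource.query('select 1');
      await queryRunner.query('select 1');
    } finally {
      await apiDataSource.destroy();
    }
  }

  public async down(_queryRunner: QueryRunner): Promise<void> {
    // Nothing to roll back: the canary makes no schema changes.
  }
}
```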


@hotzevzl
Member Author

@alexeh this is just a stub migration to validate that we can reliably work within the boundaries of TypeORM magic and "just" use an apidb DataSource from within geoprocessing migrations.

Asking you to review to make sure I'm not forgetting any use case where we run migrations through other means and end up without the apidb connection configuration available...

@hotzevzl hotzevzl requested a review from alexeh September 19, 2023 09:44
@alexeh
Collaborator

alexeh commented Sep 20, 2023

Thanks @hotzevzl!

Magic seems feasible. However, I'd like to clarify the operational side of this. When we do the scary data migration, I see two options:

  1. Have scenarios.cost_surface_id in place (and nullable) first, which means adding an api-side migration; then run the data migration from the geo side; then add a last migration to make the relation not nullable, plus any other additional magic we want to add to the FK. We could potentially make the relation not nullable from the geo side once the data is migrated.

  2. Follow the overall script idea and create all the relations within geo migrations, and just use IF NOT EXISTS from the api side, as I think we must leave the changes made to api_db reflected in the api migrations. Along the same line, we should also use IF NOT EXISTS guards in the geo migrations affecting api_db, if in practice the migrations can run in parallel at some point. We should take into account that the data migration could take a little longer to commit while the api has already performed the changes over the same table. (A rough sketch of this option follows below.)
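To make option 2 concrete, here is a rough, hypothetical sketch of a geo-side migration that touches apidb defensively through an ad-hoc DataSource, guarding its DDL with IF NOT EXISTS so it tolerates the equivalent api-side migration having already run. The table and column names come from the discussion above (scenarios.cost_surface_id); everything else (env var names, class name) is an assumption.

```ts
import { MigrationInterface, QueryRunner, DataSource } from 'typeorm';

export class LinkScenariosToCostSurfaces1695200000000 implements MigrationInterface {
  public async up(queryRunner: QueryRunner): Promise<void> {
    // Ad-hoc apidb DataSource, built as in the canary sketch above (assumed env var names).
    const apiDataSource = new DataSource({
      type: 'postgres',
      host: process.env.API_POSTGRES_HOST,
      port: Number(process.env.API_POSTGRES_PORT ?? 5432),
      username: process.env.API_POSTGRES_USERNAME,
      password: process.env.API_POSTGRES_PASSWORD,
      database: process.env.API_POSTGRES_DATABASE,
    });
    await apiDataSource.initialize();
    try {
      // Guarded, idempotent change on the api side: a no-op if the api-side
      // migration has already added the column.
      await apiDataSource.query(
        `ALTER TABLE scenarios ADD COLUMN IF NOT EXISTS cost_surface_id uuid`,
      );
      // ... the actual data migration on geodb (via queryRunner) would run here;
      // a later migration would make the column NOT NULL and add the FK.
    } finally {
      await apiDataSource.destroy();
    }
  }

  public async down(_queryRunner: QueryRunner): Promise<void> {
    // Intentionally left as a no-op in this sketch.
  }
}
```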


@alexeh alexeh left a comment


my 2 cents:
#1508 (comment)

@hotzevzl
Member Author

@alexeh thanks, and yes: even after discussing with Kevin yesterday these concerns you raised about "syncing" migrations from both services and on both databases, I now realise that I hadn't considered the full picture, namely that we don't (currently) have any control over the order in which the api and geoprocessing services finish running their migrations.

So the idea of doing all the changes in one single migration in the geoprocessing service only goes so far, if it still relies on earlier schema changes that were part of migrations in the api service.

IF NOT EXISTS and company, as you suggest, will help, though I'm generally quite wary of them as they risk papering over accidental inconsistencies between dev environments, or even between staging and prod in some cases. That said, I'm not altogether opposed: if used judiciously and sparingly, I don't see a big risk, and they could simplify things that would otherwise require more coordination.

Alternatives I can see could be:

  • from the geoprocessing service, check via an apidb connection that a specific migration has run in apidb (see the sketch after this list)
  • enforce that the api service is fully healthy (both when running under docker compose and when running in a k8s environment), which would imply it has finished running all its migrations, before starting the geoprocessing service (though in general we may actually want to enforce the opposite, since api also works as a poor person's API gateway/reverse proxy in front of geoprocessing, so it should be fully healthy first 🤷🏼)
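A minimal sketch of the first alternative, assuming the api service records its migrations in TypeORM's default `migrations` table and that the usual Postgres env vars are available to geoprocessing (both are unverified assumptions, as is the migration name in the usage example):

```ts
import { DataSource } from 'typeorm';

// Ask apidb whether a given api-side migration has already been recorded,
// so a geo-side data migration can bail out early if its prerequisite is missing.
export async function apiMigrationHasRun(migrationName: string): Promise<boolean> {
  const apiDataSource = new DataSource({
    type: 'postgres',
    host: process.env.API_POSTGRES_HOST,
    port: Number(process.env.API_POSTGRES_PORT ?? 5432),
    username: process.env.API_POSTGRES_USERNAME,
    password: process.env.API_POSTGRES_PASSWORD,
    database: process.env.API_POSTGRES_DATABASE,
  });
  await apiDataSource.initialize();
  try {
    const rows: { count: string }[] = await apiDataSource.query(
      `SELECT count(*) AS count FROM migrations WHERE name = $1`,
      [migrationName],
    );
    return Number(rows[0]?.count ?? 0) > 0;
  } finally {
    await apiDataSource.destroy();
  }
}

// Usage (hypothetical migration name):
// if (!(await apiMigrationHasRun('AddCostSurfaceIdToScenarios1695000000000'))) {
//   throw new Error('apidb prerequisite migration has not run yet, aborting');
// }
```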

thoughts? cc @KevSanchez

@hotzevzl
Member Author

Closing this: the purpose was to validate the use of different connections, but in general this setup seems like a Very Bad Idea (tm), because we'd now need to coordinate migrations between different services, across environments as different as staging/prod, local dev and CI, which seems like a recipe for disaster.

We'll adjust the approach separately (for example, only run data migrations in controlled environments where we can ensure a specific sequencing of api and geoprocessing migrations).

@hotzevzl hotzevzl closed this Sep 25, 2023