Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SpineOpt data structure migration #120

Closed
spine-o-bot opened this issue Feb 7, 2021 · 23 comments
Closed

SpineOpt data structure migration #120

spine-o-bot opened this issue Feb 7, 2021 · 23 comments
Assignees
Labels
Type: new feature Add something that doesn't exist yet
Milestone

Comments

@spine-o-bot
Copy link

In GitLab by @manuelma on Apr 2, 2020, 12:03

We have constantly been upgrading the specific data structure for Spine Model over the years, which means a lot of manual adjustments of our working databases. It's frankly a pain that we could avoid if we had some sort of automatic migration system.

We have all the tools to do something like that, but there are some questions:

  • Do we use alembic for this? I would say no, as this doesn't involve any schema migration, just content. Alembic feels a bit overkill.
  • Where do we annotate the version of the structure? I was thinking maybe we can use a normal parameter associated to the model object class.
  • Where does the migration script live? I would say SpineModel.jl. In this way, run_spinemodel could check the version and then run the corresponding migration scripts automatically before building the model.
@spine-o-bot spine-o-bot added the Type: new feature Add something that doesn't exist yet label Feb 7, 2021
@mihlema
Copy link
Contributor

mihlema commented Feb 17, 2021

Could it be an easy way out, to somehow version tag our spineopt_template.json and keep a history of the changes? Ideally, this would also track a message explaining the changes made (e.g. the commit message)

@mihlema mihlema changed the title Spine Model data structure migration SpineOpt data structure migration May 18, 2021
@mihlema
Copy link
Contributor

mihlema commented May 18, 2021

We had some discussion on this during our Project meeting. There can be two types of changes:

  • renaming of a parameter/relationship class/ object class: If this happens, the user will need to adapt the names for those. This can be ~easily done with the data transformer. If we want to we could provide a tool specification for the data transformer for each version, that includes name changes to the previous one. However, I think it's fine if we leave this up to the user as long as we document renamings for each version
  • introducing new (required) relationship classes/ object classes/ parameters: Here, a simple renaming will not be a silver bullet. An example of these structural changes was e.g. when we enforced mandatory model__stochastic_structure, model__temporal_block, node__tmporal_block etc. relationships. The least we have to do is tagging versions properly and again documenting these fundamental changes. The more mature SpineOpt gets, the fewer these fundamental changes.

I think a bare minimum solution could be to have a version_history file that lists these changes. Possibly this could life somewhere in the documentation? For readability this might be convenient. (Alternatively, the other logical place could be alongside the spineopt_template.json)

@mihlema
Copy link
Contributor

mihlema commented Jun 1, 2021

Possible renamings:

@mihlema
Copy link
Contributor

mihlema commented Jun 1, 2021

Additional notes from our call last week:

  • migration script taking care of the renaming
  • always tag versions with changes, and add a release note (x.X.x)
  • store db version in db (either designated object class or metaparameter
  • @warn if SpineOpt template version doesn't coincide with SpineOpt.jl version?

@mihlema mihlema added this to the crunch-time milestone Sep 15, 2021
@mihlema mihlema modified the milestones: crunch-time, Roadmap Sep 16, 2021
@mihlema
Copy link
Contributor

mihlema commented Sep 16, 2021

@manuelma will provide sceleton for renaming in python

@manuelma
Copy link
Collaborator

manuelma commented Sep 16, 2021

import spinedb_api as api

url = "sqlite:///path-to-sqlite-file"
db_map = api.DatabaseMapping(url)
obj_classes = [
    db_map.cache_to_db("object_class", x._asdict())
    for x in db_map.query(db_map.object_class_sq).filter_by(name="unit_constraint")
]
rel_classes = [
    db_map.cache_to_db("relationship_class", x._asdict())
    for x in db_map.query(db_map.wide_relationship_class_sq).filter(
        db_map.wide_relationship_class_sq.c.name.like("%unit_constraint%")
    )
]
for x in obj_classes + rel_classes:
    x["name"] = x["name"].replace("unit_constraint", "user_constraint")
db_map.update_object_classes(*obj_classes)
db_map.update_wide_relationship_classes(*rel_classes)
db_map.commit_session("Rename unit_constraint to user_constraint")
db_map.connection.close()

The above script replaces unit_constraint by user_constraint in object and relationship classes. Is that what we need?

@DillonJ
Copy link
Collaborator

DillonJ commented Sep 16, 2021

Perfect - that's great - and it also serves as a nice template for future scripts. I will test this when I implement the renaming.

I guess we can store these scripts in the repository. Do we have versioning of the data structure yet? What sort of naming system can we use so a user who has been away for a while comes back with their old db and is wondering what scripts they need to run to update their database?

@manuelma
Copy link
Collaborator

manuelma commented Sep 16, 2021

How old is the db? We actually have no idea what's out there since we have done so many template modifications by hand. So I think we need to set on the current template version being version 0. From then on, we can provide automatic migration scripts that also write the version somewhere (a model class parameter value for instance). And then we can provide a tool with the plug-in that migrates any db to the latest version by calling the necessary scripts, depending on the version stored in the db.

@DillonJ
Copy link
Collaborator

DillonJ commented Sep 17, 2021

Regarding the version of the data structure and using parameters etc...

I feel the model object isn't appropriate because 1. there may be multiple model objects and 2, we do actually use the model object for other things.

I was thinking a more general way might be to have a "settings" object class. It would be nice if we could have one object each for different types of settings... e.g. solver_settings, model_settings, toolbox_settings etc... but the problem here is that we have to define parameters at class level meaning it would be possible to define parameter values for certain setting objects that aren't relevant.

Trying to make this as general as possible... there will be SpineOpt level settings (e.g. data structure version) and then settings specific to each model (e.g. optimality gap for operations model and for the master problem.)

I guess for model specific settings we can use parameters on the model class... but what about more general settings that are model independent?

Some options:

  • setting object class with multiple objects, each representing different types of settings (but the parameters are defined at class level meaning you can't make the parameters object specific)
  • setting object class with a single object holding all types of settings
  • multiple setting classes with each with a single object representing different types of settings

Thoughts?

DillonJ added a commit that referenced this issue Sep 17, 2021
with instructions to run migration script.
@DillonJ
Copy link
Collaborator

DillonJ commented Sep 17, 2021

So I have implemented the renaming of unit_constraint to user_constraint.

I have implemented a check in check_data_structure.jl to warn the user if the unit_constraint object class is found. Any unit_constraints will be ignored

The migration script at https://github.com/Spine-project/SpineOpt.jl/blob/master/templates/data_structure_migration_script_0.0-0.1.py can be run to automatically update older model files. Usage is as follows:

python data_structure_migration_script_0.0-0.1.py "///sqlite/C:/my/path/model.sqlite"

Alternatively, one may manually update your database by renaming the unit_constraint object class to user_constraint and all related relationship classes from *__unit_constraint to *__user_constraint

DillonJ added a commit that referenced this issue Sep 17, 2021
@manuelma
Copy link
Collaborator

Thank you @DillonJ nice job!

I'm a bit unsure about providing the script in the repository, it makes it too official. I feel we need to provide a better solution for our users rather than asking them to run a script in a python environment where spinedb_api is installed.

I think run_spineopt should handle template upgrades seamlessly. We could have done it with a little more time. At the moment, we will need to run the script on all the dbs in our case study repositories and update the repo. With an automatic solution we could have saved us all that work. I still feel that the 'crunching' approach is a bit dangerous, but maybe it's fine, I don't know.

@DillonJ
Copy link
Collaborator

DillonJ commented Sep 17, 2021

I agree that it's not perfect. However, to do this more nicely within SpineOpt/SpineInterface requires more API functionality which is what originally motivated spine-tools/SpineInterface.jl#73. I have the feeling that a solution to this is far off and we can't just freeze development.

I don't see an issue with the update script being official - why is that an issue? To use SpineOpt you need to use toolbox and so you need python anyway. Is there a way to make things a little nicer at that end? A check at API level perhaps?

@manuelma
Copy link
Collaborator

I agree we shouldn't freeze development, but slowing down is ok, and rushing might not. I am not saying this is rushing, but how essential is this renaming in the end?

The script being official hints that this is the solution we want to provide in the long run. I don't think it is.

We could spend the time in implementing the necessary api to support db upgrading within julia. It might be slow, true, but we'd get there eventually and we'd have a little bit of a smoother transition.

I think it's ok to bite the bullet now and upgrade all our dbs manually in the case study repos. We have bitten bitter bullets and it's been fun. But for the next template upgrade, we should again try to implement the nice solution we've been discussing about.

@DillonJ
Copy link
Collaborator

DillonJ commented Sep 17, 2021

For me the renaming is important because it's very misleading currently and it;s something we've wanted to do for a long time.

I am also working on including investments variables in user_constraints which would make it even more misleading... and since I'm working on it anyway - I'd prefer to do it now rather than later.

Why do we have to update manually when we have an update script that's trivial to use?

@manuelma
Copy link
Collaborator

Sorry, by "manually" I mean running the script. I would like run_spineopt to do that for me.

@DillonJ
Copy link
Collaborator

DillonJ commented Sep 17, 2021

I wonder is a stop-gap to get the load_template tool to do the migration also?

@manuelma
Copy link
Collaborator

I really believe we were too quick to do this change. We could e.g. start working on the Julia api to do db changes and have something minimal in one week time.

@DillonJ
Copy link
Collaborator

DillonJ commented Sep 17, 2021

We did discuss the python script idea here and in our meetings, you even provided it! :-)

Perhaps we could have something in pre-processing in the meantime? We could copy all the relationships, objects and parameters from unit_constraint to user_constraint?

@manuelma
Copy link
Collaborator

I know! I was onboard but switched opinions at some point :) I realized we have some stuff in public repos that deserve better attention. That thought didn't come to me at the moment of discussing all this. Also I thought you wanted the script to do something else, just for yourself, not to put it out there.

That's why I'm not faulting anybody but maybe myself. I'm now trying to implement the julia API to do automatic migration just in case, but maybe it's already too late.

manuelma pushed a commit that referenced this issue Sep 18, 2021
Re #120

We now support automatic migration using a simple versioning system.
The version is stored in the *default_value* of the `version`
parameter under the `settings` object class. This way we don't need
to mess with multiple `settings` objects and parameter values.

The first migration is also implemented to rename unit_constraint
to user_constraint. This happens automatically when one calls
`run_spineopt(url; upgrade=true)`. After this, the version will be 2
(we assume that 1 was the historic, unversioned version).

So please don't use the python script to update your dbs,
and don't do it manually, otherwise the automatic version detection
and migration will be broken.
@manuelma
Copy link
Collaborator

manuelma commented Sep 18, 2021

I just implemented the automatic migration thing (see above commit and please read the commit message to know the details)

It requires latest spinedb_api and SpineInterface.jl. Please let me know how it goes. I feel next time we need to do this in a branch. Maybe it's not too late to do it that way this time. We can create a branch from current master and then revert everything in master starting from @DillonJ 's commit where we rename unit_constraint to user_constraint in code?

Edit: actually, let's not revert unless things start to go very wrong which I hope won't be the case.

@DillonJ
Copy link
Collaborator

DillonJ commented Sep 20, 2021

The version is stored in the *default_value* of the `version`
parameter under the `settings` object class. This way we don't need
to mess with multiple `settings` objects and parameter values.

Nice... very smart... I like it! :-)

@DillonJ
Copy link
Collaborator

DillonJ commented Sep 22, 2021

Just a quick note to say that I've been using some old DBs recently and this process worked flawlessly!

@manuelma
Copy link
Collaborator

Cool! I want to finally implement support for is_hidden so we can make settings hidden if we want.

@manuelma manuelma closed this as completed Feb 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: new feature Add something that doesn't exist yet
Projects
None yet
Development

No branches or pull requests

4 participants