-
Notifications
You must be signed in to change notification settings - Fork 362
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change configuration format from JSON to TOML #1224
base: master
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1224 +/- ##
==========================================
- Coverage 69.39% 69.20% -0.19%
==========================================
Files 328 328
Lines 26201 26273 +72
==========================================
+ Hits 18182 18183 +1
- Misses 8019 8090 +71
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for taking care of this important task!
A thought about the use of sections: I get the desire of organizing keys into related groups, however maybe it would be better to keep all the keys under the same "namespace" as they were in the JSON, as to reduce the necessary changes to the absolute minimum (thus minimizing the risk of new bugs) especially given that we don't have a type system helping us to verify with certainty that all config usages are correctly migrated. Doing this would also allow us to continue supporting JSON for backwards-compatibility (I like the "heuristically generated" TOML version, it will be very helpful to users 😄 however I think that for some time can still support the JSON file when a TOML version is missing).
Also, I remember that some time ago there was some discussion about wanting to move many of the configuration options away from cms.conf
and into the database: ideally, the cms.conf
file would almost exclusively be used to specify the database connection string, while most of the options would be stored in the database itself (this would make it easier to change the values, e.g. "keep_sandbox" could be a simple checkbox in the AWS panel) and with this scenario in mind, I think it would make more sense to not organize options into sections.
Another thing: maybe we can update the file name to include .toml
as the extension? I think there could be some value in indicating this extension (it clearly communicates that it's a TOML file so users, and most importantly their IDE / editors, will expect to find TOML syntax inside of it). For example we could change cms.conf
to cms.toml
, or if we really want to keep conf
we could usecms-conf.toml
or cms.conf.toml
.
What do you think?
Yes, not grouping the keys into sections, thus changing way less code is definitely safer, and preserving backwards-compatibility is also a big plus. I am indecisive on this, but I think I will change it back then. :) Would you mind if we still renamed
but to preserve compatibility used the old names as aliases when loading the config? These changes should be small enough to reasonably test manually. Should we keep
Naming the config file PS: I am not sure I would be happy about moving much of the configuration to the database, actually. We tend to use the same basic configuration for new setups, and then only adapt where it is necessary. This is easily done with a configuration file I can just change and copy to new servers (in fact I have just started keeping these in a git repo). Also, even with our regular setup for our selection process, we usually initialize a new database once a year. |
Yeah that sounds reasonable (I assume these aliases would work both in the legacy JSON file and the new TOML one?)
The type annotations are a big plus in my opinion 😄 As for the second point, one concern I have is that (if I remember correctly) sometimes for a contest we would remove keys from the configuration file since anyway they were set to the default value, thus creating a "minimal" configuration file that only ovverrides what is strictly necessary. Would this change break such "minimal" configuration files?
That's a great input, I will keep this use-case in mind. I think for now it's fine to continue using a configuration file, and if we move to DB-configuration then we should look into having a file representation anyway to support this use-case (e.g. exporting the settings to a toml file, and being able to restore them) |
cms/conf.py
Outdated
legacy_paths = [os.path.join(p, "cms.conf") for p in etc_paths] | ||
paths = [os.path.join(p, "cms.toml") for p in etc_paths] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking that maybe it would make sense here to not differentiate between legacy_paths
and normal paths
, we could just have paths
(with the legacy ones appended at the end) and simply try to read sequentially each one of them: as soon as we find a file that we manage to read we return the configuration found there. We would try to read in JSON basically only when reading in TOML raises some kind of format exception. I think the code would be simpler. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. I put the legacy paths first though, in case the new sample config is copied during installation: Otherwise, this sample config would be used inadvertently and the message about the old config file would not appear. So now,
- a warning is going to be raised as soon as one of the files can be read in JSON format before one can be read in TOML format, and
- this JSON config will then proceed to be loaded.
Let me know if I should change this.
Yes, they will work in both.
It would not enforce keys to be present in the configuration file, so such a minimal configuration file will still work just the same. It just warns about config keys that it does not know about, instead of loading them silently. (A warning on missing keys is also included right now, but I will remove that again when I revert the changes of the config structure.)
Sounds good, thanks! |
I removed the section headers (or table headers, as TOML calls it) in the last commit, so the structure is as before. However, there seems to be no (legible) way to structure |
Now that we have pytest as test runner, adding tests should be easier so it would be nice to add a few tests, for example:
Let me know if you would be down to add these, if not I can try to spend some time on it when I get free. |
The converter puts unknown keys under a section 'stray'. The config parser puts attributes from the 'stray' section at the top level, where JSON keys used to be put. This means that keys not considered in the new config structure should continue working.
Also, small cosmetic cleanup
# These keys have been renamed. If the old key name is still used, it | ||
# is still regarded. | ||
for key in ("cookie_duration", "num_proxies_used"): | ||
if key in data: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably print a deprecation warning here.
setattr(self, key, value) | ||
|
||
return True | ||
|
||
def _suggest_updated_legacy_config(self, path, legacy_data): | ||
logger.error("Legacy json config file found at %s. " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this an error and not a warning?
@@ -17,6 +17,7 @@ chardet>=3.0,<3.1 # https://pypi.python.org/pypi/chardet | |||
babel>=2.6,<2.7 # http://babel.pocoo.org/en/latest/changelog.html | |||
pyxdg>=0.26,<0.27 # https://freedesktop.org/wiki/Software/pyxdg/ | |||
Jinja2>=2.10,<2.11 # http://jinja.pocoo.org/docs/latest/changelog/ | |||
tomli |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably come with a version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can now be removed as we upgraded to Python 3.12 which (since 3.11) supports TOML natively: https://docs.python.org/3/library/tomllib.html
In light of #119, this would change the format of the
cms.conf
andcms.ranking.conf
configuration files from JSON to TOML. Apart from the obvious, the changes include:Transforms the
_section
headers of the previous JSON format into TOML sections.The TOML sections of
cms.conf
correspond to predefineddataclasses
in Python, i.e. there is an object for each TOML section that contains its attributes. E.g.,keep_sandbox
can be accessed asconfig.worker.keep_sandbox
, while previously it would have just beenconfig.keep_sandbox
. Some variable names also change:config.admin_cookie_duration
becomesconfig.aws.cookie_duration
, andconfig.cookie_duration
becomesconfig.cws.cookie_duration
.This should structure the attributes better and provide more consistent key names.
The structure of
cms.ranking.conf
is left untouched, the file is just translated to TOML.Includes a check if the config file that would be used is in JSON format. If so, the user is warned about the change and advised to tranform the config into the new format. He is also presented with an attempt at translating the file.
This works by injecting values taken from the JSON config into a Jinja template of the bare TOML config.
If an unexpected key is present in the JSON config, the key-value pair is put in a section called
stray
. When loading a TOML config, everything instray
is stored as a field ofConfig
, so it can still be accessed asconfig.[keyname]
.The automatic translation works only heuristically. The attributes are rendered in the Jinja template using
tojson
, so there is no guarantee the outcome will adhere to the TOML specification.One could use a TOML writer instead, but that would add another package dependency.
Note that these changes have not been tested thoroughly.
Any feedback is very welcome!