Change configuration format from JSON to TOML #1224

magula · 2023-02-01T21:41:27Z

In light of #119, this would change the format of the cms.conf and cms.ranking.conf configuration files from JSON to TOML. Apart from the obvious, the changes include:

Transforms the _section headers of the previous JSON format into TOML sections.
The TOML sections of cms.conf correspond to predefined dataclasses in Python, i.e. there is an object for each TOML section that contains its attributes. E.g., keep_sandbox can be accessed as config.worker.keep_sandbox, while previously it would have just been config.keep_sandbox. Some variable names also change: config.admin_cookie_duration becomes config.aws.cookie_duration, and config.cookie_duration becomes config.cws.cookie_duration.
This should structure the attributes better and provide more consistent key names.
The structure of cms.ranking.conf is left untouched; the file is just translated to TOML.
Includes a check for whether the config file that would be used is in JSON format. If so, the user is warned about the change and advised to transform the config into the new format. They are also presented with an attempt at translating the file.
This works by injecting values taken from the JSON config into a Jinja template of the bare TOML config.
If an unexpected key is present in the JSON config, the key-value pair is put in a section called stray. When loading a TOML config, everything in stray is stored as a field of Config, so it can still be accessed as config.[keyname].
The automatic translation works only heuristically. The attributes are rendered in the Jinja template using tojson, so there is no guarantee the outcome will adhere to the TOML specification.
One could use a TOML writer instead, but that would add another package dependency.
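
The translation step described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: it uses `json.dumps` as a stand-in for Jinja's `tojson` filter, and the `LEGACY_MAPPING` table and the `translate_legacy` helper are hypothetical names. Skipping keys that start with an underscore is also an assumption about how the `_section` comment keys are handled.

```python
import json

# Hypothetical mapping from a legacy JSON key to (TOML section, new key).
LEGACY_MAPPING = {
    "keep_sandbox": ("worker", "keep_sandbox"),
    "cookie_duration": ("cws", "cookie_duration"),
    "admin_cookie_duration": ("aws", "cookie_duration"),
}


def translate_legacy(legacy: dict) -> str:
    """Render a legacy JSON config dict as TOML-like text (heuristic)."""
    sections = {}
    for key, value in legacy.items():
        if key.startswith("_"):
            continue  # assumed: "_section"-style keys are comments only
        # Keys without a known mapping end up in the "stray" section.
        section, new_key = LEGACY_MAPPING.get(key, ("stray", key))
        sections.setdefault(section, {})[new_key] = value

    lines = []
    for section, items in sections.items():
        lines.append(f"[{section}]")
        for new_key, value in items.items():
            # JSON rendering matches TOML for strings, numbers, booleans
            # and simple arrays, but not for every value.
            lines.append(f"{new_key} = {json.dumps(value)}")
        lines.append("")
    return "\n".join(lines)


print(translate_legacy({
    "_section": "Worker",
    "keep_sandbox": False,
    "cookie_duration": 1800,
    "my_custom_key": "x",
}))
```

The heuristic nature shows up with values such as `None`: `json.dumps(None)` yields `null`, which TOML does not define, so the generated file would need manual fixing in that case.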

Note that these changes have not been tested thoroughly.

Any feedback is very welcome!

codecov · 2023-02-01T21:53:28Z

Codecov Report

Attention: Patch coverage is 65.67164% with 46 lines in your changes missing coverage. Please review.

Project coverage is 69.20%. Comparing base (b77c87b) to head (7105fce).
Report is 7 commits behind head on master.

Files with missing lines	Patch %	Lines
cms/conf.py	67.54%	37 Missing ⚠️
cms/io/web_service.py	66.66%	2 Missing ⚠️
cmstestsuite/functionaltestframework.py	33.33%	2 Missing ⚠️
cmstestsuite/testrunner.py	0.00%	2 Missing ⚠️
prerequisites.py	0.00%	2 Missing ⚠️
cmsranking/Config.py	80.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1224      +/-   ##
==========================================
- Coverage   69.39%   69.20%   -0.19%     
==========================================
  Files         328      328              
  Lines       26201    26273      +72     
==========================================
+ Hits        18182    18183       +1     
- Misses       8019     8090      +71

Flag	Coverage Δ
functionaltests	`46.65% <67.46%> (-0.11%)`	⬇️
unittests	`56.68% <57.46%> (-0.12%)`	⬇️

wil93 · 2023-02-02

Thank you so much for taking care of this important task!

A thought about the use of sections: I get the desire to organize keys into related groups. However, maybe it would be better to keep all the keys under the same "namespace" as they were in the JSON, so as to reduce the necessary changes to the absolute minimum (thus minimizing the risk of new bugs), especially given that we don't have a type system helping us verify with certainty that all config usages are correctly migrated. Doing this would also allow us to continue supporting JSON for backwards compatibility (I like the "heuristically generated" TOML version, it will be very helpful to users 😄; however, I think that for some time we can still support the JSON file when a TOML version is missing).

Also, I remember that some time ago there was some discussion about wanting to move many of the configuration options away from cms.conf and into the database: ideally, the cms.conf file would almost exclusively be used to specify the database connection string, while most of the options would be stored in the database itself (this would make it easier to change the values, e.g. "keep_sandbox" could be a simple checkbox in the AWS panel) and with this scenario in mind, I think it would make more sense to not organize options into sections.

Another thing: maybe we can update the file name to include .toml as the extension? I think there could be some value in indicating this extension (it clearly communicates that it's a TOML file, so users, and most importantly their IDEs / editors, will expect to find TOML syntax inside of it). For example, we could change cms.conf to cms.toml, or if we really want to keep conf we could use cms-conf.toml or cms.conf.toml.

What do you think?

magula · 2023-02-03T15:30:59Z

Yes, not grouping the keys into sections, and thus changing far less code, is definitely safer, and preserving backwards compatibility is also a big plus. I am undecided on this, but I think I will change it back then. :)

Would you mind if we still renamed

cookie_duration -> contest_cookie_duration and
num_proxies_used -> contest_num_proxies_used,

but to preserve compatibility used the old names as aliases when loading the config? These changes should be small enough to reasonably test manually.
Also, I would like to rename the "section comment" ScoringService to ProxyService, which seems more fitting. ScoringService only uses one of these attributes, to find out whether it is worth connecting to ProxyService because the ranking is enabled.

Should we keep

the attributes' type annotations, and
the check in the config loader for each attribute key if that config attribute exists (up until now, a typo in a key would have gone unnoticed and the default value been used inadvertently)?

Naming the config file cms.toml also seems like a good idea, I will do that. I had only refrained from it to not change even more of the code. 😄

PS: I am not sure I would be happy about moving much of the configuration to the database, actually. We tend to use the same basic configuration for new setups, and then only adapt where it is necessary. This is easily done with a configuration file I can just change and copy to new servers (in fact I have just started keeping these in a git repo). Also, even with our regular setup for our selection process, we usually initialize a new database once a year.
If these things became a part of the database and the way to change them would be via the admin interface, we would have to write down what goes in there and then manually add it, or write a script to load the configuration into the database, which seems overly complicated. So my concerns are similar, I think, to what @lw said in #119.

wil93 · 2023-02-04T19:23:17Z

> Would you mind if we still renamed
>
> cookie_duration -> contest_cookie_duration and
> num_proxies_used -> contest_num_proxies_used,
>
> but to preserve compatibility used the old names as aliases when loading the config? These changes should be small enough to reasonably test manually.

Yeah that sounds reasonable (I assume these aliases would work both in the legacy JSON file and the new TOML one?)

> Should we keep
>
> the attributes' type annotations, and
> the check in the config loader for each attribute key if that config attribute exists (up until now, a typo in a key would have gone unnoticed and the default value been used inadvertently)?

The type annotations are a big plus in my opinion 😄

As for the second point, one concern I have is that (if I remember correctly) sometimes for a contest we would remove keys from the configuration file since they were set to the default value anyway, thus creating a "minimal" configuration file that only overrides what is strictly necessary. Would this change break such "minimal" configuration files?

> PS: I am not sure I would be happy about moving much of the configuration to the database, actually. We tend to use the same basic configuration for new setups, and then only adapt where it is necessary.

That's a great input, I will keep this use-case in mind. I think for now it's fine to continue using a configuration file, and if we move to DB-configuration then we should look into having a file representation anyway to support this use-case (e.g. exporting the settings to a toml file, and being able to restore them)

wil93 · 2023-02-04T19:33:51Z

cms/conf.py

+        legacy_paths = [os.path.join(p, "cms.conf") for p in etc_paths]
+        paths = [os.path.join(p, "cms.toml") for p in etc_paths]


I was thinking that maybe it would make sense here to not differentiate between legacy_paths and normal paths, we could just have paths (with the legacy ones appended at the end) and simply try to read sequentially each one of them: as soon as we find a file that we manage to read we return the configuration found there. We would try to read in JSON basically only when reading in TOML raises some kind of format exception. I think the code would be simpler. What do you think?

magula · 2023-02-09

I agree. I put the legacy paths first though, in case the new sample config is copied during installation: otherwise, this sample config would be used inadvertently and the message about the old config file would not appear. So now,

a warning is going to be raised as soon as one of the files can be read in JSON format before one can be read in TOML format, and

this JSON config will then proceed to be loaded.

Let me know if I should change this.
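
The lookup order discussed in this thread could look roughly like the sketch below. It is an illustration under assumptions, not the loader in cms/conf.py: the `load_first_config` helper and its return shape are hypothetical, and the `tomli` fallback import is only there so the snippet also runs on Python versions before 3.11.

```python
import json
import logging
import os
import tempfile

try:
    import tomllib  # standard library since Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport with the same API

logger = logging.getLogger(__name__)


def load_first_config(paths):
    """Return (path, format, data) for the first usable config, or None."""
    for path in paths:
        try:
            with open(path, "rb") as f:
                raw = f.read()
        except OSError:
            continue  # missing or unreadable: try the next candidate
        try:
            return path, "toml", tomllib.loads(raw.decode("utf-8"))
        except (UnicodeDecodeError, tomllib.TOMLDecodeError):
            pass  # not TOML: check whether it is a legacy JSON file
        try:
            data = json.loads(raw)
        except ValueError:
            continue  # neither TOML nor JSON
        logger.warning("Legacy JSON config found at %s; "
                       "please convert it to TOML.", path)
        return path, "json", data
    return None


# Legacy paths are listed first, as described in the reply above.
with tempfile.TemporaryDirectory() as etc:
    legacy = os.path.join(etc, "cms.conf")
    current = os.path.join(etc, "cms.toml")
    with open(legacy, "w", encoding="utf-8") as f:
        f.write('{"keep_sandbox": true}')
    with open(current, "w", encoding="utf-8") as f:
        f.write("keep_sandbox = false\n")
    found = load_first_config([legacy, current])
    print(found[1], found[2])
```

With this ordering, an existing legacy file wins over a freshly installed sample cms.toml, so the warning about the old format is not silently skipped.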

magula · 2023-02-04T20:17:49Z

> Yeah that sounds reasonable (I assume these aliases would work both in the legacy JSON file and the new TOML one?)

Yes, they will work in both.

> As for the second point, one concern I have is that (if I remember correctly) sometimes for a contest we would remove keys from the configuration file since they were set to the default value anyway, thus creating a "minimal" configuration file that only overrides what is strictly necessary. Would this change break such "minimal" configuration files?

It would not enforce keys to be present in the configuration file, so such a minimal configuration file will still work just the same. It just warns about config keys that it does not know about, instead of loading them silently. (A warning on missing keys is also included right now, but I will remove that again when I revert the changes of the config structure.)
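
As a rough sketch of that behavior: missing keys fall back to their defaults, while unknown keys trigger a warning. The `Config` dataclass below is a hypothetical three-field subset with illustrative defaults, and skipping the unknown key (rather than still loading it) is an assumption of this sketch.

```python
import logging
from dataclasses import dataclass, fields

logger = logging.getLogger(__name__)


@dataclass
class Config:
    # Hypothetical subset of options; the defaults are illustrative only.
    keep_sandbox: bool = False
    contest_cookie_duration: int = 1800
    temp_dir: str = "/tmp"


def load_config(data: dict) -> Config:
    """Build a Config from parsed data, warning about unknown keys."""
    known = {f.name for f in fields(Config)}
    kwargs = {}
    for key, value in data.items():
        if key in known:
            kwargs[key] = value
        else:
            # A typo no longer goes unnoticed.
            logger.warning("Unknown config key %r; is it a typo?", key)
    # Keys absent from the file simply keep their dataclass defaults.
    return Config(**kwargs)


# A "minimal" config that overrides one value and contains one typo.
config = load_config({"keep_sandbox": True, "keep_sandbx": True})
print(config)
```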

> That's a great input, I will keep this use-case in mind. I think for now it's fine to continue using a configuration file, and if we move to DB-configuration then we should look into having a file representation anyway to support this use-case (e.g. exporting the settings to a toml file, and being able to restore them)

Sounds good, thanks!

magula · 2023-02-09T19:36:20Z

I removed the section headers (or table headers, as TOML calls them) in the last commit, so the structure is as before.

However, there seems to be no (legible) way to structure core_services and other_services if not as tables, because inline tables cannot contain newlines. This is a problem because the top-level table ends with the first table header, so nothing after that can be top-level again. Thus I moved these to the end of the TOML file in the last commit. They are now the only parts contained in a table, and they have to be at the end of the file, which seems inelegant. Do you see a better way to do this?
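
The ordering constraint can be reproduced with a small document. The key names and values below are illustrative rather than copied from the sample config, and the `tomli` fallback import is only for Python versions before 3.11.

```python
try:
    import tomllib  # standard library since Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport with the same API

DOC = """
temp_dir = "/tmp"

[core_services]
LogService = [["localhost", 29000]]

# Intended as a top-level key, but it follows a table header:
keep_sandbox = false
"""

data = tomllib.loads(DOC)

# Everything after "[core_services]" belongs to that table.
print(sorted(data))
print(data["core_services"])
```

Here `keep_sandbox` ends up inside `core_services` instead of at the top level, which is why the tables have to come last in the file.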

wil93 · 2024-01-18T23:42:38Z

Now that we have pytest as test runner, adding tests should be easier so it would be nice to add a few tests, for example:

a test to make sure that the values from the TOML / JSON file indeed are reflected in the config object
a test to verify the logic of choosing TOML over JSON when available, and falling back to JSON when not available

Let me know if you would be down to add these, if not I can try to spend some time on it when I get free.

The converter puts unknown keys under a section 'stray'. The config parser puts attributes from the 'stray' section at the top level, where JSON keys used to be put. This means that keys not considered in the new config structure should continue working.

Also, small cosmetic cleanup
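
A minimal sketch of how the loader side of this could work, assuming the parsed TOML arrives as a plain dict. The `Config` class here is a hypothetical stand-in, not the class in cms/conf.py.

```python
class Config:
    def __init__(self):
        self.keep_sandbox = False  # illustrative known option

    def load(self, data: dict) -> None:
        """Set attributes from parsed config data, lifting 'stray' keys."""
        data = dict(data)  # do not mutate the caller's dict
        stray = data.pop("stray", {})
        # Stray keys become top-level attributes, where JSON keys used
        # to be put, so existing code that reads them keeps working.
        for key, value in {**data, **stray}.items():
            setattr(self, key, value)


config = Config()
config.load({"keep_sandbox": True, "stray": {"my_custom_key": 42}})
print(config.keep_sandbox, config.my_custom_key)
```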

veluca93 · 2024-10-05T14:54:43Z

cms/conf.py

+        # These keys have been renamed. If the old key name is still used, it
+        # is still regarded.
+        for key in ("cookie_duration", "num_proxies_used"):
+            if key in data:


We should probably print a deprecation warning here.
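
One possible shape for such a warning, as a sketch: the `RENAMED_KEYS` table and the `apply_renamed_keys` helper are hypothetical names, and letting the new key win when both spellings are present is an assumption rather than something decided in this thread.

```python
import warnings

# Old key name -> new key name, as proposed earlier in this PR.
RENAMED_KEYS = {
    "cookie_duration": "contest_cookie_duration",
    "num_proxies_used": "contest_num_proxies_used",
}


def apply_renamed_keys(data: dict) -> dict:
    """Return a copy of data with deprecated key names replaced."""
    result = dict(data)
    for old, new in RENAMED_KEYS.items():
        if old in result:
            warnings.warn(
                f"Config key '{old}' is deprecated; use '{new}' instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            value = result.pop(old)
            # Assumed policy: an explicitly set new key takes precedence.
            result.setdefault(new, value)
    return result


print(apply_renamed_keys({"cookie_duration": 1800}))
```

If the project prefers its logger over the `warnings` module, the same check could call `logger.warning` instead.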

veluca93 · 2024-10-05T14:55:11Z

cms/conf.py

            setattr(self, key, value)

        return True

+    def _suggest_updated_legacy_config(self, path, legacy_data):
+        logger.error("Legacy json config file found at %s. "


Why is this an error and not a warning?

veluca93 · 2024-10-05T15:00:06Z

requirements.txt

@@ -17,6 +17,7 @@ chardet>=3.0,<3.1  # https://pypi.python.org/pypi/chardet
 babel>=2.6,<2.7  # http://babel.pocoo.org/en/latest/changelog.html
 pyxdg>=0.26,<0.27  # https://freedesktop.org/wiki/Software/pyxdg/
 Jinja2>=2.10,<2.11  # http://jinja.pocoo.org/docs/latest/changelog/
+tomli


This should probably come with a version.

wil93 · 2024-11-17

This can now be removed, as we upgraded to Python 3.12, which (since 3.11) supports TOML natively: https://docs.python.org/3/library/tomllib.html

wil93 reviewed Feb 2, 2023

magula and others added 13 commits October 5, 2024 15:20

Change config format from JSON to TOML

97a8f83

Add TOML sample configs

4966ef7

Fix collection variable annotations

b7cbebf

Fix test framework

1ce673f

Change 'JSON' to 'TOML' in docs

8a3fb4d

Cleanup JSON to TOML transition helper

5072da1

Rename cms_conf_legacy_mapping.toml(->.jinja)

0980836

Rename cms.conf -> cms.toml

1359385

(Almost) no TOML tables

9977023

Fix TOML loader

c4761c8

Also, small cosmetic cleanup

Minor cleanup

74756e8

Fixup

35e0431

wil93 force-pushed the toml branch from a460387 to 35e0431 Compare October 5, 2024 13:23

veluca93 reviewed Oct 5, 2024

Merge branch 'master' into toml

7105fce
