Better Parameter Validation #80
base: main
Conversation
for more information, see https://pre-commit.ci
Note: The build failed because it ran on Python 3.8, and I used the more robust type-hinting conventions introduced in Python 3.9. In Python 3.8, list type hints have to be declared like this:

```python
from typing import List

value: List[float] = [1.0, 2.0]
```

In Python 3.9, the same hint can be written as:

```python
value: list[float] = [1.0, 2.0]
```

I think the second is much more natural and requires us to think less, which is an additional reason I will advocate updating to a newer version of Python. It won't break backward compatibility and just gives us new features.
@PatrickRWells, thanks a lot for the PR! You can change the Python version in this file as part of this PR.
@PatrickRWells @ajshajib I am a bit worried about requiring a 3.9+ Python version. 3.8 should be a long-term-supported version, and many (or some) machines, be they personal computers or cluster environments, may still run on 3.8. This would narrow down the user base or cause some hassle for some users.
@sibirrer @ajshajib I must admit I really don't see this being an issue when virtual environments exist. I have never met an astronomer who doesn't use Anaconda, and virtual environment support is ubiquitous on clusters. It's not a hassle. Also, Python 3.8 is scheduled for end-of-life next year. It is not a "long-term supported" version. Major libraries will start to drop support (numpy already has), as will major operating systems (without virtual environments, of course). We really should not be building new software with a tool that is about to be abandoned when perfectly good alternatives exist that require virtually no overhead.
I see @PatrickRWells. I am fine with having a >=3.9 requirement. We just can't assume that all users follow the guidelines, and we shouldn't be too upset if their experience is impacted. In the end, it is on the developers to provide a solution to the users. I tend to favor better backward compatibility whenever the implemented features do not suffer too much and breaking it is not necessary.
Hi @PatrickRWells, thanks for the PR. You said that this currently only works for the __init__ method. How straightforward is it to extend to other parts of the code? It would be nice if you could write test functions for your implementation.
@sibirrer I think if we're building software with a long-term lifetime in mind, we should try to remain compatible with up-to-date versions of our biggest dependencies. Both numpy and astropy have already dropped support for version 3.8, so we should do the same. Otherwise we run the risk, down the line, of hitting problems (due to bugs or anything else) that are unfixable because we're on an old version, and of missing out on updates that may meaningfully improve our experience. I think those teams understand the Python ecosystem well enough to make good calls about how much backward compatibility is the right amount.

It's also worth mentioning that one of the big current focuses of the Python team is performance, which is a critical consideration for our future work at LSST scale. 3.11 is roughly 30% faster than 3.10 in some applications. It will probably be wise for us to run our code in modern environments anyway.

Fortunately or unfortunately, Python is a language under active development, and new versions are released approximately once a year. I think this is a good thing in the long term, because the language gets meaningfully better. It does require some work on our part as developers if we really care about building tools that work for the long term, but that's just part of the deal.

Changes I will make to the PR:
Some things we should consider:
@nkhadka21 Functionally, it's not so hard. Right now, the only things preventing it from working with other object methods are an explicit check that raises an exception if the decorator is not used on an __init__ method, and the fact that I haven't designed a standard for creating pydantic models for these methods. This should be a fairly easy change on the back end. Extending it to work with functions is a bit more work, but should be doable.

The tricky part here is developer experience. If we're not careful, we run the risk of our "params.py" file getting big and unwieldy, which largely defeats the purpose. One possible approach is to put defaults directly in the same file as the classes/functions. I am happy to think a bit more about this and get back to you. In essence, I'm trying to build functionality that encourages us to be serious about good software-engineering practices but doesn't get in the way too much.

I think the long-term best solution is to combine this approach with a much tighter API. Right now, this package is mostly a bunch of classes we have to import directly. This is normally how things start off, but at some point we should consider sitting down and properly designing an API. That would let us control more easily how information flows through the package, and place checks where they make the most sense and are most likely to catch mistakes. This is a big job, but it's the kind of work I enjoy, and I'm happy to do it here. We should have conversations about this as a group, though, to make sure we understand how the library is expected to be used.

Changes I will make to this PR:
- We now check what kind of function is being inputted
- Introduce basic tests
Updates have been made based on the discussion above. Here's a summary.
Small changes:
Bigger Changes:
Possible Future Changes: One issue with this approach is that the LSP loses access to the parameters of the functions being checked. As a result, the popup help in tools like VS Code includes the docstring, but not the parameters and their types in the standard way. Fixing this is possible but requires Python 3.10+. I would still advocate moving to 3.10 for other reasons, but we can save this for another time.
Codecov Report
Additional details and impacted files:
@@ Coverage Diff @@
## main #80 +/- ##
==========================================
+ Coverage 78.70% 79.56% +0.86%
==========================================
Files 26 30 +4
Lines 1235 1385 +150
==========================================
+ Hits 972 1102 +130
- Misses 263 283 +20
Hey all,
As promised, here is an implemented system for managing parameter defaults and validating input values. This type of checking is critical for long-running pipelines because it catches mistakes very early in a run and provides a clear explanation of what is wrong. We want to place explicit constraints on what types a parameter can have, what values it can take, and what its defaults are. Importantly, we also want to know exactly where all this information lives, so we can update it when needed. This system uses pydantic for parameter validation, an industrial-strength library that is the de facto standard for this kind of thing.
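For anyone unfamiliar with pydantic, here is a minimal, self-contained sketch of the kind of early, readable failure it produces. The model and field names are invented for illustration and are not from slsim:

```python
from pydantic import BaseModel, ValidationError

# Hypothetical model; the field names are illustrative only.
class PipelineSettings(BaseModel):
    n_samples: int
    sigma: float

try:
    # "many" cannot be coerced to an int, so validation fails immediately,
    # and the error names the offending field and the expected type.
    PipelineSettings(n_samples="many", sigma=1.0)
except ValidationError as err:
    print(err)
```

The point is that a bad input fails at construction time with a message pointing at the exact field, rather than hours into a pipeline run.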
Note: This pull request is also intended as basic documentation for people contributing to slsim, so it'll be fairly long.
The system I've built here is very easy to use, but it does require a bit of explanation. Currently, it only works (by design) with inputs to __init__ methods on classes, but we can always extend that in the future. Importantly, it can be adopted on a per-class basis.
To mark the class for validation, we simply decorate it with the @param_check decorator:
slsim/SomeModule/some_class.py
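As a rough, stdlib-only sketch of the mechanism being described: the decorator wraps __init__ so a validator runs before construction. The real decorator dispatches to pydantic models living in _params/; here the validator is looked up by a "<ClassName>Validator" naming convention instead, purely so everything fits in one file. All names below are stand-ins, not the PR's actual implementation:

```python
# Stand-in for the PR's @param_check decorator (illustrative only).
def param_check(cls):
    original_init = cls.__init__

    def checked_init(self, *args, **kwargs):
        # Look up a validator named after the class; raises on bad input.
        validator = globals().get(cls.__name__ + "Validator")
        if validator is not None:
            validator(*args, **kwargs)
        original_init(self, *args, **kwargs)

    cls.__init__ = checked_init
    return cls


def SomeClassValidator(scale):
    # Hypothetical constraint: scale must be a positive number.
    if not (isinstance(scale, (int, float)) and scale > 0):
        raise ValueError(f"scale must be a positive number, got {scale!r}")


@param_check
class SomeClass:
    def __init__(self, scale):
        self.scale = scale
```

With this in place, `SomeClass(2.0)` constructs normally, while `SomeClass(-1.0)` raises a ValueError before __init__ ever runs.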
Then, create a subfolder called "_params". Within that subfolder, create a file with the same name as the file that contains the class
slsim/SomeModule/_params/some_class.py
The first class defines the validation logic for parameters that go into the __init__ function of SomeClass. Note that the name of the validator class is also SomeClass; this is required. This will enforce the following constraints on the parameters:
The second class validates the inputs to the SomeClass.calculate_something method. The name of the validation class in this case must always be "ClassName_MethodName," but otherwise everything is identical. This method has only a single argument, which must be a positive floating-point number.
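A sketch of what such a _params file might contain. The field names and the constraints on the __init__ parameters are hypothetical; only the positive-float requirement on the method argument is stated above:

```python
# slsim/SomeModule/_params/some_class.py (illustrative sketch)
from pydantic import BaseModel, Field, PositiveFloat

# Validates SomeClass.__init__; must share the class's name.
class SomeClass(BaseModel):
    scale: float = Field(1.0, gt=0)   # hypothetical parameter
    n_points: int = 100               # hypothetical parameter

# Validates SomeClass.calculate_something; named ClassName_MethodName.
class SomeClass_calculate_something(BaseModel):
    value: PositiveFloat              # the single positive-float argument
```

Defaults live on the model, so the decorated class no longer needs to carry them itself.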
We can also validate inputs to functions this way. For example:
slsim/SomeModule/some_functionality.py
Then we add a validation class
slsim/SomeModule/_params/some_functionality.py
As expected, the name of the validation class must match the name of the decorated function.
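For example, a hypothetical validator for a function might look like the following; the function name and its constraints are invented for illustration:

```python
# slsim/SomeModule/_params/some_functionality.py (illustrative sketch)
from pydantic import BaseModel, Field

# The validator's name must match the decorated function's name.
class draw_samples(BaseModel):        # hypothetical function name
    value: float
    n_samples: int = Field(10, gt=0)  # hypothetical constraint: positive
```

Constructing the model with the function's arguments either succeeds (applying defaults) or raises a ValidationError describing the bad argument.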
Important note: All of this is entirely transparent to the user. They simply create an instance of SomeClass as they normally would, and will only run into an issue if the parameters they provide are invalid. All validation is done by pydantic, which is incredibly fast. I've updated the GaussianMixtureModel class and the vel_disp_composite_model function as examples. The validation class for GaussianMixtureModel looks like this:
slsim/ParamDistributions/_params/GaussianMixtureModel.py
Note: I just pulled the defaults from the original class definition.
This does three things in addition to simple type checking:
Some of these checks were already implemented in the GaussianMixtureModel __init__ function, but the checking is now much more robust and requires much less code. GaussianMixtureModel now just looks like this:
slsim/ParamDistributions/gaussian_mixture_model.py
Here's an example of a failure case:
Here we see it telling us that our three input arrays have to be the same length.
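The equal-length constraint can be sketched in plain Python. The parameter names (weights, means, stds) are guesses based on the error described; in the PR the check is performed by a pydantic validator, not a standalone function:

```python
# Stdlib sketch of the cross-parameter check described above;
# parameter names are assumptions, not necessarily slsim's.
def check_same_length(weights, means, stds):
    if not (len(weights) == len(means) == len(stds)):
        raise ValueError(
            "weights, means, and stds must have the same length; "
            f"got {len(weights)}, {len(means)}, {len(stds)}"
        )
    return weights, means, stds
```

A mismatched call fails immediately with a message naming all three lengths, which is the behavior the failure case above demonstrates.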
Additional notes:
- astropy.units: this will require some additional code, but I've basically already written it for cosmap, so it will be straightforward.
- Newer Python versions allow type hints like float | list[float] (the | union syntax is 3.10+), whereas 3.8 would require Union[float, list[float]] and an additional import. Performance has also been a big focus of recent Python versions, so we're losing out on all of that if we stay on 3.8. With virtual environments, changing Python versions is trivial.