Model versioning and duplicate prevention #20

Open
herzogrh opened this issue Aug 13, 2024 · 3 comments · Fixed by #70
Labels: couldhave ("Could have" prioritization for the project), enhancement (New feature or request)

Comments

@herzogrh
Member

By logging the model version in execution requests and hashing the input parameters, duplicate execution requests can be detected. Whenever a second request with the same input parameters is sent, the job results should be mirrored from the previous request and point back to it.

@herzogrh herzogrh added the enhancement New feature or request label Aug 13, 2024
@herzogrh
Member Author

Some models may be non-deterministic, so it might be a good idea to make this configurable in providers.yaml: for such models, results should never be returned from the cache but always be recalculated.

@herzogrh herzogrh added the couldhave "Could have" prioritization for the project label Aug 19, 2024
@hwbllmnn
Collaborator

The problem here is that the parameters are stored in a JSONB field in the database. Computing the hash in the database apparently yields a different hash every time, presumably because there is no strict key order when serializing JSON objects. I've tried a few things (for example, casting the input strings to JSONB and back to text to enforce some kind of standard serialization), but that didn't help.

An alternative would be to do the hashing in Python in a standardized manner, for example by sorting the keys first, but that would be more work than doing it in the database.
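For illustration, a minimal sketch of what Python-side hashing with sorted keys could look like. The function name `canonical_hash` and the sample parameters are hypothetical and not taken from the project; the sketch only assumes the parameters are JSON-serializable.

```python
import hashlib
import json


def canonical_hash(params: dict) -> str:
    """Return a SHA-256 hex digest that does not depend on key order.

    Hypothetical helper, not part of the project's code base.
    """
    # sort_keys=True orders the keys at every nesting level, and the compact
    # separators remove whitespace differences between serializations.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same content, different key order (made-up example parameters).
a = {"model": "demo", "inputs": {"depth": 1.5, "area": "north"}}
b = {"inputs": {"area": "north", "depth": 1.5}, "model": "demo"}

print(canonical_hash(a) == canonical_hash(b))
```

Note that this only normalizes key order and whitespace. Values that are numerically equal but serialized differently (such as `1` and `1.0`) and lists in a different order would still produce different hashes.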

@herzogrh
Member Author

Maybe it's even easier to compare the input parameters directly instead of the hash? If the parameters are the same and the model is configured as deterministic, the previous job's results are returned.
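A rough sketch of the direct-comparison idea, using an in-memory list in place of the real database. The names `Job`, `JobStore`, `submit`, and the `deterministic` flag are all hypothetical stand-ins; how the flag would actually be read from providers.yaml is not shown here.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple


@dataclass
class Job:
    job_id: int
    model: str
    model_version: str
    parameters: dict


@dataclass
class JobStore:
    """In-memory stand-in for the jobs table (assumption, not the real schema)."""

    jobs: List[Job] = field(default_factory=list)

    def find_duplicate(self, model: str, model_version: str,
                       parameters: dict, deterministic: bool) -> Optional[Job]:
        # Non-deterministic models must always be recalculated.
        if not deterministic:
            return None
        for job in self.jobs:
            # Python dict equality ignores key order, so no hashing is needed.
            if (job.model == model
                    and job.model_version == model_version
                    and job.parameters == parameters):
                return job
        return None

    def submit(self, model: str, model_version: str, parameters: dict,
               deterministic: bool) -> Tuple[Job, bool]:
        """Return (job, reused); reused is True if a previous job was matched."""
        previous = self.find_duplicate(model, model_version, parameters, deterministic)
        if previous is not None:
            return previous, True
        job = Job(len(self.jobs) + 1, model, model_version, parameters)
        self.jobs.append(job)
        return job, False


store = JobStore()
first, reused_first = store.submit("demo", "1.0", {"a": 1, "b": 2}, deterministic=True)
second, reused_second = store.submit("demo", "1.0", {"b": 2, "a": 1}, deterministic=True)
third, reused_third = store.submit("demo", "1.0", {"a": 1, "b": 2}, deterministic=False)
```

In this sketch the second request is matched to the first job despite the different key order, while the request flagged as non-deterministic always creates a new job. A different model version also counts as a new request.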

@hwbllmnn hwbllmnn mentioned this issue Oct 31, 2024