-
Notifications
You must be signed in to change notification settings - Fork 361
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support having 2 tasks with the same name on DB #998
base: master
Are you sure you want to change the base?
Conversation
Codecov Report
@@ Coverage Diff @@
## master #998 +/- ##
==========================================
- Coverage 65.1% 45.2% -19.91%
==========================================
Files 231 231
Lines 17746 17763 +17
==========================================
- Hits 11554 8030 -3524
- Misses 6192 9733 +3541
Continue to review full report at Codecov.
|
Are you serious? Did you really not even try to fix all the things that would break because of this? Here are just the first ones that came to my mind: AddTestcases, AddStatement, ImportTask, RemoveTask, RWS, ... |
I searched a bit more and also ImportContest and ImportDataset would break. |
I assume by "will break" you mean "will print a SQL integrity error instead of a user-friendly error message, in some cases" (especially given the fact that all workflows used until now, including the test suite, are working with no problems). Anyway, OK, I will look into adding more user-friendly error messages. It must be said though that there are some situations where you can get this kind of SQL error printouts already (if you edit tasks in the wrong way, using AWS ...) so I would tend to assume that "contest admins" should mostly be fine with such errors. |
No, by "break" I mean it will become impossible to add a statement to a
task if there is another one with the same name in another contest, or that
the wrong task will be removed, or that a dataset will be added to the
wrong task, or that data will be overwritten. In short, tools will not do
what they're supposed to. Like, they will break.
…On Thu, Aug 23, 2018, 19:00 William Di Luigi ***@***.***> wrote:
I assume by "will break" you mean "will print a SQL integrity error
instead of a user-friendly error message, in some cases" (especially given
the fact that all workflows used until now, including the test suite, are
working with no problems).
Anyway, OK, I will look into adding more user-friendly error messages. It
must be said though that there are some situations where you can get this
kind of SQL error printouts already (if you edit tasks in the wrong way,
using AWS ...) so I would tend to assume that "contest admins" should
mostly be fine with such errors.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#998 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAHX6oG40u0cpyVt0mxK384_5pwV7zzMks5uTn01gaJpZM4WICRa>
.
|
The list I gave you earlier is just all the tools that issue SELECT queries
filtering by task name without also filtering by contest. All these queries
will stop working the way they're expected and need to be fixed. Since
there's small differences in behavior whether a query is fetched with
first(), one() or scalar() (and that now I'm not at my computer) I can't
really tell you which tool breaks in which way.
And then there's RWS, which won't be fun to fix.
…On Thu, Aug 23, 2018, 19:16 Luca Wehrstedt ***@***.***> wrote:
No, by "break" I mean it will become impossible to add a statement to a
task if there is another one with the same name in another contest, or that
the wrong task will be removed, or that a dataset will be added to the
wrong task, or that data will be overwritten. In short, tools will not do
what they're supposed to. Like, they will break.
On Thu, Aug 23, 2018, 19:00 William Di Luigi ***@***.***>
wrote:
> I assume by "will break" you mean "will print a SQL integrity error
> instead of a user-friendly error message, in some cases" (especially given
> the fact that all workflows used until now, including the test suite, are
> working with no problems).
>
> Anyway, OK, I will look into adding more user-friendly error messages. It
> must be said though that there are some situations where you can get this
> kind of SQL error printouts already (if you edit tasks in the wrong way,
> using AWS ...) so I would tend to assume that "contest admins" should
> mostly be fine with such errors.
>
> —
> You are receiving this because you commented.
>
>
> Reply to this email directly, view it on GitHub
> <#998 (comment)>, or mute
> the thread
> <https://github.com/notifications/unsubscribe-auth/AAHX6oG40u0cpyVt0mxK384_5pwV7zzMks5uTn01gaJpZM4WICRa>
> .
>
|
Oh, now I see, good points. I guess I'll replace most of "task name" requests with "task id"...would that work? These are admin tools anyway so we can assume an admin knows how to find out the ID of a task. We could eventually add some scripts to ease data browsing from the command line ( |
I would prefer a disambiguation (at least via command line argument, possibly interactive too) to avoid breaking existing scripts. |
OK, I will add an optional |
[Note this might not cover all issues, in some other places you might need to do something different...] |
I would prefer a |
As for RWS, it may be enough to just fix the PS side. PS should keep the |
Also, in AWS there are some views where we list all tasks, for all contests, by name. We should now add their contests' names as well, to allow the admins to select without ambiguity. |
I think the id of the task must be specified in some cases. There can be 10 tasks with the same name and not tied to any contest |
OK, now almost everything works. Soon I will fix the rest.
And this is when I specify an ID:
|
Ready to review again? |
Well it's not finished yet but the tools should work. AWS needs some aesthetic work but it's not broken. I didn't look into RWS yet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r1, 6 of 7 files at r2, 1 of 1 files at r4.
Reviewable status: 8 of 9 files reviewed, 18 unresolved discussions (waiting on @wil93)
a discussion (no related file):
The unit tests for ImportContest, ImportTask, ImportDataset and AddStatement are failing, please fix them.
a discussion (no related file):
In some places you switched some functions from doing their own logging of errors and returning True
/False
to them raising an exception with the error message. I prefer the latter too and think it's good to do it, but if so please do it everywhere (or nowhere) consistently and do it in a separate commit for clarity.
a discussion (no related file):
In a couple of cases you changed the behavior of the tools. That is probably what is causing some of the test failures. Please don't do that. If all tasks in the database just happen to all have distinct names, the tools should keep working exactly like they did before. If that is not the case, the tools should require the least possible information necessary to disambiguate.
cmscontrib/AddStatement.py, line 47 at r2 (raw file):
def add_statement(task_name, task_id, language_code, statement_file,
Add contest_name
.
Provide a None
default for both contest_name
and task_id
(which means moving them to the end of the signature).
cmscontrib/AddStatement.py, line 103 at r2 (raw file):
help="absolute/relative path of statement file") parser.add_argument("-t", "--task-id", action="store", type=int, help="optional task ID used for disambiguation")
Also add a --contest-name
, -c
option.
cmscontrib/AddStatement.py, line 120 at r2 (raw file):
return 1 return 0 if success is True else 1
Does add_statement
ever return False
now? Isn't the return value in fact meaningless? This could be changed to just return 0
.
cmscontrib/AddSubmission.py, line 166 at r2 (raw file):
description="Adds a submission to a contest in CMS.") parser.add_argument("-c", "--contest-id", action="store", type=int, help="id of contest where to add the user")
If you feel like adding --contest-name
and --task-id
here too it wouldn't hurt I guess.
cmscontrib/AddTestcases.py, line 50 at r2 (raw file):
def add_testcases(archive, input_template, output_template, task_name, task_id, dataset_description=None,
Same here: allow contest_name
.
cmscontrib/ImportContest.py, line 205 at r2 (raw file):
tasks = session.query(Task).filter(Task.name == taskname).all() if self.import_tasks:
You changed the behavior of the function. In this case you import the task without any checks and later you unconditionally bind it to a contest. An exception (of the wrong type) could be raised.
cmscontrib/ImportDataset.py, line 56 at r5 (raw file):
class DatasetImporter(object): def __init__(self, path, description, loader_class, task_id):
Same here (contest_name
).
cmscontrib/importing.py, line 68 at r2 (raw file):
def task_from_db(session, task_name, task_id):
Add contest_name
too, between task_name
and task_id
.
Also make contest_name
and task_id
by default None
.
cmscontrib/importing.py, line 72 at r2 (raw file):
session (Session): SQLAlchemy session to use. task_name (string|None): the name of the task, or None to return None.
s/string/str/ (not your change, but as we're moving to py3 we're slowly fixing them wherever we find them)
cmscontrib/importing.py, line 77 at r2 (raw file):
return (Task|None): None if task_name is None, or the task. raise (ImportDataError): if there is no task with the given name. raise (AmbiguousTaskName): if the task name is ambiguous.
?
cmscontrib/importing.py, line 84 at r2 (raw file):
tasks = session.query(Task).filter(Task.name == task_name)
Also filter by contest_name
.
cmscontrib/importing.py, line 88 at r2 (raw file):
tasks = tasks.filter(Task.id == task_id) tasks = tasks.all()
Use .one()
instead (which should be more efficient) and detect the two cases of the if
below by catching the two exceptions it can raise. See
http://docs.sqlalchemy.org/en/latest/orm/query.html#sqlalchemy.orm.query.Query.one
cmscontrib/ImportTask.py, line 69 at r2 (raw file):
def __init__(self, path, prefix, override_name, update, no_statement, contest_id, task_id, loader_class):
Same here.
cmscontrib/ImportTask.py, line 142 at r2 (raw file):
""" if self.update:
You changed the semantics of this function. Now if a task with that name and contest is already in the DB and update is false no exception will be raised.
Please fix this.
cmscontrib/RemoveTask.py, line 50 at r2 (raw file):
def remove_task(task_name, task_id):
Same here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 8 of 9 files reviewed, 18 unresolved discussions (waiting on @lerks and @wil93)
a discussion (no related file):
Previously, lerks (Luca Wehrstedt) wrote…
The unit tests for ImportContest, ImportTask, ImportDataset and AddStatement are failing, please fix them.
Done (I think, let's see what travis says)
a discussion (no related file):
Previously, lerks (Luca Wehrstedt) wrote…
In some places you switched some functions from doing their own logging of errors and returning
True
/False
to them raising an exception with the error message. I prefer the latter too and think it's good to do it, but if so please do it everywhere (or nowhere) consistently and do it in a separate commit for clarity.
I usually just update those that I encounter while changing other stuff, maybe a specific commit can be done for the remaining ones, after the PR...
a discussion (no related file):
Previously, lerks (Luca Wehrstedt) wrote…
In a couple of cases you changed the behavior of the tools. That is probably what is causing some of the test failures. Please don't do that. If all tasks in the database just happen to all have distinct names, the tools should keep working exactly like they did before. If that is not the case, the tools should require the least possible information necessary to disambiguate.
Most of the failures were because I forgot to add =None
to the new task_id parameter. I think all behaviors (when task names are different) should be the same, unless I did some mistake.
cmscontrib/AddStatement.py, line 47 at r2 (raw file):
Previously, lerks (Luca Wehrstedt) wrote…
Add
contest_name
.Provide a
None
default for bothcontest_name
andtask_id
(which means moving them to the end of the signature).
Added default None for task_id. For contest name I'm not sure why we would need that...
cmscontrib/AddStatement.py, line 103 at r2 (raw file):
Previously, lerks (Luca Wehrstedt) wrote…
Also add a
--contest-name
,-c
option.
Do we really need it?
cmscontrib/AddStatement.py, line 120 at r2 (raw file):
Previously, lerks (Luca Wehrstedt) wrote…
Does
add_statement
ever returnFalse
now? Isn't the return value in fact meaningless? This could be changed to justreturn 0
.
Done.
cmscontrib/AddSubmission.py, line 166 at r2 (raw file):
Previously, lerks (Luca Wehrstedt) wrote…
If you feel like adding
--contest-name
and--task-id
here too it wouldn't hurt I guess.
Mmm but if we already have contest_id then why specify also contest_name? Also task_id should never be necessary here, since contest_id and task_name are mandatory (and they uniquely identify which task we are submitting to)
cmscontrib/AddTestcases.py, line 50 at r2 (raw file):
Previously, lerks (Luca Wehrstedt) wrote…
Same here: allow
contest_name
.
I had removed it because it didn't look necessary. Now I added it back, I guess it could stay for backwards compatibility.
cmscontrib/ImportContest.py, line 205 at r2 (raw file):
Previously, lerks (Luca Wehrstedt) wrote…
You changed the behavior of the function. In this case you import the task without any checks and later you unconditionally bind it to a contest. An exception (of the wrong type) could be raised.
Which kind of check should we do? If -i
is used, then the task can always be imported (since we don't have the name constraint anymore). Now I tried to reflect this in the testsuite.
cmscontrib/ImportDataset.py, line 56 at r5 (raw file):
Previously, lerks (Luca Wehrstedt) wrote…
Same here (
contest_name
).
Is it necessary?
cmscontrib/importing.py, line 68 at r2 (raw file):
Previously, lerks (Luca Wehrstedt) wrote…
Add
contest_name
too, betweentask_name
andtask_id
.Also make
contest_name
andtask_id
by defaultNone
.
Made task_id None by default.
cmscontrib/importing.py, line 72 at r2 (raw file):
Previously, lerks (Luca Wehrstedt) wrote…
s/string/str/ (not your change, but as we're moving to py3 we're slowly fixing them wherever we find them)
Done.
cmscontrib/importing.py, line 77 at r2 (raw file):
Previously, lerks (Luca Wehrstedt) wrote…
?
Done. I initially implemented the "disambiguation screen" as a separate exception (which extended ImportDataError) but then I refactored it as just a custom message of ImportDataError.
cmscontrib/importing.py, line 84 at r2 (raw file):
Previously, lerks (Luca Wehrstedt) wrote…
Also filter by
contest_name
.
Is it necessary?
cmscontrib/importing.py, line 88 at r2 (raw file):
Previously, lerks (Luca Wehrstedt) wrote…
Use
.one()
instead (which should be more efficient) and detect the two cases of theif
below by catching the two exceptions it can raise. See
http://docs.sqlalchemy.org/en/latest/orm/query.html#sqlalchemy.orm.query.Query.one
Mmm... I just tried but the problem is, in the "MultipleResultsFound" case, I need to access the tasks list (in order to show the disambiguation screen). I suspect the only way to do that is with .all()
cmscontrib/ImportTask.py, line 69 at r2 (raw file):
Previously, lerks (Luca Wehrstedt) wrote…
Same here.
Made task_id None.
cmscontrib/ImportTask.py, line 142 at r2 (raw file):
Previously, lerks (Luca Wehrstedt) wrote…
You changed the semantics of this function. Now if a task with that name and contest is already in the DB and update is false no exception will be raised.
Please fix this.
I'm not completely sure of some cases: if I'm trying to update a task with cmsImportTask foo -c 123
, do I want to find the task with name foo
in contest 123
(which is unique, even though there could be many foo
s) and then update its content? Or do I want to find the task with name foo
(assuming it's unique, otherwise I'll specify the ID as well) and update it as in "update its content AND tying it to contest 123"?
What if the task with name foo
is unique and not tied to any contest yet? (a special case of the case described above). Should I tie it or just update the contests? Or maybe say "can't find task with that name in the contest"?
cmscontrib/RemoveTask.py, line 50 at r2 (raw file):
Previously, lerks (Luca Wehrstedt) wrote…
Same here.
Made task_id optional
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 10 files reviewed, 17 unresolved discussions (waiting on @lerks)
a discussion (no related file):
Previously, wil93 (William Di Luigi) wrote…
Done (I think, let's see what travis says)
Nope. AddStatement and ImportTask are still broken. (You can run those tests yourself, offline, before pushing, by launching cmstestsuite/RunUnitTests.py
)
a discussion (no related file):
Previously, wil93 (William Di Luigi) wrote…
I usually just update those that I encounter while changing other stuff, maybe a specific commit can be done for the remaining ones, after the PR...
I still want it to be done in a separate commit from all the rest of the work in this PR as otherwise it's just a mess to read.
cmscontrib/AddStatement.py, line 47 at r2 (raw file):
Previously, wil93 (William Di Luigi) wrote…
Added default None for task_id. For contest name I'm not sure why we would need that...
We don't "need" the contest name and, for that matter, we don't "need" the task name either (if we have the ID). I'm sure, however, that admins remember the names they gave to things better than the IDs that CMS gave to those same things, making it easier for them to use these commands. Also, this allows them to use the same commands (and, say, to put it in scripts) and have it work across reimports. I would actually ask you to remove the task ID if it weren't necessary to disambiguate between unbound tasks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 10 files reviewed, 17 unresolved discussions (waiting on @lerks and @wil93)
a discussion (no related file):
Previously, lerks (Luca Wehrstedt) wrote…
Nope. AddStatement and ImportTask are still broken. (You can run those tests yourself, offline, before pushing, by launching
cmstestsuite/RunUnitTests.py
)
Alternatively, you can setup travis to run on your fork and push a different branch than the one you're using for this PR to have it run the tests without affecting the PR.
Currently we have three separate uniqueness constraints: (name), (name, contest_id), and (num, contest_id). The first one is not really useful and in some cases it's really annoying (current workaround is adding the
contestname_
as a prefix of taskname).Fixes #765
Before:
After:
This change is