High level organization: why aren't we targeting a contest environment? #63
Just for the sake of completeness, let's remind ourselves of the reasons for having two versions of the Hack Pack:
I believe that the bulk of these problems are symptoms of improper implementation instead of bad design. For example:
Now comes the question of how this happens:
Here are some of the reasons why we require sample IO:
The major reason why we require code to be part of a sample solution is that sometimes it is difficult to tell where code like this may be used. Consider flow problems: they can be used to solve maximal matchings (a sketch of this reduction follows below). Without extensive experience with graphs, I doubt that you would guess this, and omitting that context does not meet the Hack Pack++'s goal of being a learning resource for inexperienced team members. People with experience who just want the algorithm should use the Hack Pack, not the Hack Pack++. I believe that dropping the sample problem requirement would be a mistake.
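To make the flow-to-matching point concrete, here is a minimal sketch of maximum bipartite matching via augmenting paths -- the unit-capacity special case of max flow. This is illustrative code only, not an excerpt from the Hack Pack:

    def max_bipartite_matching(adj, n_left, n_right):
        """adj[u] lists the right-side vertices reachable from left vertex u."""
        match_right = [-1] * n_right  # match_right[v] = left vertex matched to v, or -1

        def try_augment(u, seen):
            # Search for an augmenting path starting from left vertex u.
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    # Take v if it is free, or if its current partner can re-match.
                    if match_right[v] == -1 or try_augment(match_right[v], seen):
                        match_right[v] = u
                        return True
            return False

        return sum(try_augment(u, [False] * n_right) for u in range(n_left))

For example, max_bipartite_matching([[0], [0, 1]], 2, 2) returns 2.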
Besides being handy for seeing obscure solutions to problems, what else are the contest problems doing for us? Sure, max flow problems may be one of the more elusive ones to pick out, but is that true for most contest problems? Or are we needlessly exploding the size of the Hack Pack?
I'm concerned that our goals are focused in entirely the wrong place. Should we really give equal importance to "learning reference" and "contest aid"? That is, how many times has anyone actually looked to the Hack Pack for learning so far? I suspect the answer is close to zero, yet we still seem to over-prioritize the Hack Pack++ aspect. It's even holding us back by requiring far too much of each submission: saying "you must contribute a lot of good stuff" earns us "a lot of probably bad stuff" instead of the more optimal "just enough great stuff" (see almost every external submission so far for examples). Condensation is a maybe-step-two afterthought, as is obvious from the proposed LaTeX guidelines. In the case of Dijkstra, for example, we can't put all the blame on our misguided contributor. Because of our guidelines for submission, he ended up writing a ton of useless cruft instead of what Marshall and I were looking for during the ICPC: how to do Dijkstra. That's a big problem on our side. Shouldn't we be going for "make something that's extremely useful in a contest environment" first, with additional reference as an extra?
Perhaps we should provide a structure to allow submitting separate pieces? Dr. Dean likes for his 481 students to author problems, correct? Perhaps they could elect to author the description and reference code, or a problem, for each section in the Hack Pack? This would make it a lot easier to get content into the Hack Pack. But it could also leave us with sections of inconsistent quality if submitted content isn't more carefully vetted. As I see it, this is a problem with focus and priority. I like the aspect of the Hack Pack++. References are really cool and useful, but, like @ProtractorNinja wrote, it keeps the Hack Pack from gaining content in a swift manner because it requires so much bulk (the biggest of that bulk being the sample/contest problem portion). We should prioritize the description and reference code content of a section first. These would be the more helpful things to use in a contest environment. Then we could look into adding problems...
I stand by the complaints I have already raised regarding the Hack Pack's current direction, and I can't fully approve of project-defining documents until those issues are resolved. Anything to add, @mdclyburn and @robertu94?
I am content with prioritizing Hack Pack content over Hack Pack++, but I do not believe that sample problems with input and output are unnecessary, for the reasons that I listed above. Yes, many problem types are more obvious than max flows. That does not mean that an implementation cannot contain nuance that can be revealed by input/output code as well as by sample input. I would also be fine with allowing piecemeal submission; I just hadn't broken the issues down further.
I stand behind the idea of prioritizing Hack Pack content before Hack Pack++ is even considered. It might even be a more natural fit to the process. I also want to bring up the fact that we may not see much of anything come out of Dr. Dean's seminar this semester, since it is no longer a variable-credit course (not that we had a high volume of high-quality content before).
Sample problems, sample data, full code, and unit tests are surely helpful in complex cases, but requiring them for every submission makes submission harder and bloats the hack pack. Submission should be easy. Large code projects need tests and proof because the code will live on effectively forever, enduring changes and supporting the rest of the code base. Code like that, in a living environment, needs to continually prove itself. For our purposes, submissions only need to be "proven" once, during approval, because our delivery is a static document: unit tests and needlessly full-featured code samples are overkill unless context demands them.

Because we have GitHub, all we need to solve our this-reference-code-doesn't-work problem is intelligent reviewers who actually investigate the code. "Give me a larger sample that shows this code works" is a fitting demand for submissions that need it, but currently, if I want to submit a cool Python function I found helpful at the last Mercer competition, my excitement to contribute will end up dead in the water before I even finish reading the README.

Once created and deployed, the average full I/O sample problem clogs the pack more than it helps. When I'm combing the hack pack during a competition, I'm only interested in the basics: a description of each algorithm or structure and a short, non-complex sample implementation with, if needed, an annotation about what kind of data it operates on or other noteworthy details. If needed. I don't want to see the standard includes unless I need them. The ideal algorithm or data structure sample doesn't need any of this extra stuff, because the ideal snippet is short, sweet, and parsable (a hypothetical example of such an entry follows this comment); inevitable exceptions notwithstanding. We should be encouraging high-quality content like that, not average-quality content that simply ticks all the boxes. Heavy-handed, checklist-style submission approval tactics will distract us (and future contributors/organizers) from what's important, like making sure that people actually find the hack pack useful.

The hack pack should be useful. It really isn't. I'm appalled at how useless the hack pack was for me during the ICPC last year. It's an overly formal mess of nigh-useless content with a few flecks of gold hidden inside. Something is very wrong if anyone ever gets close to thinking that.

The more I consider the idea, the less I care for even having the Hack Pack++ in addition to the standard Hack Pack. We're not doing one thing well; we're doing two or three things poorly. Perhaps it would be more prudent to focus on creating a streamlined, damn useful, practical document and expand it later only if interest is raised? I think that our current helpless state demonstrates that we bit off more than we could chew. We could really benefit from reducing the scope of this project and re-evaluating what the team really needs. We don't need a formal process for anything until then.
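As a concrete (and entirely hypothetical) illustration of the "short, sweet, and parsable" ideal described above, a whole entry might be no more than this:

    Union-Find (disjoint sets): merge sets and query membership in near-constant time.
    Operates on elements labeled 0..n-1.

    def make_sets(n):
        return list(range(n))              # parent[i] == i means i is a root

    def find(parent, x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(parent, a, b):
        parent[find(parent, a)] = find(parent, b)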
Even if we do not include the full implementations of the solutions in the Hack Pack itself, I am not convinced that we can ensure functional code without them. These "ah! the reviewers will save us" methods fail us all the time, because people are failing creatures. If you can prove exhaustively that code works without test cases and without actually compiling and testing it, then I would be happy to drop the requirement. However, time and time again we see that it is not careful review but thorough testing that ultimately generates working code. The last thing that we want to do in a contest is not reading a few extra lines of context and usage, but having to debug code from our cheatsheet that flat out doesn't work. As for your Python snippets: if they are sufficiently short, they probably don't belong in the algorithms or data structures sections, where this kind of testing is essential. Rather, they should exist in the examples section, where things like Makefiles are stored. I also don't mean to discount your experience that you didn't find the document extremely useful, but if you are using the Hack Pack++ during a contest, you are going to be miserable, especially if you are an experienced veteran like you are. There is a ton of stuff there that you don't need. Beyond needing the Hack Pack instead of the Hack Pack++, when you had your problems a majority of the document had not been marked to be minimized, and almost none of the sections are/were indexed. That is the tragedy of a work in progress, not a flawed design. I am happy to put a greater focus on writing the Hack Pack than the Hack Pack++, but not at the expense of thorough testing.
I was using the regular hack pack. It's bad. A solid design would not have enabled its creation. I disagree with, well, just about everything you've said. I don't think you've effectively discounted any of the points I've already raised: again, submission approval is a process that only needs to be done once. Submitted code should, generally, be happily generic and widely applicable. It shouldn't be the sort of thing that needs unit tests to prove correctness. Eliminating Dr. Dean's credit boost means that we're only going to be getting submissions from people who are interested in making the hack pack better. We don't need unit testing to tone down any kind of content influx. We need good code that works. Unit testing is one way to get it, but requiring it now is overkill for the reasons I've already stated. Critical code should be elegant enough to innately demonstrate its correctness. People are failing creatures, yes, which is why it's nice that multiple people can review something more than once. They might mess up, but we don't need to plan for that quite yet: we haven't yet seen that reviewers are going to have trouble if they spend the time to really get involved with the code. Submissions that were already bad aren't going to get much better if they get smushed into unit tests; ironing out little bits of incorrectness won't help if the thing just sucks overall. I think that this short, sweet, testing-not-needed code ideal is the biggest difference in our thought processes. Algorithm and data structure base definitions and samples shouldn't need to be unit tested, but we could say that full, input-driven samples that exist for certain complex processes do.
Well. I didn't have any strong opinion on the policy of unit testing, but @ProtractorNinja brings up a good point. Do we even need unit testing? Unit testing is cool, I'll admit. I use it on all of the code that I care about. But its primary use is to ensure that a growing/shrinking/morphing code base still works in the way the author(s) intend it to. Can part of the review process for submission be the reviewer testing the code? If we can do this, we can cut down on a lot of bloat. And if one person isn't enough to review a submission, add another reviewer. I'm sure we can easily follow the rule of not merging in content if it is erroneous. Make a case for unit testing. Once a submission makes it into the Hack Pack, it should be correct. Unless there is a good reason for unit testing, I want it thrown out.
This is likely up for debate in issue clemsonacm#63.
If I haven't convinced you both yet -- that a set of complete tests is necessary for ensuring functional code -- I doubt that I will. Go ahead and remove the requirement and make it a suggestion if that is what you both think is right. I will continue writing tests for the code that I write, and that will be the end of this thread.
We still have much to consider, so this issue hasn't died quite yet. Tomorrow (Friday the 29th) I'll write more on what route I think we should take.
I've made many critical remarks regarding the Hack Pack's current construction. My goal in this comment is to suggest a number of ideas and major refactoring steps that will dramatically improve and clarify the project. First, here's an outline of how we should go about doing this:
This comment addresses mostly (1), but I feel that I should note why I think (2) is necessary. Because I think there's only a small amount of kickass content in the current hack pack, it would be more effective to redesign the Hack Pack from the ground up instead of trimming it down from what we have now. Doing so in haste would be just as catastrophic as our current situation (because we've been working towards deadlines for much of the time), so let's be thorough on this. On to what I think we need. I'll provide use cases from two perspectives: users and submitters. First, our ideal user:
Key points of this case:
And our ideal contributor:
Key points:
Our Hack Pack should be organized to optimize these use cases, with the contest attendee as the overriding priority. Here are some ideas I think we should start with:
My ideal Hack Pack section looks something like this:
It's short, useful, and only includes exactly what a reader would want. Here also is a mockup of an idea for overall layout:
Thoughts?
This plan sounds very radical, but I think it will actually solve a lot of our issues with the current system. Perhaps my comment about unit testing was a bit short on reasoning and incendiary in nature. I would like to reiterate that the idea of 'unit testing' submissions is useless once the content is in the Hack Pack. We want others to contribute. Even if their implementation is not entirely correct, it's the job of the maintainers and reviewers to help others' content mature enough to make for a valid submission. As I said before, the purpose of unit tests is to ensure that regressions don't happen when changes are introduced. Each submission to the Hack Pack should be completely modular and isolated from everything else. Even if I added an erroneous implementation of a splay tree, the code for Dijkstra should not suddenly break. Making contributors write unit tests for their own code is a bit silly as well. It is the job of the maintainer to properly vet submissions. Of course, ideally, we would like everyone to be reasonably confident in the code they send our way, but we shouldn't have to go as far as wanting to unit test all the things.

So, with your idea, we would be looking at a much more intensive review of the code and content that is submitted for inclusion in the Hack Pack. This is pretty much a given and should largely replace the unit testing methods we have in place currently. Maintainers and contributors should work closely together to ensure that submissions are valid before they make it into the document. I would be a little sad to see the didactic aim of the Hack Pack go out the window (whoever contributed that KMP section sure put in a lot of work...), but the decision does seem to come out of common sense. Were we mistaken to even try to target everybody with a single document? I think so. Throwing out the heavier Hack Pack++ will also make our build system better, yes? I still like the idea of the 'Core' section being towards the end of the Hack Pack. I realize that this section is probably not going to be all that large, but it also isn't hugely important (not after the first five minutes of the contest, anyway).
Please tell me that is not a path... please... This seems like a very viable way to handle compiling the Hack Pack in the future. I don't have any specific concerns, but organization is key.
Luckily, no, that's not a filepath--just a path through the tree that matches my Table of Contents.
Our build system does work at the moment, but removing the Hack Pack++ would really slim things down. The process doesn't have any major external dependencies that we'd be eliminating.
I agree that the plan sounds very radical. I don't quite see why we need to dump and reload all of the content, and I think that keeping the history of the document is important, even if just for remembering why we wanted to go another way.

I am fine with the new organizational structure (chapter organization) you propose. I don't see it as that much different from the current layout, except for renaming the appendix to be Core and moving it to the front. Perhaps you could explain what I am missing? I do agree that our index needs to be more useful; I would like to see some more concrete guidelines for indexing in your proposal.

I agree that the user should be competent, but I do think we need more guidelines here. For example, there is a big difference between, say, Brendan-competent and generic-programmer-competent. I would like to see some concrete details spelled out.

I would try to favor robust example sets over the strictly most concise set where this makes sense. For example, consider some of the graph algorithms. Some of the variants can get fairly gnarly if you have never written them before. So, for example, we may have a general DFS, and the DFS variant that topologically sorts (see the sketch after this comment).

You say, "Unit tests are exchanged for extensive review", but also "tests are not required." Could you please clarify?

In your example section, if we are dropping I/O code, could I convince you to include a strong suggestion for docstring-like comments for each function as a code style feature? Also, if we are dropping I/O code, I would like the code to be written as part of a valid function. We wouldn't necessarily need a main/IO, but at least a function with arguments would be helpful. Additionally, for languages like Python without strict typing, a list of expected types should be specified either by name or by docstring.

I love Python, but I would emphasize only using it where it makes sense. For many problems, the algorithms turn out smaller in C++ because of the STL's more robust data structures. Additionally, most of our teams know C/C++ as their primary language. Python may, however, be useful for string parsing or big integer problems. As such, I recommend at least making a suggestion to focus on C++ over Python.

I too will miss the Hack Pack's didactic purpose, but I will follow the consensus.

If we are removing the Hack Pack++, I would also remove the find+awk magic that we use for pre-processor macros. It will simplify the build process, remove unneeded tools, and greatly improve the readability of the Makefile.
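A minimal sketch of the DFS example mentioned above -- a general traversal next to the topological-sort variant -- to show how small, and how easily missed, the difference is. This is my own illustrative code (the sort assumes an acyclic graph), not proposed Hack Pack content:

    def dfs(graph, start, visited=None):
        """General DFS over an adjacency-list dict; returns the visited set."""
        if visited is None:
            visited = set()
        visited.add(start)
        for nxt in graph[start]:
            if nxt not in visited:
                dfs(graph, nxt, visited)
        return visited

    def topo_sort(graph):
        """DFS variant: reverse post-order of a DAG is a topological order."""
        order, visited = [], set()
        def visit(u):
            visited.add(u)
            for nxt in graph[u]:
                if nxt not in visited:
                    visit(nxt)
            order.append(u)  # append only after all descendants are finished
        for u in graph:
            if u not in visited:
                visit(u)
        return order[::-1]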
I put it at the front precisely because you'd want that stuff during the first five minutes (so it's conceptually 'first'). Having it at the back would mean that it stays out of the way, though; I guess it could go either way.
Starting from scratch (that is, keeping the LaTeX but reducing it to a content-less skeleton) would be much cleaner and wouldn't add that much more work (at least on the elimination scale I'm thinking of). Why would we need any more history than what is contained in GitHub's Issues?
I included that as a bit of an afterthought -- a starting point for more discussion. I think we'll need a new Issue for planning the new document's hierarchy. The main difference at the moment is that the Algorithms section is organized by topic.
I don't have anything concrete for that yet, aside from the observation that the current index isn't helpful. That's a topic for a new Issue, once we resolve high-level plans and problems.
I'm pretty sure that that example falls under what I meant, although I agree it wasn't 100% obvious from what I wrote. Nothing needs to be absolute; we just need to encourage an optimal balance between conciseness and clarity. I'm guessing you got the idea of "most concise" from my "minimum amount of code to implement...". I should have been more clear; I meant "minimum" to go along with "no need for unrelated supplements". "Absolutely as little code as possible" is not a good thing to strive for.
Better wording would have been 'unit testing is exchanged for extensive manual review'. Those two quotes mean the same thing.
I was looking through the Hack Pack again and realized that most of the code that exists isn't as troublesome as I previously thought. Many of the computational geometry examples are already solid, if a smidgen messy. I don't think we need a concrete specification for what exactly a code sample needs, except that it should be useful and modular. Sometimes this could be just a function; sometimes a full program; sometimes a few functions and a class. I think we should encourage understandable code (which may have comments) instead of trying to shoehorn in mandatory documentation.

Many algorithms involve building their base structures first. That's an important thing to include somewhere, but one that I think contributes to clutter. One solution would be standardizing the structures that each algorithm may use and defining them (and how to fill them up / build them) in the Structures section, then working with an already-complete version in the Algorithms section. Come to think of it, it might even be sensible to combine the two sections into Data Structures and Algorithms. The Graph section, for instance, could begin with graph classes, then go on to specific algorithms that involve those classes. I'm not sure if merging them in this way is entirely advantageous, though; I'd like to hear everyone's opinions on organization once we get a new issue going for it.
I haven't mentioned anything about suddenly favoring Python over C++.
Yes, there's no reason to keep anything we don't need.
After hearing your explanation, I think that we have the same idea.
Agreed
Agreed
Yes that is what I was getting at.
Thanks for the clarification. Just a philosophical point: suppose, as the reviewer, the way that I review the code is by writing unit tests against it. Should those tests then be committed alongside the submission?
I guess that doc-strings aren't as important for C/C++, but for weakly/duck-typed languages like Python they are. To be concrete, this is what I had in mind, using prefix sum as an example.

C++:

#include <vector>
using std::vector;

// Sums v[0..index]; assumes v is non-empty and index < v.size().
int prefix_sum(const vector<int>& v, const unsigned int index)
{
    int sum = v[0];
    for (unsigned int i = 1; i <= index && index < v.size(); i++)
        sum += v[i];
    return sum;
}

Python:

def prefix_sum(v, index):
    """
    int[] v: non-empty list of numbers to sum
    int index: a non-negative integer less than len(v)
    """
    return sum(v[0:index + 1])
Agreed.
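A quick sanity check of the Python version above (my own usage example, not part of the proposal):

    assert prefix_sum([1, 2, 3, 4], 2) == 6   # sums v[0..2]: 1 + 2 + 3
    assert prefix_sum([5], 0) == 5            # single-element list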
No.
That depends on the kind of code we want people to write--or what kind of code a particular situation calls for. On one hand, you could fix your Python example by renaming 'v' to 'vector' or something similar (shown below). This particular function is simple enough that it doesn't really need any documentation, and it's the sort of function that's probably not going to be changed later by the programmer. Using 'vector' or another verbose variable name means that it'll take a little longer to type, but not enough to outweigh the benefits of being intuitive. On the other hand, code that will probably need to be updated or changed, or is generally confusing, may warrant better naming, documentation notes, or other tricks on a per-case basis. I think I'd rather see code with longer names (i.e. 'self-documenting code') and less documentation, because it's somewhat less likely that someone, after copying code, will have to refer back to the hack pack later because they forgot or didn't record what their updated code's one-character variables mean.
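Concretely, the rename suggested above would make the docstring redundant; a hypothetical variant of the earlier example:

    # Self-documenting names in place of documentation:
    def prefix_sum(vector, index):
        return sum(vector[0:index + 1])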
Why not? They would be useful to the developer in the cases where they don't pass. They also could provide insight as to what something was tested against. Could I get your thoughts here?
Hmm... perhaps I was still reacting to the idea of enforcing unit tests everywhere. I think it would be fine to allow storing unit tests in the repository, as long as they're useful and we don't require all code to be submitted in a manner that is testable.
I do not really have a problem with this, but we then end up with this weird inconsistency where some code has unit tests and some does not. I am not against saving unit tests, but why would we need them in the future if we only merge code that is known to work and should not change in the near future?
@mdclyburn: That's something I was wondering about. I'm leaning towards saying "no" to tracking extraneous tests and stuff except in very special cases, as a way of keeping things clean.
I don't see how tracking tests can really hurt us, and in the case that we find a problem with code that has been committed, we can know what a program has been tested against. I just don't see them as extraneous. Additionally, if your concern is consistency, we are already going to have differing levels of detail and files related to a topic if we match the level of detail to the complexity of the problem.
The only time we'd be revisiting code that we saved unit tests for would be when those unit tests failed to verify that the code works. Unit tests, if used for reviewing, can just as well be inserted into a code block in the pull request for the code in question. Tests should stay out of the repo.
@ProtractorNinja Do you even hear what you are saying!? You are suggesting tracking code via the GitHub pull request comment system. Do you realize how ridiculous that is? Emailing the unit tests back and forth would be better than that, because at least that is distributed. You also essentially lock us into GitHub permanently as our source code repo provider, because our project history will be in a format that we cannot easily extract. Which situation would you rather be in: a case where you have some idea of what has been tried before, or one where you have none? If you both really don't want to commit tests, then I won't do it, but for the love of sanity, don't use GitHub comments as your means of distributing any type of code.
My problem with the whole unit testing system we used before is that we tested the sample problems that had the algorithm/data structure implemented. This did not apply to any sort of reference code that we could have written (what reference code did we have that was unit tested?)... I actually disliked the whole idea of extracting a solution from a sample problem's solution. I understand wanting to reuse code for testing things in the future should it be necessary, but unit testing is too powerful a solution. It adds a lot of bulk to submissions. We're trying to make it easier to add content, and I think being a responsible reviewer should necessitate testing a sufficient, effective set of inputs. It's a difficult topic for me to debate. I see the merits of both ideas, but I think our small group size makes unit testing a hefty, unnecessary solution.
We have made some headway, but I am very disappointed by the overall lack of progress on this issue. Continuing like this is futile. I am closing this issue to further discussion. I'm going to take some time to deliberate and will get back to you both with a plan of action to resolve the future of the Hack Pack.
Splitting off from #60:
Here's a relevant problem: I poked around to test the claim that the Hack Pack didn't have any code for Dijkstra's Algorithm in it. The only sample code for Dijkstra's Algorithm is currently hiding in the graph section, on line 84 of a sample contest problem solution, denoted only by //dijkstra woohoo. That's not in the index (surprise!), and the Dijkstra chapter -- which is cluttered up with three example contest problems that add literally nothing in a contest environment except for confusion and cows -- doesn't contain or reference any code. It doesn't even explain how the algorithm works. That's terrible.
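For contrast, the kind of reference the chapter should have contained is tiny. A minimal sketch of Dijkstra's Algorithm (illustrative code, not the buried contest solution):

    import heapq

    def dijkstra(graph, src):
        """graph[u] is a list of (v, weight) pairs; returns shortest distances from src."""
        dist = {src: 0}
        heap = [(0, src)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist.get(u, float('inf')):
                continue  # stale heap entry; a shorter path to u was already found
            for v, w in graph[u]:
                if d + w < dist.get(v, float('inf')):
                    dist[v] = d + w
                    heapq.heappush(heap, (dist[v], v))
        return dist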