PopulationSim Input Validation #187
Comments
I would suggest that using a data model to first describe the inputs and then using that data model to validate the inputs is a better way forward than simply checking the inputs. Checking inputs strikes me as fairly low value in the absence of defining the inputs. And defining the inputs in software is far better than defining inputs in a document, like a Wiki.
This sounds a little bit like semantics - data model vs input validation. I think the key here is that PopSim assumes a high degree of input consistency and it would be helpful to have some process to enforce that upfront prior to getting into cryptic code error messages.
In my experience, PopulationSim is challenging for new users for at least the following reasons:
A data model can help by doing the following (I've highlighted in bold italic text the functions of the input checker):
A data model approach is therefore different, both philosophically and practically, from an input checking approach. The former attempts to describe the system; the latter describes only the business rules of the system. In my view, it's important to put the business rules in the context of the system description. As a user, I would certainly appreciate being told at the beginning that I've violated the software's business rules, but I would also be irritated that there's no description of how the system works. Because PopulationSim's design seeks to accommodate maximum flexibility, i.e., the seed and controls can contain any variables, the data model must be specific to the examples. New users could then read these examples to understand how the system works and make their own versions (though in most ABM use cases the version distributed with the code would suffice). The business rules could live in utility methods used by the data models or, given PopulationSim's modest runtime, be enforced via better error handling (perhaps as part of a broader effort to improve the software's communication back to the user).
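To make the distinction above concrete, here is a minimal sketch of what an example-specific data model might look like. It is only an illustration, not PopulationSim code: the `ColumnSpec` and `TableSpec` classes and the column names (`hh_id`, `PUMA`, `WGTP`) are hypothetical, and it uses plain Python rows rather than the tables PopulationSim actually loads. The point is that the same object both describes the input and validates it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnSpec:
    """Describes one input column (hypothetical helper, not a PopulationSim class)."""
    name: str
    dtype: type
    description: str
    required: bool = True


@dataclass(frozen=True)
class TableSpec:
    """Describes one input table and validates rows against that description."""
    name: str
    columns: tuple

    def validate(self, rows):
        """Return a list of readable problems; an empty list means the rows conform."""
        problems = []
        for i, row in enumerate(rows):
            for col in self.columns:
                value = row.get(col.name)
                if value is None:
                    if col.required:
                        problems.append(
                            f"{self.name} row {i}: missing required column '{col.name}'"
                        )
                    continue
                if not isinstance(value, col.dtype):
                    problems.append(
                        f"{self.name} row {i}: '{col.name}' should be "
                        f"{col.dtype.__name__}, got {type(value).__name__}"
                    )
        return problems


# Hypothetical description of a seed household table for one example setup.
SEED_HOUSEHOLDS = TableSpec(
    name="seed_households",
    columns=(
        ColumnSpec("hh_id", int, "Unique household identifier"),
        ColumnSpec("PUMA", int, "Seed geography the household belongs to"),
        ColumnSpec("WGTP", int, "Initial household weight"),
    ),
)

good_rows = [{"hh_id": 1, "PUMA": 100, "WGTP": 25}]
bad_rows = [{"hh_id": 2, "PUMA": "100"}]  # PUMA has the wrong type, WGTP is missing

for problem in SEED_HOUSEHOLDS.validate(bad_rows):
    print(problem)
```

A new user can read `SEED_HOUSEHOLDS` to learn what the table is expected to contain, and the same definition produces an early, readable error list instead of a failure deep in the run.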
@DavidOry makes an undeniably well-thought-out argument/proposal (as is his norm). When I said this is semantics, I meant around the goal of validating inputs. I think David's thoughts expand the scope a little bit (beyond just validating inputs), but likely in the correct way. If the scope can allow for the expansion, I fully support what David is laying out - that a better way (more efficient over the life of the project) to achieve this is through a data model example. I support closing this issue and reopening a PopSim data model issue - with the foundation of approach and benefits captured in David's notes.
Issue Background
Several usage-related questions have come up where inconsistent results are generated due to improper input data or configuration. This issue is particularly acute in multiprocessing cases, where results differ between multiprocessing and single-threaded runs. Often this is due to a tiny mistake in the configuration, which can be especially difficult to trace when no error is raised and bad results are produced. A relatively simple mistake occurs when a sub-geography overlaps with two super-geographies. This may not cause an error in a single-threaded run, even though the results are inaccurate. The problem can be addressed via an assert statement when the data is loaded into PopulationSim. After parallel processing completes, tables produced from slicing are not automatically coalesced; the software must be told which tables to coalesce in the settings file, which is error-prone. It may be possible to perform this step automatically in the software without requiring user-specified settings. Failing that, the software could at least check that the set of coalesced tables is consistent.
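As a rough sketch of the overlap check described above, the snippet below flags any sub-geography that is assigned to more than one super-geography. It is an assumption-laden illustration rather than PopulationSim's implementation: the function names and the `TAZ`/`TRACT` column names are hypothetical, and it works on plain Python rows to stay dependency-free. On a pandas crosswalk the same idea could be expressed with `groupby(...).nunique()`.

```python
from collections import defaultdict


def find_overlapping_subzones(crosswalk_rows, sub_col, super_col):
    """Map each sub-zone with more than one parent to its sorted list of parents."""
    parents = defaultdict(set)
    for row in crosswalk_rows:
        parents[row[sub_col]].add(row[super_col])
    return {sub: sorted(sups) for sub, sups in parents.items() if len(sups) > 1}


def assert_nested_geographies(crosswalk_rows, sub_col, super_col):
    """Raise early, with a readable message, if the geographies do not nest."""
    overlaps = find_overlapping_subzones(crosswalk_rows, sub_col, super_col)
    if overlaps:
        details = "; ".join(
            f"{sub_col} {sub} -> {super_col} {sups}" for sub, sups in overlaps.items()
        )
        raise ValueError(
            f"Each {sub_col} must belong to exactly one {super_col}: {details}"
        )


# Hypothetical crosswalk in which TAZ 102 is wrongly assigned to two TRACTs.
crosswalk = [
    {"TAZ": 101, "TRACT": 1},
    {"TAZ": 102, "TRACT": 1},
    {"TAZ": 102, "TRACT": 2},
    {"TAZ": 103, "TRACT": 2},
]

try:
    assert_nested_geographies(crosswalk, "TAZ", "TRACT")
except ValueError as err:
    print(err)
```

Raising at load time means the mistake surfaces the same way in single-threaded and multiprocessing runs, before any results are produced.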
Solution
There are a number of other data checks that users should be aware of but that cannot be checked automatically, given the flexibility of the software. These include checks to make sure that household and person controls are consistent across geographies, that the data does not include missing values, and that controls are consistent with the incidence table specified for the seed data. This approach is very much in line with the input checker developed for ActivitySim. Additionally, an FAQ should be developed on the PopulationSim wiki listing common data and control file checks for new users.
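For the FAQ, two of the checks listed above could be illustrated with something like the following, which a user would adapt to their own files. This is a sketch under stated assumptions: the helper names and the `HHBASE`, `NP`, and `WGTP` column names are hypothetical, and the rows are plain Python dictionaries rather than the tables PopulationSim reads.

```python
import math


def find_missing_values(rows, columns):
    """Return (row_index, column) pairs where a value is None, empty, or NaN."""
    missing = []
    for i, row in enumerate(rows):
        for col in columns:
            value = row.get(col)
            is_nan = isinstance(value, float) and math.isnan(value)
            if value is None or value == "" or is_nan:
                missing.append((i, col))
    return missing


def control_totals_match(lower_rows, upper_rows, lower_col, upper_col, tolerance=0.0):
    """Compare the summed control at two geographic levels.

    Returns (matches, lower_total, upper_total).
    """
    lower_total = sum(row[lower_col] for row in lower_rows)
    upper_total = sum(row[upper_col] for row in upper_rows)
    return abs(lower_total - upper_total) <= tolerance, lower_total, upper_total


# Hypothetical household controls at two geographic levels.
taz_controls = [{"TAZ": 101, "HHBASE": 120}, {"TAZ": 102, "HHBASE": 80}]
tract_controls = [{"TRACT": 1, "HHBASE": 200}]

# Hypothetical seed records with two missing values.
seed_households = [
    {"NP": 2, "WGTP": 30},
    {"NP": None, "WGTP": 12},
    {"NP": 1, "WGTP": float("nan")},
]

print(control_totals_match(taz_controls, tract_controls, "HHBASE", "HHBASE"))
print(find_missing_values(seed_households, ["NP", "WGTP"]))
```

Because the seed and controls can contain any variables, the user has to supply the column names and decide which pairs of controls should agree; the helpers only do the arithmetic.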