Keeping QCSchema in sync with QCElemental #68
Comments
+1 option 1
|
As an external user, I would strongly push against choice 1. Promoting option 1, IMHO, sends a strong signal that this schema is only appropriate for QCElemental, and that it's no longer intended as a cross-program interchange. Option 3 sends the signal that community input is driving the schema, but I can understand that's frustrating for QCElemental. So I would strongly push for choice 2. |
That's certainly not what I'd want signaled. Rather, I like option 1 for its single-source-of-truth property. Do you think calling qcel PRs suggesting changes to the data layout of |
I feel pretty strongly about this. This is an independent repo with a set of existing issues, PRs, etc. for the schema. IMHO, QCElemental should be a client of this repo (i.e., it depends on the schema). The single source of truth should be here. Frankly, I'm unlikely to use QCElemental because every project already has similar functionality (cclib, Open Babel, Avogadro, etc.). Until I saw this issue, I frankly didn't know it existed. This repo was created for the express purpose (and declared by its description) of being the home of the QC schema. Consider also that https://molssi-qc-schema.readthedocs.io/en/latest/auto_topology.html is not a subset of the QCElemental docs. Please don't make the schema subservient to another project. If you want to use QCElemental to work on development versions, that's fine. But the standard version should be declared here and, IMHO, all issues and discussion relevant to the schema should be in this repo, not behind some tag in a different repo. |
I think a clear distinction here is that developing in QCElemental with option 1/2 doesn't necessitate using QCElemental, as we will be exporting JSON-Schema to a repo (likely here). QCElemental is used to compose the schema, provide a Python-based structure for those who want it, or generate JSON-Schema like we do here. In the above scenarios I would much prefer 2) over 1) for the reasons given in the comments above. In general, we are finding QCElemental to be a driving force for cross-program interchange because there is an implementation behind it that does something useful. A concrete schema implementation that has the bells and whistles needed forms a cornerstone of an ecosystem. We can now run a dozen programs using QCSchema with QCEngine, and I know of ~40M results computed with QCFractal using the Schema. There are now active learning programs, force field fitting toolkits, visualization tools, etc., all built on top of the Schema and data repos like Fractal. For example, it would be great if Avogadro picked up QCEngine (which uses cclib) instead of writing their own quantum chemistry compute routines. You probably need a few more programs with orbitals built in, but that's easily done. Come talk to us. |
I think if we are to drive adoption then option 3 is the best, with 2 being a more pragmatic second best. I don't think we are going to drop directly using QM codes in Avogadro, but adding something like QCEngine as another option would work, assuming it offers the basic capabilities we need. I see the most value in codes adopting QC JSON natively, but there is a lot of value in something like QCEngine doing the translation. As you know, Chemical JSON was developed using something akin to option 1 from the start. I think it has utility, but it is something different to what we set out to achieve with QC JSON. It would be a shame to see the goal of forming a community schema recede, but we knew from the start it was tough to do. MolSSI is the organization with the funding and time to put into efforts like this. |
@dgasmith - while I agree that you don't have to use QCElemental under option one, that choice implies that the 'ground truth' for the schema falls to that project, not to this repo. I'm repeatedly telling you that sends the wrong message to the community. If you want to get more community feedback, I'm happy to ping others from the meeting to come here and comment. Doing something useful with the schema is incredibly important, as is continual development. I have a pretty big pool of calculations myself (~20M) and Harvard Clean Energy is.. 150M? That shouldn't factor into where and how the schema is standardized. IMHO option 2 sounds like something you'll accept, and I think that's the pragmatic way forward. |
My primary goals are to:
So option 2 is fine by me. |
I completely agree, but this is a very slow process as noted. Just to echo: QCEngine exists to help move the process along and provide demonstrable use cases.
I get that, but I am confused as to why this is the case. A move to Elemental seems more like "MolSSI is building tools for a community-built schema; let us build out a Python ecosystem for communication that can also be used in any language." That seems like a reasonable message; otherwise we are saying "here is a schema that has no uses or community".
Definitely, which is why the number was a pretty minor part of a paragraph. I was attempting to paint a picture that schema is beginning to be accepted by some of the community by having useful libraries that support it.
I echo this quite strongly, Elemental has a strong user base that we just haven't been able to build here. If we can kick this repo along let us do it, but we so far haven't been successful at that. I should note here that we have largely been taking strategy 3 so far where we add first to this repo and then to QCElemental. I think the only place where this isn't the case and we take strategy 2 is a geometry optimization input/output schema where we currently acknowledge that this repo is the single source of truth and we may have to change. |
Below is a proposal that tries to strike a balance between @loriab's points about correctness and de-duplication, and @ghutchis's points about community engagement, a single source of truth, and the QCSchema existing separate from QCElemental. It sort of falls between options 2 and 3.
Community input
1: The MolSSI/QCSchema repository MUST continue to exist.
Releases
2: MolSSI/QCSchema MUST have versioned releases.
QCElemental models
3: Models in MolSSI/QCElemental MUST be marked as to whether they are part of QCSchema.
Documentation
3.1.2: The documentation of QCSchema in MolSSI/QCSchema MUST be auto-generated from the QCSchema models in MolSSI/QCElemental.
Items 3.1 and 3.1.1 are intended to establish a single source of truth while keeping MolSSI/QCSchema and MolSSI/QCElemental up to date and synchronized. Ideally, MolSSI/QCSchema would be the single source of truth, but models are a superset of the schema, and the thing that actually gets tested. As a result, we are likely to end up with correct models and incorrect schema. |
So now we do line-item voting, Security Council veto rules? Just kidding. Matt's proposal sounds good to me. Only a couple items I want to make sure we're on the same page about:
|
+1 I think this seems like a pragmatic approach. |
@loriab - If you can add a linter or GitHub Action to check if PRs affect the schema, that would be a great solution. |
Sounds like a reasonable and pragmatic approach to me too. I personally never thought this would be easy, but I am passionate about seeing it move forward and gain wider adoption as features are added. |
Apologies for accidentally closing this - mouse slippage... |
I am not ignoring this on purpose. I do want to weigh in on this soon, however I am currently traveling and teaching (and there is a lot here to absorb). I will try to come up with some coherent thoughts tonight. |
Thanks for this issue! Similar to @ghutchis, I didn't get the connection between QCElemental and QCSchema in the first place. It would be really helpful to have the list from above, or whatever the final form will be, as a contributing guideline for this repository or in some visible place. I for one didn't consider contributing to QCSchema so far because I didn't really know how. |
Yes to all of this. I have updated my earlier comment accordingly. We should look into making a bot that creates cross-linked issues. For example, if an issue is made on MolSSI/QCSchema titled "Foo", then a corresponding issue gets made on MolSSI/QCElemental titled "QCSchema issue #123: Foo". This issue's body would then just be a link to MolSSI/QCSchema#123. On the QCEl side, we would do the same thing, but only for issues that have the "Schema" tag. |
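The mirroring rule described above can be sketched as a pure function that decides whether an issue should be mirrored and what the mirrored title and body would be. The function and class names are hypothetical, the title format for the QCElemental-to-QCSchema direction is an assumption (the comment only spells out the other direction), and no GitHub API calls are made here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Short display names for the two repositories named in the thread.
REPO_NAMES = {
    "MolSSI/QCSchema": "QCSchema",
    "MolSSI/QCElemental": "QCElemental",
}


@dataclass(frozen=True)
class MirrorIssue:
    """Hypothetical description of the issue a bot would open."""
    target_repo: str
    title: str
    body: str


def mirror_issue(
    source_repo: str, number: int, title: str, labels: Sequence[str] = ()
) -> Optional[MirrorIssue]:
    """Return the mirrored issue to create, or None if nothing should be mirrored."""
    if source_repo == "MolSSI/QCSchema":
        # Every QCSchema issue gets a counterpart on QCElemental.
        target = "MolSSI/QCElemental"
    elif source_repo == "MolSSI/QCElemental":
        # On the QCElemental side, only issues tagged "Schema" are mirrored.
        if "Schema" not in labels:
            return None
        target = "MolSSI/QCSchema"
    else:
        raise ValueError(f"unknown repository: {source_repo}")

    return MirrorIssue(
        target_repo=target,
        title=f"{REPO_NAMES[source_repo]} issue #{number}: {title}",
        # The body is just a cross-reference, which GitHub renders as a link.
        body=f"{source_repo}#{number}",
    )
```

For the example in the comment, `mirror_issue("MolSSI/QCSchema", 123, "Foo")` describes an issue on MolSSI/QCElemental titled "QCSchema issue #123: Foo" whose body is `MolSSI/QCSchema#123`.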
I wasn't going to bring it up until it was proof-of-principle, but since Matt mentioned it, I'm working on MolSSI/QCElemental#204. The notion is that there's a GHA that runs on every PR of qcel and generates the json schema files from a list of models (provenance, molecule, etc.). Then it runs a diff against qcsk (QCSchema) master and if there's a change, creates a branch (working up to this point) and PRs that to qcsk and returns red X to qcel. The PR here can be a place for discussion, any changes can be added to the qcel PR, and when auto PR is merged here, then qcel will show green check. This is my first foray into GHA, so I or GHA may still fail, but it's going easily so far. How does the plan sound? A side issue is that the QCSchema repo is presently a python module. This is for historical reasons of testing and documentation. Since the former is handled at qcel and the latter can be reworked, what does everyone think of the qcsk repo being JSON files instead, like https://github.com/loriab/QCSchema/tree/py2json/qcschema_json ? |
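The diff step of that plan could be sketched roughly as follows. This is not the code in MolSSI/QCElemental#204; it assumes the generated schemas are already available as plain dictionaries (for instance from each model's `.schema()` call) and that the published files are named `<model>.schema.json`, which is a made-up convention for this example.

```python
import json
from pathlib import Path
from typing import Dict, List


def changed_schemas(generated: Dict[str, dict], published_dir: Path) -> List[str]:
    """List the models whose generated schema differs from the published JSON.

    `generated` maps a model name (e.g. "molecule") to its JSON Schema as a dict.
    `published_dir` is a checkout of the published schema files, assumed here
    to be named "<model>.schema.json".
    """
    changed = []
    for name, schema in sorted(generated.items()):
        path = published_dir / f"{name}.schema.json"
        if not path.exists():
            # A model with no published file counts as a schema change.
            changed.append(name)
            continue
        published = json.loads(path.read_text())
        # Comparing parsed objects ignores key order and whitespace,
        # so only real content changes are reported.
        if published != schema:
            changed.append(name)
    return changed
```

A CI job could fail the QCElemental check (the red X) and open the PR on the schema repo whenever the returned list is non-empty; an empty list means the two repos are in sync.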
Adding here a link to a PR I created a while back that seeks to standardize the QCSchema models to a common interface. I'm in favor of implementing the models only once, and exporting that json schema from the Open to comments/suggestions. Thanks! |
QCSchema lags behind what's actually implemented AND DOCUMENTED in QCElemental. The QCSchema docs are where people look for info, so this creates a misleading impression about what QCSchema is/does in practice.
Some possible solutions:
.schema()/autodoc on the myriad models in QCElemental
My favorite option is 1. I'm okay with 2. I think 3 is a bad idea.
@dgasmith @bennybp