-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add parser and update readers for new options.yaml #30
Conversation
901d2bc
to
9a359c5
Compare
9a359c5
to
e5ab8b1
Compare
# Last position of the _self_ this entry defines that default options will be | ||
# overwritten by this config. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite understand. What is self for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a hydra thing _self_
represents the current config. Imagine you have the following default
defaults:
- architecture: soap_bpnn
- _self_
Then the current config overwrites options in soap_bpnn
. But, if you have the following
defaults:
- _self_
- architecture: soap_bpnn
soap_bpnn
will overwrite the current options. Of course this is not what we want but hydra has this possibility and will throw a waring if you do not define the _self_
entry in the default list. I will clarify this in the docs!
subsets = torch.utils.data.random_split( | ||
dataset=train_dataset, | ||
lengths=[ | ||
fraction_train_set, | ||
fraction_test_set, | ||
fraction_validation_set, | ||
], | ||
generator=generator, | ||
) | ||
|
||
train_dataset = subsets[0] | ||
if fraction_test_set and not fraction_validation_set: | ||
test_dataset = subsets[1] | ||
elif not fraction_validation_set and fraction_validation_set: | ||
validation_dataset = subsets[1] | ||
else: | ||
test_dataset = subsets[1] | ||
validation_dataset = subsets[2] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm too lazy to think about the code, what happens here if the user specifies a validation set but then sets a test set of 0.1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then 90% of the original train set goes into training and 10% of the original training set into validation. Does this makes sense for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, perfect
|
||
if target["stress"] and target["virial"]: | ||
raise ValueError( | ||
"Cannot add gradient displacement gradient for stress and virial!" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think most people wouldn't understand this. How about "cannot use stress and virial at the same time" and/or "only one of the two is allowed at once"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. I will change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: I think I'll add an explicit gradient w.r.t. "strain"
(i.e. -stress
) in rascaline, to resolve an ambiguity in the "cell"
gradients. A similar naming could be used internally here! (no comment on the user facing error message)
def _read_virial_stress_ase( | ||
filename: str, | ||
key: str, | ||
is_virial: bool = True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if this function should have a default for this argument
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See your point. For energy we could set it to "energy"
, "forces"
"virial"
and "stress"
. But, for this private function I would leave it mandatory.
Labels(["direction_1"], torch.arange(3).reshape(-1, 1)), | ||
Labels(["direction_2"], torch.arange(3).reshape(-1, 1)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still haven't agreed as a group on what a good naming convention is here... essentially the first one should index the three cell vectors, while the second one would index their individual xyz components. TBD
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay let's keep this in mind. Can you open an issue once we merged this PR to keep this in mind.
if is_virial: | ||
values *= -1 | ||
else: # is stress | ||
values *= torch.tensor([f.cell.tolist() for f in frames]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I think values
should multiplied by the volume of the cell (ASE might have a function or property for that). I'm not sure what this operation is doing. It would also be great to raise a good error if there is no cell.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohh yes you are completely, right!
result = read_stress_ase(filename=filename, key="stress-3x3") | ||
|
||
expected = torch.tensor(structures.info["stress-3x3"]) | ||
expected *= torch.tensor(structures.cell.tolist()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also here, shouldn't this be the cell volume (a single scalar)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep!
Sorry for the long delay and thanks @frostedoyster for the review. Maybe @frostedoyster and @Luthaf you can add a look the tutorial if this clear and you understand how you can construct the dataset section. Also please check if this follows our convention that we have defined. Thanks in advance. |
4661be2
to
deb9adf
Compare
deb9adf
to
e83e840
Compare
This is still work in progress, but should already contain the full functionality of the options parser and the corresponding readers.
TODO
📚 Documentation preview 📚: https://metatensor-models--30.org.readthedocs.build/en/30/