Add parser and update readers for new options.yaml #30

PicoCentauri · 2024-01-13T08:33:16Z

This is still a work in progress, but it should already contain the full functionality of the options parser and the corresponding readers.

TODO

docs
test

📚 Documentation preview 📚: https://metatensor-models--30.org.readthedocs.build/en/30/

frostedoyster · 2024-01-15T21:21:52Z

docs/static/options.yaml

+# Last position of the _self_ this entry defines that default options will be
+# overwritten by this config.


I don't quite understand. What is _self_ for?

This is a Hydra feature: _self_ represents the current config. Imagine you have the following defaults list:

defaults:
  - architecture: soap_bpnn
  - _self_

Then the current config overwrites the options in soap_bpnn. But if you have the following:

defaults:
  - _self_
  - architecture: soap_bpnn

then soap_bpnn will overwrite the current options. Of course this is not what we want, but Hydra allows it and will throw a warning if you do not define the _self_ entry in the defaults list. I will clarify this in the docs!
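To make the ordering rule concrete, here is a minimal pure-Python sketch of how a defaults list could be resolved. It only illustrates the "later entries win" behaviour described in this thread and is not Hydra's actual implementation; the function `compose` and the option names `cutoff` and `num_epochs` are hypothetical.

```python
def compose(defaults, groups, self_config):
    """Merge flat configs in defaults-list order; later entries override earlier ones."""
    merged = {}
    for entry in defaults:
        if entry == "_self_":
            source = self_config
        else:
            # Entries such as {"architecture": "soap_bpnn"} select a named group config.
            (group, name), = entry.items()
            source = groups[group][name]
        merged.update(source)
    return merged


# Hypothetical option values, for illustration only.
groups = {"architecture": {"soap_bpnn": {"cutoff": 5.0, "num_epochs": 100}}}
current = {"num_epochs": 10}

# `_self_` last: the current config overrides the architecture defaults.
self_last = compose([{"architecture": "soap_bpnn"}, "_self_"], groups, current)

# `_self_` first: the architecture defaults override the current config.
self_first = compose(["_self_", {"architecture": "soap_bpnn"}], groups, current)

print(self_last["num_epochs"], self_first["num_epochs"])
```

With `_self_` in the last position the user's value survives, which is the behaviour the comment in options.yaml is trying to describe.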

frostedoyster · 2024-01-15T21:30:39Z

src/metatensor/models/cli/train_model.py

+            subsets = torch.utils.data.random_split(
+                dataset=train_dataset,
+                lengths=[
+                    fraction_train_set,
+                    fraction_test_set,
+                    fraction_validation_set,
+                ],
+                generator=generator,
+            )
+
+        train_dataset = subsets[0]
+        if fraction_test_set and not fraction_validation_set:
+            test_dataset = subsets[1]
+        elif not fraction_test_set and fraction_validation_set:
+            validation_dataset = subsets[1]
+        else:
+            test_dataset = subsets[1]
+            validation_dataset = subsets[2]


I'm too lazy to think about the code, what happens here if the user specifies a validation set but then sets a test set of 0.1?

Then 90% of the original training set goes into training and 10% of the original training set into validation. Does this make sense to you?

Yes, perfect
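For readers following the split logic, the sketch below shows a fraction-based split using only the standard library. It is a stand-in that loosely mirrors what `torch.utils.data.random_split` does with fractional lengths, not the code from this PR; the helper name `split_by_fractions` and the seed handling are assumptions made for illustration.

```python
import random


def split_by_fractions(items, fractions, seed=0):
    """Shuffle items and cut them into consecutive chunks, one per fraction."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")

    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)

    # Floor each chunk length, then hand out any leftover items round-robin.
    lengths = [int(len(shuffled) * fraction) for fraction in fractions]
    for i in range(len(shuffled) - sum(lengths)):
        lengths[i % len(lengths)] += 1

    chunks, start = [], 0
    for length in lengths:
        chunks.append(shuffled[start:start + length])
        start += length
    return chunks


# A 0.9 / 0.1 / 0.0 split of 100 samples: the third subset is simply empty.
first, second, third = split_by_fractions(range(100), [0.9, 0.1, 0.0])
print(len(first), len(second), len(third))
```

Note that a fraction of zero still yields a subset (an empty one), so the position of each subset in the returned list does not shift when one fraction is zero.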

frostedoyster · 2024-01-15T21:37:00Z

src/metatensor/models/utils/data/readers/readers.py

+
+            if target["stress"] and target["virial"]:
+                raise ValueError(
+                    "Cannot add gradient displacement gradient for stress and virial!"


I think most people wouldn't understand this. How about "cannot use stress and virial at the same time" and/or "only one of the two is allowed at once"?

Agree. I will change it.

FYI: I think I'll add an explicit gradient w.r.t. "strain" (i.e. -stress) in rascaline, to resolve an ambiguity in the "cell" gradients. A similar naming could be used internally here! (no comment on the user facing error message)
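A minimal sketch of the check with the wording suggested above might look like this. The function name `check_virial_or_stress` is hypothetical, and the `target` dictionary layout is only assumed from the quoted diff.

```python
def check_virial_or_stress(target):
    """Raise if both stress and virial are requested for the same target."""
    if target.get("stress") and target.get("virial"):
        raise ValueError(
            "Cannot use stress and virial at the same time; "
            "only one of the two is allowed."
        )


# Requesting only one of the two passes silently.
check_virial_or_stress({"stress": True, "virial": False})

# Requesting both raises the clearer error.
try:
    check_virial_or_stress({"stress": True, "virial": True})
except ValueError as error:
    message = str(error)
    print(message)
```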

frostedoyster · 2024-01-15T21:42:37Z

src/metatensor/models/utils/data/readers/targets/ase.py

+def _read_virial_stress_ase(
+    filename: str,
+    key: str,
+    is_virial: bool = True,


I don't know if this function should have a default for this argument

I see your point. For energy we could set it to "energy", "forces", "virial" and "stress". But for this private function I would leave it mandatory.

frostedoyster · 2024-01-15T21:48:22Z

src/metatensor/models/utils/data/readers/targets/ase.py

+        Labels(["direction_1"], torch.arange(3).reshape(-1, 1)),
+        Labels(["direction_2"], torch.arange(3).reshape(-1, 1)),


We still haven't agreed as a group on what a good naming convention is here... essentially the first one should index the three cell vectors, while the second one would index their individual xyz components. TBD

Okay, let's keep this in mind. Can you open an issue once we have merged this PR so we don't forget?

frostedoyster · 2024-01-15T21:52:47Z

src/metatensor/models/utils/data/readers/targets/ase.py

+    if is_virial:
+        values *= -1
+    else:  # is stress
+        values *= torch.tensor([f.cell.tolist() for f in frames])


Here I think values should be multiplied by the volume of the cell (ASE might have a function or property for that). I'm not sure what this operation is doing. It would also be great to raise a good error if there is no cell.

Oh yes, you are completely right!
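The correction discussed here can be sketched as follows: the stress tensor is scaled by a single scalar, the cell volume, instead of being multiplied element-wise by the cell matrix. This is a dependency-free illustration, not the code that was merged; `cell_volume` and `scale_stress_by_volume` are hypothetical helpers, and the wording of the error is an assumption. In ASE the volume is available from `Atoms.get_volume()`, so the hand-written determinant is only there to keep the example self-contained.

```python
def cell_volume(cell):
    """Volume of a cell given as three row vectors, i.e. |det(cell)|."""
    (ax, ay, az), (bx, by, bz), (cx, cy, cz) = cell
    det = (
        ax * (by * cz - bz * cy)
        - ay * (bx * cz - bz * cx)
        + az * (bx * cy - by * cx)
    )
    return abs(det)


def scale_stress_by_volume(stress, cell):
    """Multiply every component of a 3x3 stress tensor by the cell volume."""
    volume = cell_volume(cell)
    if volume == 0.0:
        # An all-zero cell is treated here as "no cell defined" (assumption).
        raise ValueError("stress requires a cell with non-zero volume")
    return [[volume * component for component in row] for row in stress]


cell = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 4.0]]
stress = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.5]]
scaled = scale_stress_by_volume(stress, cell)
print(cell_volume(cell), scaled[0][0])
```

Sign conventions (virial versus stress) are deliberately left out of this sketch; it only covers the volume factor and the missing-cell error raised in the review.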

frostedoyster · 2024-01-15T21:56:33Z

tests/utils/data/targets/test_targets_ase.py

+    result = read_stress_ase(filename=filename, key="stress-3x3")
+
+    expected = torch.tensor(structures.info["stress-3x3"])
+    expected *= torch.tensor(structures.cell.tolist())


Also here, shouldn't this be the cell volume (a single scalar)?

PicoCentauri · 2024-01-16T22:01:35Z

Sorry for the long delay, and thanks @frostedoyster for the review. Maybe @frostedoyster and @Luthaf, you can take a look at the tutorial and check whether it is clear and whether you understand how to construct the dataset section. Please also check that this follows the convention we have defined. Thanks in advance.

PicoCentauri force-pushed the train-set-parser branch 4 times, most recently from 901d2bc to 9a359c5 Compare January 15, 2024 16:51

Add parser and update readers for new options.yaml

e5ab8b1

PicoCentauri force-pushed the train-set-parser branch from 9a359c5 to e5ab8b1 Compare January 15, 2024 17:02

frostedoyster self-requested a review January 15, 2024 21:19

frostedoyster reviewed Jan 15, 2024

PicoCentauri added 3 commits January 16, 2024 12:27

Address reviewers comments

3893055

Add API docs and allow bool in grad options

d8d1e5c

Add docs

8469927

PicoCentauri marked this pull request as ready for review January 16, 2024 21:59

PicoCentauri force-pushed the train-set-parser branch 2 times, most recently from 4661be2 to deb9adf Compare January 18, 2024 12:43

code cleanup and proofread of docs

e83e840

PicoCentauri force-pushed the train-set-parser branch from deb9adf to e83e840 Compare January 18, 2024 13:32

frostedoyster merged commit 08844c5 into main Jan 19, 2024
7 checks passed

frostedoyster deleted the train-set-parser branch January 19, 2024 10:58
