Add parser and update readers for new options.yaml (#30)
PicoCentauri authored Jan 19, 2024
1 parent 51df872 commit 08844c5
Showing 40 changed files with 1,569 additions and 254 deletions.
1 change: 1 addition & 0 deletions docs/src/conf.py
@@ -51,6 +51,7 @@
"python": ("https://docs.python.org/3", None),
"torch": ("https://pytorch.org/docs/stable/", None),
"metatensor": ("https://lab-cosmo.github.io/metatensor/latest/", None),
"omegaconf": ("https://omegaconf.readthedocs.io/en/latest/", None),
"rascaline": ("https://luthaf.fr/rascaline/latest/", None),
}

7 changes: 7 additions & 0 deletions docs/src/dev-docs/cli/eval_model.rst
@@ -0,0 +1,7 @@
eval_model
##########

.. automodule:: metatensor.models.cli.eval_model
    :members:
    :undoc-members:
    :show-inheritance:
7 changes: 7 additions & 0 deletions docs/src/dev-docs/cli/export_model.rst
@@ -0,0 +1,7 @@
export_model
############

.. automodule:: metatensor.models.cli.export_model
    :members:
    :undoc-members:
    :show-inheritance:
7 changes: 7 additions & 0 deletions docs/src/dev-docs/cli/formatter.rst
@@ -0,0 +1,7 @@
formatter
#########

.. automodule:: metatensor.models.cli.formatter
    :members:
    :undoc-members:
    :show-inheritance:
20 changes: 20 additions & 0 deletions docs/src/dev-docs/cli/index.rst
@@ -0,0 +1,20 @@
CLI API
=======

This is the API for the command line interface ``cli`` functions of
``metatensor-models``.

.. toctree::
    :maxdepth: 1

    train_model
    eval_model
    export_model

We provide a custom formatter class for formatting the help message of the ``argparse``
package.
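
A custom formatter is handed to ``argparse.ArgumentParser`` through its ``formatter_class``
argument. The sketch below only illustrates that mechanism: the class name
``HypotheticalHelpFormatter`` and its choice of base classes are assumptions made for the
example, not the formatter shipped in ``metatensor.models.cli.formatter``.

```python
import argparse


class HypotheticalHelpFormatter(
    argparse.ArgumentDefaultsHelpFormatter, argparse.RawDescriptionHelpFormatter
):
    """Illustrative formatter (assumption): keep the description's line breaks
    and append each option's default value to its help text."""


def build_parser() -> argparse.ArgumentParser:
    # The formatter *class* (not an instance) is passed to the parser
    parser = argparse.ArgumentParser(
        prog="demo",
        description="Train a model.\nSee the documentation for details.",
        formatter_class=HypotheticalHelpFormatter,
    )
    parser.add_argument("--seed", type=int, default=42, help="random seed")
    return parser


if __name__ == "__main__":
    build_parser().print_help()
```

Combining two of the standard formatter classes through multiple inheritance is a common
way to get both behaviours without overriding any private ``argparse`` methods.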

.. toctree::
    :maxdepth: 1

    formatter
7 changes: 7 additions & 0 deletions docs/src/dev-docs/cli/train_model.rst
@@ -0,0 +1,7 @@
train_model
###########

.. automodule:: metatensor.models.cli.train_model
    :members:
    :undoc-members:
    :show-inheritance:
1 change: 1 addition & 0 deletions docs/src/dev-docs/index.rst
@@ -10,4 +10,5 @@ module.
    :maxdepth: 1

    adding-models
    cli/index
    utils/index
1 change: 1 addition & 0 deletions docs/src/dev-docs/utils/index.rst
@@ -10,3 +10,4 @@ This is the API for the ``utils`` module of ``metatensor-models``.
    readers/index
    writers
    model-io
    omegaconf
10 changes: 10 additions & 0 deletions docs/src/dev-docs/utils/omegaconf.rst
@@ -0,0 +1,10 @@
Custom omegaconf functions
==========================

Resolvers to handle special fields in our configs as well as the expansion/completion of
the dataset section.

.. automodule:: metatensor.models.utils.omegaconf
    :members:
    :undoc-members:
    :show-inheritance:
23 changes: 19 additions & 4 deletions docs/src/dev-docs/utils/readers/index.rst
@@ -4,14 +4,29 @@ Structure and Target data Readers
The main entry points for reading structure and target information are the two reader
functions:

.. automodule:: metatensor.models.utils.data.readers
    :members:
.. autofunction:: metatensor.models.utils.data.read_structures
.. autofunction:: metatensor.models.utils.data.read_targets

Based on the provided filename they chose which child reader they use. For details on
Target type specific readers
----------------------------

:func:`metatensor.models.utils.data.read_targets` uses sub-functions to parse supported
target properties like ``energy`` or ``forces``. Currently, we support reading the
following target properties via:

.. autofunction:: metatensor.models.utils.data.read_energy
.. autofunction:: metatensor.models.utils.data.read_forces
.. autofunction:: metatensor.models.utils.data.read_virial
.. autofunction:: metatensor.models.utils.data.read_stress

File type specific readers
--------------------------

Based on the provided ``file_format``, they choose which sub-reader to use. For details
on these, refer to their documentation:

.. toctree::
    :maxdepth: 1

    structure
    target
    targets
2 changes: 1 addition & 1 deletion docs/src/dev-docs/utils/readers/structure.rst
@@ -10,4 +10,4 @@ which file type is stored in
Implemented Readers
-------------------

.. autofunction:: metatensor.models.utils.data.readers.structures.read_ase
.. autofunction:: metatensor.models.utils.data.readers.structures.read_structures_ase
13 changes: 0 additions & 13 deletions docs/src/dev-docs/utils/readers/target.rst

This file was deleted.

62 changes: 62 additions & 0 deletions docs/src/dev-docs/utils/readers/targets.rst
@@ -0,0 +1,62 @@
Target data Readers
###################

Parsers for obtaining target information from target files. All readers return a
:py:class:`metatensor.torch.TensorBlock`. Currently, we support the following target
properties:

- :ref:`energy`
- :ref:`forces`
- :ref:`stress`
- :ref:`virial`

For each target property, the mapping from file type to reader is stored in a dictionary.
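
To illustrate how such a mapping can drive the choice of reader, the sketch below
dispatches on the file suffix, mirroring the rule that ``file_format`` is guessed from the
suffix when it is not given. Everything in it is hypothetical: ``ENERGY_READERS_DEMO``,
``read_energy_demo`` and the stub reader are not part of ``metatensor-models``, and a real
reader would return a :py:class:`metatensor.torch.TensorBlock` rather than a string.

```python
from pathlib import Path
from typing import Callable, Dict, Optional


def _read_energy_stub(filename: str, key: str) -> str:
    # Hypothetical stand-in for a real reader: it only reports what it would read
    return f"energy '{key}' from {filename}"


# Hypothetical mapping from file format (file suffix) to reader function
ENERGY_READERS_DEMO: Dict[str, Callable[[str, str], str]] = {
    ".xyz": _read_energy_stub,
    ".extxyz": _read_energy_stub,
}


def read_energy_demo(filename: str, key: str, file_format: Optional[str] = None) -> str:
    # Fall back to the file suffix when no explicit format is given
    if file_format is None:
        file_format = Path(filename).suffix
    try:
        reader = ENERGY_READERS_DEMO[file_format]
    except KeyError:
        raise ValueError(f"file format {file_format!r} is not supported") from None
    return reader(filename, key)


if __name__ == "__main__":
    print(read_energy_demo("dataset.xyz", key="U0"))
```

With this layout, supporting a new file type only requires registering one more entry in
the dictionary; the dispatch function itself stays unchanged.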

.. _energy:

Energy
======

.. autodata:: metatensor.models.utils.data.readers.targets.ENERGY_READERS

Implemented Readers
-------------------

.. autofunction:: metatensor.models.utils.data.readers.targets.read_energy_ase


.. _forces:

Forces
======

.. autodata:: metatensor.models.utils.data.readers.targets.FORCES_READERS

Implemented Readers
-------------------

.. autofunction:: metatensor.models.utils.data.readers.targets.read_forces_ase

.. _stress:

Stress
======

.. autodata:: metatensor.models.utils.data.readers.targets.STRESS_READERS

Implemented Readers
-------------------

.. autofunction:: metatensor.models.utils.data.readers.targets.read_stress_ase

.. _virial:

Virial
======

.. autodata:: metatensor.models.utils.data.readers.targets.VIRIAL_READERS

Implemented Readers
-------------------

.. autofunction:: metatensor.models.utils.data.readers.targets.read_virial_ase
134 changes: 134 additions & 0 deletions docs/src/getting-started/custom_dataset_conf.rst
@@ -0,0 +1,134 @@
.. _dataset_conf:

Customize a Dataset Configuration
=================================

Overview
--------
The main task in setting up a training procedure with ``metatensor-models`` is to provide
files for training, validation, and testing datasets. Our system allows flexibility in
parsing data for training. Mandatory sections in the ``options.yaml`` file include:

- ``training_set``
- ``test_set``
- ``validation_set``

Each section can follow a similar structure, with shorthand methods available to
simplify dataset definitions.

Minimal Configuration Example
-----------------------------
Below is the simplest form of these sections:

.. code-block:: yaml

    training_set: "dataset.xyz"
    test_set: 0.1
    validation_set: 0.1

This configuration parses all information from ``dataset.xyz``, with 20% of the training
set randomly selected for testing and validation (10% each).
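
The fractional ``test_set`` and ``validation_set`` values can be pictured as a random
partition of the structure indices. This is a minimal sketch under stated assumptions: the
function name ``split_indices``, the ``seed`` argument and the floor rounding of the subset
sizes are illustrative choices and not necessarily what the train script does.

```python
import random
from typing import List, Tuple


def split_indices(
    n_structures: int, test_fraction: float, validation_fraction: float, seed: int = 0
) -> Tuple[List[int], List[int], List[int]]:
    """Hypothetical helper: return (train, test, validation) index lists."""
    if test_fraction < 0 or validation_fraction < 0:
        raise ValueError("fractions must be non-negative")
    if test_fraction + validation_fraction >= 1:
        raise ValueError("test and validation fractions must leave a training set")

    # Shuffle all indices with a seeded generator so the split is reproducible
    indices = list(range(n_structures))
    random.Random(seed).shuffle(indices)

    # Subset sizes are rounded down (an assumption of this sketch)
    n_test = int(test_fraction * n_structures)
    n_validation = int(validation_fraction * n_structures)

    test = indices[:n_test]
    validation = indices[n_test : n_test + n_validation]
    train = indices[n_test + n_validation :]
    return train, test, validation
```

For a file with 100 structures and the values above, this yields 80 training, 10 test and
10 validation structures, with no structure appearing in more than one subset.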

Expanded Configuration Format
-----------------------------
The train script automatically expands the ``training_set`` section into the following
format, which is also valid for initial input:

.. code-block:: yaml

    training_set:
        structures:
            read_from: dataset.xyz
            file_format: .xyz
            unit: null
        targets:
            energy:
                quantity: energy
                read_from: dataset.xyz
                file_format: .xyz
                key: energy
                unit: null
                forces:
                    read_from: dataset.xyz
                    file_format: .xyz
                    key: forces
                stress:
                    read_from: dataset.xyz
                    file_format: .xyz
                    key: stress
                virial: false
    test_set: 0.1
    validation_set: 0.1

Understanding the YAML Block
----------------------------
The ``training_set`` is divided into sections ``structures`` and ``targets``:

Structures Section
^^^^^^^^^^^^^^^^^^
Describes the structure data like positions and cell information.

:param read_from: The file containing structure data.
:param file_format: The file format, guessed from the suffix if ``null`` or not
    provided.
:param unit: The unit of lengths, optional but recommended for simulations.

A single string in this section automatically expands, using the string as the
``read_from`` parameter.
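
The shorthand expansion of the ``structures`` section can be sketched as a small function.
It is hypothetical (``expand_structures_section`` is not an API of ``metatensor-models``)
and only encodes the rules stated above: a bare string becomes ``read_from``, a missing or
``null`` ``file_format`` is guessed from the suffix, ``unit`` defaults to ``null``, and any
other keys are left untouched.

```python
from pathlib import Path
from typing import Any, Dict, Union


def expand_structures_section(section: Union[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical sketch of completing a ``structures`` section."""
    # A bare string is shorthand for {"read_from": <string>}
    if isinstance(section, str):
        section = {"read_from": section}
    expanded = dict(section)  # copy, so unknown keys are carried over unchanged
    # A missing or null file_format is guessed from the file suffix
    if expanded.get("file_format") is None:
        expanded["file_format"] = Path(expanded["read_from"]).suffix
    # The length unit is optional and defaults to null (None in Python)
    expanded.setdefault("unit", None)
    return expanded
```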

.. note::

    ``metatensor-models`` does not convert units during training or evaluation. Units are
    only required if the model is to be used for MD simulations.

Targets Section
^^^^^^^^^^^^^^^
Allows defining multiple target sections, each with a unique name.

- Commonly, a section named ``energy`` should be defined, which is essential for MD
  simulations. For this section, gradients like ``forces`` and ``stress`` are enabled by
  default. See :ref:`energy-section` for further details on this section.
- For other target sections, all gradients are disabled by default.

Target section parameters include:

:param quantity: The target's quantity (e.g., energy, dipole).
:param read_from: The file for target data, defaulting to the ``structures.read_from``
    file if not provided.
:param file_format: The file format, guessed from the suffix if not provided.
:param key: The key for reading from the file, defaulting to the target section's name
    if not provided.
:param unit: The unit of the target.
:param forces: Gradient section. See :ref:`gradient-section` for parameters.
:param stress: Gradient section. See :ref:`gradient-section` for parameters.
:param virial: Gradient section. See :ref:`gradient-section` for parameters.

A single string in a target section automatically expands, using the string as the
``read_from`` parameter.
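
The defaults listed above for a target section can be sketched in the same way. This is a
hypothetical helper, not the package's implementation: it fills ``read_from`` from the
structures file, guesses ``file_format`` from the suffix, falls back to the section name for
``key``, sets ``unit`` to ``null`` and keeps any other keys untouched. Gradient sections are
deliberately left out of this sketch.

```python
from pathlib import Path
from typing import Any, Dict, Union


def expand_target_section(
    name: str, section: Union[str, Dict[str, Any]], structures_read_from: str
) -> Dict[str, Any]:
    """Hypothetical sketch of completing one target section."""
    # A bare string is shorthand for {"read_from": <string>}
    if isinstance(section, str):
        section = {"read_from": section}
    expanded = dict(section)  # unknown keys are carried over unchanged
    # read_from falls back to the file given in the structures section
    expanded.setdefault("read_from", structures_read_from)
    # file_format is guessed from the suffix when missing or null
    if expanded.get("file_format") is None:
        expanded["file_format"] = Path(expanded["read_from"]).suffix
    # key defaults to the name of the target section
    expanded.setdefault("key", name)
    expanded.setdefault("unit", None)
    return expanded
```

For example, the ``energy`` target with only ``key: "U0"`` set would be completed with the
structures file as ``read_from`` and ``.xyz`` as ``file_format`` for an ``.xyz`` file.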

.. _gradient-section:

Gradient Section
^^^^^^^^^^^^^^^^
Each gradient section (like ``forces`` or ``stress``) has similar parameters:

:param read_from: The file for gradient data.
:param file_format: The file format, guessed from the suffix if not provided.
:param key: The key for reading from the file.

Sections set to ``true`` or ``on`` automatically expand with default parameters.
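
A sketch of how a single gradient section could be expanded, assuming (as in the expanded
example above) that ``read_from`` falls back to the target's file and ``key`` to the
gradient's name. The function ``expand_gradient_section`` is hypothetical. The strings
``"on"`` and ``"off"`` are handled explicitly because whether a YAML loader turns them into
booleans depends on the YAML version it follows.

```python
from pathlib import Path
from typing import Any, Dict, Optional, Union

GradientInput = Union[bool, str, Dict[str, Any], None]


def expand_gradient_section(
    name: str, section: GradientInput, target_read_from: str
) -> Optional[Dict[str, Any]]:
    """Hypothetical: return None for a disabled gradient, else its full section."""
    # false/off (or a missing section) disables the gradient
    if section is None or section is False or section == "off":
        return None
    # true/on means "enable with default parameters"
    if section is True or section == "on":
        section = {}
    if not isinstance(section, dict):
        raise TypeError(f"unsupported value for gradient {name!r}: {section!r}")
    expanded = dict(section)  # unknown keys are kept as they are
    # Assumed defaults: the target's file and the gradient's own name as key
    expanded.setdefault("read_from", target_read_from)
    if expanded.get("file_format") is None:
        expanded["file_format"] = Path(expanded["read_from"]).suffix
    expanded.setdefault("key", name)
    return expanded
```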

.. _energy-section:

Energy Section
^^^^^^^^^^^^^^
The ``energy`` section is mandatory for MD simulations, with forces and stresses enabled
by default.

- A warning is raised if the required data is missing, but training proceeds without it.
- Setting a ``virial`` section automatically disables the ``stress`` section in the
  ``energy`` target.
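
The rules for which gradients are enabled can be summarized in a few lines. This
hypothetical sketch only models whether each gradient is switched on: ``forces`` and
``stress`` default to on for the ``energy`` target and to off elsewhere, and an enabled
``virial`` section switches ``stress`` off for ``energy``. The function name
``default_gradients`` is an assumption, and the warning for missing data is not modelled.

```python
from typing import Any, Dict


def default_gradients(target_name: str, section: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical: fill in the enabled/disabled state of each gradient."""
    section = dict(section)  # do not mutate the caller's dictionary
    is_energy = target_name == "energy"
    # forces and stress are enabled by default only for the energy target
    section.setdefault("forces", is_energy)
    section.setdefault("stress", is_energy)
    # virial is never enabled by default
    section.setdefault("virial", False)
    # An enabled virial section disables stress in the energy target
    virial = section["virial"]
    if is_energy and virial is not False and virial is not None:
        section["stress"] = False
    return section
```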

.. note::

    In all sections, unknown keys are ignored during dataset parsing; they are not deleted.
1 change: 1 addition & 0 deletions docs/src/getting-started/index.rst
@@ -8,3 +8,4 @@ This sections describes how to install the package, and its most basic commands.

    installation
    usage
    custom_dataset_conf
8 changes: 6 additions & 2 deletions docs/src/getting-started/usage.rst
@@ -1,5 +1,5 @@
Usage
=====
Basic Usage
===========

``metatensor-models`` is designed for direct use from the command line (CLI). The
general help of ``metatensor-models`` can be accessed using
@@ -66,3 +66,7 @@ The sub-command to evaluate an already trained model is
.. literalinclude:: ../../../examples/usage.sh
    :language: bash
    :lines: 9-


In the next tutorial, we show how to adjust the dataset section of the ``options.yaml``
file to use your own datasets.
19 changes: 14 additions & 5 deletions docs/static/options.yaml
@@ -1,8 +1,17 @@
defaults:
  - architecture: soap_bpnn # architecture used to train the model
  - _self_

# Section defining the parameters for structure and target data
dataset:
structure_path: "qm9_reduced_100.xyz" # file where the positions are stored
targets_path: "qm9_reduced_100.xyz" # file with target values (i.e energies)
target_value: "U0" # name of the target value in `targets_path`
# Listing _self_ last means that the options in this file override the default
# options.

# Mandatory section defining the parameters for structure and target data of the training
# set
training_set:
  structures: "qm9_reduced_100.xyz" # file where the positions are stored
  targets:
    energy:
      key: "U0" # name of the target value

test_set: 0.1 # 10 % of the training_set is randomly split off for the test set
validation_set: 0.1 # 10 % of the training_set is randomly split off for validation
2 changes: 1 addition & 1 deletion examples/options.yaml
3 changes: 0 additions & 3 deletions src/metatensor/models/cli/conf/config.yaml

This file was deleted.
