-
Notifications
You must be signed in to change notification settings - Fork 14
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
0 parents
commit 9398535
Showing
720 changed files
with
107,757 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
target/ | ||
**/*.rs.bk | ||
Cargo.lock | ||
|
||
.tox/ | ||
build/ | ||
dist/ | ||
*.egg-info | ||
__pycache__/ |
Empty file.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
<!DOCTYPE html> | ||
<html> | ||
<head> | ||
<meta charset="utf-8" /> | ||
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1" /> | ||
<meta http-equiv="refresh" content="0;URL=rascaline/index.html" /> | ||
</head> | ||
<body></body> | ||
</html> |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
<!DOCTYPE html> | ||
<html> | ||
<head> | ||
<meta charset="utf-8" /> | ||
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1" /> | ||
<meta http-equiv="refresh" content="0;URL=latest/index.html" /> | ||
</head> | ||
<body></body> | ||
</html> |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
# Sphinx build info version 1 | ||
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done. | ||
config: b1e09eb4e4ce65f5c8a7ee094b28f8f8 | ||
tags: 645f666f9bcd5a90fca523b33c5a78b7 |
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file added
BIN
+49.7 KB
latest/.doctrees/references/calculators/lode-spherical-expansion.doctree
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file added
BIN
+63.7 KB
latest/.doctrees/references/calculators/spherical-expansion-by-pair.doctree
Binary file not shown.
Binary file not shown.
Binary file not shown.
248 changes: 248 additions & 0 deletions
248
latest/_downloads/081634a43959105f7f41a7682e76d405/property-selection.ipynb
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,248 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"\n# Property Selection\n\n.. start-body\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"import chemfiles\nimport numpy as np\nfrom metatensor import Labels, MetatensorError, TensorBlock, TensorMap\nfrom skmatter.feature_selection import FPS\n\nfrom rascaline import SoapPowerSpectrum" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"First we load the dataset with chemfiles\n\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"with chemfiles.Trajectory(\"dataset.xyz\") as trajectory:\n frames = [f for f in trajectory]" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"and define the hyper parameters of the representation\n\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"HYPER_PARAMETERS = {\n \"cutoff\": 5.0,\n \"max_radial\": 6,\n \"max_angular\": 4,\n \"atomic_gaussian_width\": 0.3,\n \"center_atom_weight\": 1.0,\n \"radial_basis\": {\n \"Gto\": {},\n },\n \"cutoff_function\": {\n \"ShiftedCosine\": {\"width\": 0.5},\n },\n}\n\ncalculator = SoapPowerSpectrum(**HYPER_PARAMETERS)\n\ndescriptor = calculator.compute(frames)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"The selections for feature can be a set of ``Labels``, in which case the names\nof the labels must be a subset of the names of the properties produced by the\ncalculator. You can see the default set of names with:\n\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"print(\"property names:\", descriptor.property_names)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"We can use a subset of these names to define a selection. In this case, only\nproperties matching the labels in this selection will be used by rascaline\n(here, only properties with ``l = 0`` will be used)\n\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"selection = Labels(\n names=[\"l\"],\n values=np.array([[0]]),\n)\nselected_descriptor = calculator.compute(frames, selected_properties=selection)\n\nselected_descriptor = selected_descriptor.keys_to_samples(\"species_center\")\nselected_descriptor = selected_descriptor.keys_to_properties(\n [\"species_neighbor_1\", \"species_neighbor_2\"]\n)\n\nproperties = selected_descriptor.block().properties" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"We expect to get `[0]` as the list of `l` properties\n\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"print(f\"we have the following angular components: {np.unique(properties['l'])}\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"The previous selection method uses the same selection for all blocks. If you\ncan to use different selection for different blocks, you should use a\n``TensorMap`` to create your selection\n\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"selected_descriptor = calculator.compute(frames, selected_properties=selection)\ndescriptor_for_comparison = calculator.compute(\n frames, selected_properties=selected_descriptor\n)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"The descriptor had 180 properties stored in the first block, the\nselected_descriptor had 36. So ``descriptor_for_comparison`` will also have 36\nproperties.\n\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"print(\"shape of first block initially:\", descriptor.block(0).values.shape)\nprint(\"shape of first block of reference:\", selected_descriptor.block(0).values.shape)\nprint(\n \"shape of first block after selection:\",\n descriptor_for_comparison.block(0).values.shape,\n)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"The ``TensorMap`` format allows us to select different features within each\nblock, and then construct a general matrix of features. We can select the most\nsignificant features using FPS, which selects features based on the distance\nbetween them. The following code snippet selects the 10 most important\nfeatures in each block, then constructs a TensorMap containing this selection,\nand calculates the final matrix of features for it.\n\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"def fps_feature_selection(descriptor, n_to_select):\n \"\"\"\n Select ``n_to_select`` features block by block in the ``descriptor``, using\n Farthest Point Sampling to do the selection; and return a ``TensorMap`` with\n the right structure to be used as properties selection with rascaline calculators\n \"\"\"\n blocks = []\n for block in descriptor:\n # create a separate FPS selector for each block\n fps = FPS(n_to_select=n_to_select)\n mask = fps.fit(block.values).get_support()\n selected_properties = Labels(\n names=block.properties.names,\n values=block.properties.values[mask],\n )\n # The only important data here is the properties, so we create empty\n # sets of samples and components.\n blocks.append(\n TensorBlock(\n values=np.empty((1, len(selected_properties))),\n samples=Labels.single(),\n components=[],\n properties=selected_properties,\n )\n )\n\n return TensorMap(descriptor.keys, blocks)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"We can then apply this function to subselect according to the data contained\nin a descriptor\n\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"selection = fps_feature_selection(descriptor, n_to_select=10)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"and use the selection with rascaline, potentially running the calculation on a\ndifferent set of systems\n\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"selected_descriptor = calculator.compute(frames, selected_properties=selection)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Note that in this case it is no longer possible to have a single feature\nmatrix, because each block will have its own properties.\n\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"try:\n selected_descriptor.keys_to_samples(\"species_center\")\nexcept MetatensorError as err:\n print(err)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
".. end-body\n\n" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.11.7" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 0 | ||
} |
Oops, something went wrong.