updated doc, more tests, little refatoring in python interface

rikigigi · Jan 24, 2022 · 5a042d3 · 5a042d3
1 parent 14ec6df
commit 5a042d3
Show file tree

Hide file tree

Showing 75 changed files with 967 additions and 1,021 deletions.
diff --git a/README.md b/README.md
@@ -69,10 +69,16 @@ analisi_traj = pyanalisi.Trajectory(
       pos, #shape (ntimesteps, natoms, 3)
       vel, #same shape as pos
       types, #shape (natoms)
-      box, #shape (ntimesteps, 3,3) or (ntimesteps, 6)
-      True, # if True, box for a single frame is 3,3,
-            # if False is 6
-      False # don't apply pbc
+      box, #shape (ntimesteps, 3,3)
+           # or (ntimesteps, 6)
+           # or (ntimesteps, 9)
+      pyanalisi.BoxFormat.CellVectors,
+                   # input box format is
+                   # matrix of cell vectors
+      False, # don't wrap atoms around the center of 
+             # the cell
+      False # don't save rotation matrix that is used
+            # internally if the cell is triclinic
   )
 
 #do the calculation that you want
@@ -266,22 +272,34 @@ Where meaningful, the following arguments are shared by all calculation performe
 You can create a trajectory python object to be used in this library in two ways: 
  - start from python arrays generated by other code
  - use a LAMMPS binary file that you have on the filesystem. This is the same file that is used by the command line interface.
+
+to access the positions, velocities and cell data you can use in both the python buffer protocol interface and the lammps binary one the functions `get_positions_copy()`, `get_velocities_copy()` and `get_box_copy()`. Note that each call of these functions allocates more memory, as needed.
+
+### Internal format used to store the cell information
+
+Internally the format used for storing the cell information is:
+
+`[x_low, y_low, z_low, lx/2, ly/2, lz/2, xy, xz, yz]`
+
+and is accessible by using the python function `get_box_copy()` common to the trajectory objects. Note that internally the first cell vector is always in the same direction of the x axis, the second one is on the xy plane while the third has an arbitrary direction. This is equivalent to requiring that the cell matrix is triangular. If necessary, this is achieved by doing a QR decomposition of the input cell matrix and then rotating everything with the Q matrix.
 
 
-#### using Python arrays: the buffer protocol interface
+### using Python arrays: the buffer protocol interface
 
 
 You must have 4 arrays. In general the interface supports any object that supports python's buffer protocol. This include, among others, numpy arrays. Suppose you have a trajectory of `t` timesteps and `n` atoms. You need the following arrays:
  - position array, of shape `(t,n,3)`
  - velocity array, of shape `(t,n,3)`
- - cell array, of shape `(t,3,3)` only diagonal matrices (orthorombic cells) are supported at the moment, or if a lammps formated cell is provided `(t,6)` The lammps format simply list the low and high coordinates of the orthorombic cell, as in the header of each timestep that you find in the lammps trajetory format (shown in [command line interface](#command-line-interface) section )
+ - cell array, of shape `(t,3,3)` or if a lammps formated cell is provided `(t,6)` for orthorombic and `(t,9)` for triclinic. The lammps format simply list the low and high coordinates for each dimension and the tilts factors as described below
  - types array, of shape `(n)` (integer array)
 
 In general no particular units of measure are required, and the output will reflect the units that you used in the input. The calculations that the program does are reported exactly in the following sections. From those formulas you can deduce the units of the output given the units that you used in the input.
 
-Then you must decide if you want that the coordinates are rewrapped inside the cell or not. At the moment only orthorombic cells are supported in all calculations but those that need only unwrapped coordinates, like MSD.
+Then you must decide if you want that the coordinates are rewrapped around the center of the cell or not. This is not equivalent to wrapping the coordinates inside the cell for the triclinic case, but generates a compact list of positions suitable for an efficient calculation of the minimum image distance.
+
+The lammps format for the cell is `[x_lo, x_hi, y_lo, y_hi, z_lo, z_hi, xy, xz, yz]`, as described in the [lammps documentation](https://docs.lammps.org/Howto_triclinic.html).
 
-The lammps format for the cell is `[x_lo, x_hi, y_lo, y_hi, z_lo, z_hi]`: you have to provide only the coordinates boundaries.
+If the plain cell vectors are provided in the cell matrix and this matrix is not diagonal, a QR decomposition is performed to get a triangular cell matrix and the achieve the desidered internal format. In this case all velocities and positions vector are rotated with the rotation matrix Q, that can be obtained with the function `get_rotation_matrix()`.
 
 The syntax for creating the object is 
 ```
@@ -290,12 +308,14 @@ analisi_traj = pyanalisi.Trajectory(positions_array,
                                     velocity_array,
                                     types_array,
                                     box_array,
-                                    use_matrix_or_lammps_cell_format,
-                                    wrap_atomic_coordinates)
+                                    input_box_format_id,
+                                    wrap_atomic_coordinates,
+                                    save_Q_rotation_matrix
+)
 ```
-where `use_matrix_or_lammps_cell_format` is `True` if usual matrix format for the cell is given and `False` if a lammps formatted cell is provided and `wrap_atomic_coordinates` is `True` if you want to wrap all the atomic coordinates inside the cell.
+where `input_box_format_id` is one of `pyanalisi.BoxFormat.CellVectors`, `pyanalisi.BoxFormat.LammpsOrtho`, `pyanalisi.BoxFormat.LammpsTriclinic` and describe the format of the array `box_array`. If `wrap_atomic_coordinates` is `True` the code will wrap all the atomic coordinates around the center of the cell (that does not mean inside the cell in the triclinc case). This makes the code for computing the minimum image distance more efficient if the atoms were far away out of the cell, but invalidates all the calculations were the atoms are not supposed to be wrapped back inside the cell.
 
-You can write a LAMMPS bynary trajectory (that can be used by the command line interface with mpi, for example) by calling
+You can write a LAMMPS binary trajectory (that can be used by the command line interface with mpi, for example) by calling
 ```
 analisi_traj.write_lammps_binary('output_path', start_timestep, end_timestep)
 ```
@@ -313,7 +333,8 @@ where `start_timestep` is the first timestep index to dump (indexes start from 0
     import pyanalisi
     analisi_traj = pyanalisi.Trajectory(pos, vel, types, box,
                                         True, # matrix format for the box array
-                                        False # don't wrap the coordinates
+                                        False, # don't wrap the coordinates
+                                        False # not interested in Q matrix
                                         )
     analisi_traj.write_lammps_binary("output_filename.bin"
                                , 0, # starting timestep
@@ -322,7 +343,7 @@ where `start_timestep` is the first timestep index to dump (indexes start from 0
    ```
 
 
-#### LAMMPS binary trajectory interface
+### LAMMPS binary trajectory interface
 
 
 This interface is a little more complicated, since it was designed for computing block averages of very big files. It can read the same files that the command line program reads. The object is created with
@@ -370,7 +391,7 @@ The `header_array` object must describe the columns that are stored in the bidim
 ## MSD
 
 Given a trajectory <img src="svgs/995aad7488ee6146d6b411759a5db052.svg?invert_in_darkmode" align=middle width=20.789844899999988pt height=27.91243950000002pt/> where <img src="svgs/0e45672d84ee21e1464939301cc46b43.svg?invert_in_darkmode" align=middle width=137.42503499999998pt height=24.65753399999998pt/> is the atomic index and <img src="svgs/4f4f4e395762a3af4575de74c019ebb5.svg?invert_in_darkmode" align=middle width=5.936097749999991pt height=20.221802699999984pt/> is the timestep index, defining the center of mass position of the atomic species <img src="svgs/36b5afebdba34564d884d347484ac0c7.svg?invert_in_darkmode" align=middle width=7.710416999999989pt height=21.68300969999999pt/> at the timestep <img src="svgs/4f4f4e395762a3af4575de74c019ebb5.svg?invert_in_darkmode" align=middle width=5.936097749999991pt height=20.221802699999984pt/> as
-<p align="center"><img src="svgs/1069af11f835cf7996677118283e621d.svg?invert_in_darkmode" align=middle width=165.4146549pt height=51.449925449999995pt/></p>
+<p align="center"><img src="svgs/397343bd70e7e236981916f727d5d2a0.svg?invert_in_darkmode" align=middle width=165.4146549pt height=45.002035649999996pt/></p>
 where <img src="svgs/bf032e59612f5135c5425a8a0c07f2f7.svg?invert_in_darkmode" align=middle width=19.312276499999992pt height=22.465723500000017pt/> is the number of atoms of the specie <img src="svgs/36b5afebdba34564d884d347484ac0c7.svg?invert_in_darkmode" align=middle width=7.710416999999989pt height=21.68300969999999pt/>,
 the code computes the following
 <p align="center"><img src="svgs/fb6fd8b28a21787d6b119a3f059a9bd3.svg?invert_in_darkmode" align=middle width=429.54891045pt height=110.10997139999999pt/></p>
@@ -419,7 +440,7 @@ In general the object msd_calculation supports the buffer protocol interface, so
 
 ## Green-Kubo
 
- Given <img src="svgs/fb97d38bcc19230b0acd442e17db879c.svg?invert_in_darkmode" align=middle width=17.73973739999999pt height=22.465723500000017pt/> vector time series of length <img src="svgs/f9c4988898e7f532b9f826a75014ed3c.svg?invert_in_darkmode" align=middle width=14.99998994999999pt height=22.465723500000017pt/> <img src="svgs/6ea8ff937f14aa2a3e0622a395dee889.svg?invert_in_darkmode" align=middle width=27.224196449999987pt height=22.55708729999998pt/>, <img src="svgs/e17d8a1279f0bfa5b92c35f05c242243.svg?invert_in_darkmode" align=middle width=101.57889059999998pt height=24.65753399999998pt/>, <img src="svgs/3ef98fe3db393644cede24e594d556d5.svg?invert_in_darkmode" align=middle width=90.34214144999999pt height=24.65753399999998pt/>,
+ Given <img src="svgs/fb97d38bcc19230b0acd442e17db879c.svg?invert_in_darkmode" align=middle width=17.73973739999999pt height=22.465723500000017pt/> vector time series of length <img src="svgs/f9c4988898e7f532b9f826a75014ed3c.svg?invert_in_darkmode" align=middle width=14.99998994999999pt height=22.465723500000017pt/> <img src="svgs/6ea8ff937f14aa2a3e0622a395dee889.svg?invert_in_darkmode" align=middle width=27.22419314999999pt height=22.55708729999998pt/>, <img src="svgs/e17d8a1279f0bfa5b92c35f05c242243.svg?invert_in_darkmode" align=middle width=101.57889059999998pt height=24.65753399999998pt/>, <img src="svgs/3ef98fe3db393644cede24e594d556d5.svg?invert_in_darkmode" align=middle width=90.34214144999999pt height=24.65753399999998pt/>,
  implements an expression equivalent to the following formula:
  <p align="center"><img src="svgs/168f1aff83bb74b9c5f70ae822348485.svg?invert_in_darkmode" align=middle width=251.32803135pt height=254.2893243pt/></p>
  but with the trapezoidal rule in place of the sums marked with <img src="svgs/cdcac8939f3840cd8cddf40059a4cf58.svg?invert_in_darkmode" align=middle width=6.735194399999992pt height=22.63846199999998pt/>. Note that <img src="svgs/bcd81920696e26093495d90eb4cd7b1b.svg?invert_in_darkmode" align=middle width=16.153034699999992pt height=22.465723500000017pt/> is a matrix. To get the correct units of measure, you have still to multiply all the quantities but the <img src="svgs/9442d407594839d38f91ad3d2deb134e.svg?invert_in_darkmode" align=middle width=16.71464189999999pt height=22.465723500000017pt/>s by the integration timestep. <img src="svgs/8379d82223552fdbdfd98783823c755b.svg?invert_in_darkmode" align=middle width=33.56332814999999pt height=22.465723500000017pt/> is the number of timesteps on which the code runs the average.

diff --git a/README.pdf b/README.pdf
diff --git a/README_.md b/README_.md
@@ -69,10 +69,16 @@ analisi_traj = pyanalisi.Trajectory(
       pos, #shape (ntimesteps, natoms, 3)
       vel, #same shape as pos
       types, #shape (natoms)
-      box, #shape (ntimesteps, 3,3) or (ntimesteps, 6)
-      True, # if True, box for a single frame is 3,3,
-            # if False is 6
-      False # don't apply pbc
+      box, #shape (ntimesteps, 3,3)
+           # or (ntimesteps, 6)
+           # or (ntimesteps, 9)
+      pyanalisi.BoxFormat.CellVectors,
+                   # input box format is
+                   # matrix of cell vectors
+      False, # don't wrap atoms around the center of 
+             # the cell
+      False # don't save rotation matrix that is used
+            # internally if the cell is triclinic
   )
 
 #do the calculation that you want
@@ -266,22 +272,34 @@ Where meaningful, the following arguments are shared by all calculation performe
 You can create a trajectory python object to be used in this library in two ways: 
  - start from python arrays generated by other code
  - use a LAMMPS binary file that you have on the filesystem. This is the same file that is used by the command line interface.
+
+to access the positions, velocities and cell data you can use in both the python buffer protocol interface and the lammps binary one the functions `get_positions_copy()`, `get_velocities_copy()` and `get_box_copy()`. Note that each call of these functions allocates more memory, as needed.
+
+### Internal format used to store the cell information
+
+Internally the format used for storing the cell information is:
+
+`[x_low, y_low, z_low, lx/2, ly/2, lz/2, xy, xz, yz]`
+
+and is accessible by using the python function `get_box_copy()` common to the trajectory objects. Note that internally the first cell vector is always in the same direction of the x axis, the second one is on the xy plane while the third has an arbitrary direction. This is equivalent to requiring that the cell matrix is triangular. If necessary, this is achieved by doing a QR decomposition of the input cell matrix and then rotating everything with the Q matrix.
 
 
-#### using Python arrays: the buffer protocol interface
+### using Python arrays: the buffer protocol interface
 
 
 You must have 4 arrays. In general the interface supports any object that supports python's buffer protocol. This include, among others, numpy arrays. Suppose you have a trajectory of `t` timesteps and `n` atoms. You need the following arrays:
  - position array, of shape `(t,n,3)`
  - velocity array, of shape `(t,n,3)`
- - cell array, of shape `(t,3,3)` only diagonal matrices (orthorombic cells) are supported at the moment, or if a lammps formated cell is provided `(t,6)` The lammps format simply list the low and high coordinates of the orthorombic cell, as in the header of each timestep that you find in the lammps trajetory format (shown in [command line interface](#command-line-interface) section )
+ - cell array, of shape `(t,3,3)` or if a lammps formated cell is provided `(t,6)` for orthorombic and `(t,9)` for triclinic. The lammps format simply list the low and high coordinates for each dimension and the tilts factors as described below
  - types array, of shape `(n)` (integer array)
 
 In general no particular units of measure are required, and the output will reflect the units that you used in the input. The calculations that the program does are reported exactly in the following sections. From those formulas you can deduce the units of the output given the units that you used in the input.
 
-Then you must decide if you want that the coordinates are rewrapped inside the cell or not. At the moment only orthorombic cells are supported in all calculations but those that need only unwrapped coordinates, like MSD.
+Then you must decide if you want that the coordinates are rewrapped around the center of the cell or not. This is not equivalent to wrapping the coordinates inside the cell for the triclinic case, but generates a compact list of positions suitable for an efficient calculation of the minimum image distance.
+
+The lammps format for the cell is `[x_lo, x_hi, y_lo, y_hi, z_lo, z_hi, xy, xz, yz]`, as described in the [lammps documentation](https://docs.lammps.org/Howto_triclinic.html).
 
-The lammps format for the cell is `[x_lo, x_hi, y_lo, y_hi, z_lo, z_hi]`: you have to provide only the coordinates boundaries.
+If the plain cell vectors are provided in the cell matrix and this matrix is not diagonal, a QR decomposition is performed to get a triangular cell matrix and the achieve the desidered internal format. In this case all velocities and positions vector are rotated with the rotation matrix Q, that can be obtained with the function `get_rotation_matrix()`.
 
 The syntax for creating the object is 
 ```
@@ -290,12 +308,14 @@ analisi_traj = pyanalisi.Trajectory(positions_array,
                                     velocity_array,
                                     types_array,
                                     box_array,
-                                    use_matrix_or_lammps_cell_format,
-                                    wrap_atomic_coordinates)
+                                    input_box_format_id,
+                                    wrap_atomic_coordinates,
+                                    save_Q_rotation_matrix
+)
 ```
-where `use_matrix_or_lammps_cell_format` is `True` if usual matrix format for the cell is given and `False` if a lammps formatted cell is provided and `wrap_atomic_coordinates` is `True` if you want to wrap all the atomic coordinates inside the cell.
+where `input_box_format_id` is one of `pyanalisi.BoxFormat.CellVectors`, `pyanalisi.BoxFormat.LammpsOrtho`, `pyanalisi.BoxFormat.LammpsTriclinic` and describe the format of the array `box_array`. If `wrap_atomic_coordinates` is `True` the code will wrap all the atomic coordinates around the center of the cell (that does not mean inside the cell in the triclinc case). This makes the code for computing the minimum image distance more efficient if the atoms were far away out of the cell, but invalidates all the calculations were the atoms are not supposed to be wrapped back inside the cell.
 
-You can write a LAMMPS bynary trajectory (that can be used by the command line interface with mpi, for example) by calling
+You can write a LAMMPS binary trajectory (that can be used by the command line interface with mpi, for example) by calling
 ```
 analisi_traj.write_lammps_binary('output_path', start_timestep, end_timestep)
 ```
@@ -313,7 +333,8 @@ where `start_timestep` is the first timestep index to dump (indexes start from 0
     import pyanalisi
     analisi_traj = pyanalisi.Trajectory(pos, vel, types, box,
                                         True, # matrix format for the box array
-                                        False # don't wrap the coordinates
+                                        False, # don't wrap the coordinates
+                                        False # not interested in Q matrix
                                         )
     analisi_traj.write_lammps_binary("output_filename.bin"
                                , 0, # starting timestep
@@ -322,7 +343,7 @@ where `start_timestep` is the first timestep index to dump (indexes start from 0
    ```
 
 
-#### LAMMPS binary trajectory interface
+### LAMMPS binary trajectory interface
 
 
 This interface is a little more complicated, since it was designed for computing block averages of very big files. It can read the same files that the command line program reads. The object is created with
@@ -371,7 +392,7 @@ The `header_array` object must describe the columns that are stored in the bidim
 
 Given a trajectory $\bf ^ix_t$ where $i\in\{1,\dots,N_{atoms}\}$ is the atomic index and $t$ is the timestep index, defining the center of mass position of the atomic species $j$ at the timestep $t$ as
 $$
-^{j}cm_t=\frac{1}{N_{j}}\sum_{i|type(i)=j}^i{\bf x}_{t}
+^{j}cm_t=\frac{1}{N_{j}}\sum_{i|type(i)=j}{\bf x}_{t}
 $$
 where $N_j$ is the number of atoms of the specie $j$,
 the code computes the following