Metsi forestry simulator is a Python-based forest growth and maintenance operation simulator developed at Natural Resources Institute Finland.
The simulator is an alternative state simulator operating upon forest state data. The state data is manipulated by simulator operations over a progression of time steps. The event and branching structure for simulator operations is declared in a human-readable YAML format or directly by functional declaration. This declaration is used to generate a simulation event tree holding the full branching possibilities for the simulation. The event tree is evaluated with the simulator engine to produce alternative end results.
To get started:
- Install Python >= 3.10 for your platform.
- Install git for your platform.
- Ensure that the commands `python`, `pip` and `git` are available in your command line interface (CLI). We assume a UNIX-like shell CLI such as Git Bash for Windows users.
- Initialize the project with the commands below.
git clone https://github.com/lukefi/metsi
cd metsi
pip install .
This installs the project into the site-packages of your Python deployment, using the project's `pyproject.toml` file.
The program is then usable from the command line by simply invoking `metsi`.
For developer usage, the application entry point is the file `lukefi/metsi/app/metsi.py` or the namespace package `lukefi.metsi.app.metsi`.
To obtain the latest changes, use the command `git pull`.
To be able to use forestry operations depending on R modules:
- Install R runtime version >= 4.1 for your platform.
- Ensure that the `R` command is available in CLI.
- Install the `rpy2` Python module with the command below.
pip install .[rpy2]
Access to the Motti growth model module (pymotti) is restricted to Natural Resources Institute Finland by special admission. Installing it will not work without access to the related GitHub repositories.
pip install -r requirements-motti.txt
The corresponding growth operation is `grow_motti`.
NOTE: For either model, the input data must contain precomputed weather data (temperature sums, sea/lake indices). In practice this means that you must enable the `compute_location_metadata` preprocessing operation even if you're using the Lua models.
This code project is divided into the following Python packages in the `lukefi.metsi` namespace.
package | description |
---|---|
l.m.app | Application entry points. Side-effectful program logic. |
l.m.sim | Simulator engine. |
l.m.domain | Operations for the forest development simulation. |
l.m.tests | Unit test suites for above packages. |
Dependency libraries for this project are listed in `pyproject.toml`.
The project contains a single application entry point, the file `lukefi/metsi/app/metsi.py`.
The application implements a four-phase pipeline. The phases are preprocess, simulate, postprocess and export. Each phase can be run independently or as part of a sequence, as long as the logical order for input data structuring is preserved. Each application phase uses the given YAML file (default `control.yaml`) as its configuration.
phase | description |
---|---|
preprocess | Operations for filtering or modifying the input data set |
simulate | Discrete time-step simulation and data collection with given event tree and operations |
postprocess | Operations for further deriving and marshalling of data from the simulation results |
export | File output into different formats based on simulation or post-processing results |
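The ordering requirement can be illustrated with a small sketch that validates a comma-separated phase list such as the value given to `-r`. The helper `parse_phases` is hypothetical and not part of metsi; it only assumes that the named phases must be known and must appear in pipeline order.

```python
# Phase names in their logical pipeline order, as listed in the table above.
PHASES = ["preprocess", "simulate", "postprocess", "export"]


def parse_phases(run_argument: str) -> list[str]:
    """Split a comma-separated phase list and check its logical order.

    Hypothetical helper for illustration; metsi's own argument handling
    may differ.
    """
    phases = [name.strip() for name in run_argument.split(",") if name.strip()]
    unknown = [name for name in phases if name not in PHASES]
    if unknown:
        raise ValueError(f"unknown phases: {unknown}")
    positions = [PHASES.index(name) for name in phases]
    # Positions must be strictly increasing: no repeats, no reordering.
    if positions != sorted(set(positions)):
        raise ValueError(f"phases out of order or repeated: {phases}")
    return phases


selected = parse_phases("preprocess,simulate")
```

A list such as `simulate,preprocess` is rejected by this sketch because the input data structuring would no longer follow the pipeline order.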
Input for the preprocess and simulate phases is a forest data file. This file is a list of forest stand objects with lists of associated reference trees and tree strata. The file can be one of the following types and formats:
- a .json file or .pickle file containing Forest Data Model type source data.
- a .dat file containing VMI12 or VMI13 type source data
- a .xml file containing Forest Centre type source data
- a .gpkg file containing Forest Centre type source data
Input for postprocess and export phases is a directory produced by the simulate phase.
There are several example input files in the project test resources directories `tests/resources/file_io_test` and `tests/data/resources`.
Preprocessing generates a csv, pickle or json file with the computational units as a list. The data is always in FDM format. The file is written into the configured output directory as `preprocessing_result.{csv,pickle,json}`.
Simulate collects a nested data structure containing the final states for each produced alternative of each computational unit. A dictionary of data computed during the simulation is included for each such alternative. This data structure can be written as a directory structure into the configured output directory.
Post-processing uses exactly the same nested data structure as produced by simulation. It derives further data within and across the produced alternatives and stores it within the derived data structure. This data can be written as a directory structure into the configured output directory.
Exporting uses the nested data structure above. It will select partial data sets as configured. These data sets may be written to any supported and compatible file containers. Such support must be implemented on a case-by-case basis as modules into the exporting functionality.
When phases are run in order on the same run, intermediate result files do not need to be written out. Data is kept and propagated in-memory.
In addition to the other output formats, the computational units can be written in the RST format, a special format used only by the MELA simulator of Natural Resources Institute Finland. The RST format is only generated as preprocessing output and contains the reference tree information of forest stands. Along with the RST file, an RSTS file is generated, which contains the stratum information of the computational units.
Use the following command to output simulator application help menu.
python -m lukefi.metsi.app.metsi --help
Input path, output path and optionally the control file path must be supplied as CLI positional arguments. All other parameters in the commands below can also be set in the `app_configuration` block of the `control.yaml` file. Control file settings override program defaults (see the `MetsiConfiguration` class in `app/app_io.py`). CLI arguments override the settings in the control file.
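The precedence described above (program defaults, then control file, then CLI arguments) can be sketched with a `ChainMap`. The default values and the way the three sources are represented here are assumptions for illustration only; the real defaults live in the `MetsiConfiguration` class.

```python
from collections import ChainMap

# Hypothetical values; only the setting names are taken from this document.
program_defaults = {"state_format": "fdm", "strategy": "partial", "multiprocessing": False}
control_file_settings = {"state_format": "vmi13", "multiprocessing": True}  # app_configuration block
cli_arguments = {"state_format": "vmi12"}  # e.g. --state-format vmi12

# ChainMap looks keys up from left to right, so the leftmost mapping wins.
effective = dict(ChainMap(cli_arguments, control_file_settings, program_defaults))
```

In this sketch `state_format` comes from the command line, `multiprocessing` from the control file and `strategy` from the defaults.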
At the time of writing, there are no post-processing operations ready to be used. Post-processing will `do_nothing`.
All examples below default to `control.yaml` in the working directory as the control file source unless otherwise specified on the command line.
Preprocessing, simulation and post-processing phases do not produce output files by default. Configuration for preprocessing output container, state output container and derived data output container need to be set to produce output files. The default mode of operation is to run the full pipeline and only the export phase will create files as configured.
To run full pipeline from a VMI12 data source file using direct reference trees from the input data, run:
python -m lukefi.metsi.app.metsi --state-format vmi12 --measured-trees -r preprocess,simulate,postprocess,export vmi12.dat sim_outdir
To run full pipeline from a VMI13 data source file using direct reference trees from the input data, run:
python -m lukefi.metsi.app.metsi --state-format vmi13 --measured-trees -r preprocess,simulate,postprocess,export vmi13.dat sim_outdir
To run full pipeline from a Forest Centre .xml source file, run:
python -m lukefi.metsi.app.metsi --state-format xml -r preprocess,simulate,postprocess,export forest_centre.xml sim_outdir
To run full pipeline from a Forest Centre .gpkg source file, run:
python -m lukefi.metsi.app.metsi --state-format gpkg -r preprocess,simulate,postprocess,export geopackage.gpkg sim_outdir
To run the full pipeline from FDM formatted data in csv (or json or pickle, with the corresponding replacement below), run:
python -m lukefi.metsi.app.metsi --state-input-container csv -r preprocess,simulate,postprocess,export forest_data.csv sim_outdir
To run preprocess and simulate phases of the application, run the following command in the project root. The created output directory contains all generated variants for all computation units (ForestStand) along with derived data.
python -m lukefi.metsi.app.metsi --state-input-container pickle -r preprocess,simulate forest_data.pickle sim_outdir
In case you need to use a FDM formatted JSON file as the input and/or output file format, run:
python -m lukefi.metsi.app.metsi --state-input-container json --state-output-container json -r preprocess,simulate forest_data.json sim_outdir
To only run the preprocessor and produce output as `sim_outdir/preprocessing_result.csv`, with a control YAML file in the non-default location `my_project/control_preprocessing.yaml`, run:
python -m lukefi.metsi.app.metsi --preprocessing-output-container csv -r preprocess forest_data.pickle sim_outdir my_project/control_preprocessing.yaml
To use the preprocessed result file as input for a simulation, and produce schedule results in csv+json format run:
python -m lukefi.metsi.app.metsi --state-input-container csv --state-output-container csv --derived-data-output-container json -r simulate sim_outdir/preprocessing_result.csv sim_outdir my_project/control_simulate.yaml
The output directory `outdir` from simulate is usable as input for the post-processing phase of the application. With the following command, post-processing creates a new directory `outdir2` with matching structure for its output:
python -m lukefi.metsi.app.metsi -r postprocess outdir outdir2 my_project/postprocessing_control.yaml
The output directory `outdir` from simulation (or `outdir2` from post-processing) can be used as input for the export phase as follows:
python -m lukefi.metsi.app.metsi -r export outdir outdir2 my_project/export_control.yaml
See table below for a quick reference of forestry operations usable in control.yaml.
operation | description | source | model library |
---|---|---|---|
do_nothing | A no-op utility operation that simulates rest | native | |
grow_acta | A simple ReferenceTree diameter and height growth operation | Acta Forestalia Fennica 163 | metsi-forestry |
grow_motti | A ReferenceTree growth operation with death and birth models. Requires pymotti. | Luke Motti group | pymotti |
first_thinning | An operation reducing the stem count of ReferenceTrees as a first thinning for a forest | Reijo Mykkänen | metsi-forestry |
thinning_from_below | An operation reducing the stem count of ReferenceTrees weighing small trees before large trees | Reijo Mykkänen | metsi-forestry |
thinning_from_above | An operation reducing the stem count of ReferenceTrees weighing large trees before small trees | Reijo Mykkänen | metsi-forestry |
even_thinning | An operation reducing the stem count of ReferenceTrees evenly regardless of tree size | Reijo Mykkänen | metsi-forestry |
report_collectives | Save the values of collective variables | native | |
calculate_biomass | Calculate biomass accruals from the current reference tree properties | Laura Jaakkola | native |
report_state | Save the values of state variables at the current time point | native | |
filter | Filter stands, trees and strata | native | |
cross_cut_felled_trees | Perform cross cut operation to results of previous thinning operations | Annika Kangas | metsi-forestry |
cross_cut_standing_trees | Perform cross cut operation to all standing trees on a stand | Annika Kangas | metsi-forestry |
calculate_npv | Calculate net present value of stand and harvest revenues subtracted by renewal operation costs. | Urho Niemelä | native |
clearcutting | Clear the stand of reference trees and produce data for cross-cutting. | Laura Jaakkola | native |
planting | Plant sapling reference trees on an empty stand. | Laura Jaakkola | native |
Performs the renewal operation planting, which plants saplings on an empty stand.
parameter name | type | description | location in control.yaml | notes |
---|---|---|---|---|
planting_instructions | string | File in which rows represent site types; the first column contains the tree species, the second column the stems per hectare value and the third column the soil preparation type. | operation_file_params | optional |
tree_count | int | Number of sapling trees to be planted. | operation_params | optional, default 10 |
The output is written into the derived data container with two keys, `renewal` and `regeneration`.
Output of `renewal` is a regeneration description Python dictionary containing the following values: regeneration key, soil preparation type, tree species and stem count.
Output of `regeneration` is a list of `PriceableOperationInfo` objects:
attribute name | type | description |
---|---|---|
operation | string | renewal operation name |
units | float | stand area |
time_point | int | time point of operation execution |
Performs even thinning which removes stems evenly from all reference tree classes. Removal bounds are defined by basal area.
parameter name | type | location in control.yaml | notes |
---|---|---|---|
thinning_limits | float | operation_file_params | optional parameter |
e | float | operation_params | residue constant |
thinning_factor | float | operation_params | removal intensity |
Operation outputs a list of CrossCuttableTree objects
Object attributes:
attribute name | type | description |
---|---|---|
stems_per_ha | float | number of removed stems |
species | TreeSpecies | tree species of removed reference tree |
breast_height_diameter | float | tree's diameter at breast height |
height | float | tree's height |
source | string | standing or harvested |
operation | string | operation that produced such output |
time_point | int | time point of operation execution |
cross_cut_done | bool | cross cut operation executed |
- parameter e is a residue constant so that the removal ratio would not go under the lower limit.
- For example e=0.2
Performs thinning from below which primarily removes trees with a smaller diameter. Removal bounds are defined by basal area.
parameter name | type | location in control.yaml | notes |
---|---|---|---|
thinning_limits | float | operation_file_params | optional parameter |
e | float | operation_params | residue constant |
thinning_factor | float | operation_params | removal intensity |
Operation outputs a list of CrossCuttableTree objects
Object attributes:
attribute name | type | description |
---|---|---|
stems_per_ha | float | number of removed stems |
species | TreeSpecies | tree species of removed reference tree |
breast_height_diameter | float | tree's diameter at breast height |
height | float | tree's height |
source | string | standing or harvested |
operation | string | operation that produced such output |
time_point | int | time point of operation execution |
cross_cut_done | bool | cross cut operation executed |
- parameter e is a residue constant so that the removal ratio would not go under the lower limit.
- For example e=0.2
Performs thinning from above which primarily removes trees with a larger diameter. Removal bounds are defined by basal area.
parameter name | type | location in control.yaml | notes |
---|---|---|---|
thinning_limits | float | operation_file_params | optional parameter |
e | float | operation_params | residue constant |
thinning_factor | float | operation_params | removal intensity |
Operation outputs a list of CrossCuttableTree objects
Object attributes:
attribute name | type | description |
---|---|---|
stems_per_ha | float | number of removed stems |
species | TreeSpecies | tree species of removed reference tree |
breast_height_diameter | float | tree's diameter at breast height |
height | float | tree's height |
source | string | standing or harvested |
operation | string | operation that produced such output |
time_point | int | time point of operation execution |
cross_cut_done | bool | cross cut operation executed |
- parameter e is a residue constant so that the removal ratio would not go under the lower limit.
- For example e=0.2
Performs first thinning which primarily removes trees with a smaller diameter. Removal bounds are defined by number of stems.
parameter name | type | location in control.yaml | notes |
---|---|---|---|
dominant_height_lower_bound | float | operation_params | |
dominant_height_upper_bound | float | operation_params | |
e | float | operation_params | residue constant |
thinning_factor | float | operation_params | removal intensity |
Operation outputs a list of CrossCuttableTree objects
Object attributes:
attribute name | type | description |
---|---|---|
stems_per_ha | float | number of removed stems |
species | TreeSpecies | tree species of removed reference tree |
breast_height_diameter | float | tree's diameter at breast height |
height | float | tree's height |
source | string | standing or harvested |
operation | string | operation that produced such output |
time_point | int | time point of operation execution |
cross_cut_done | bool | cross cut operation executed |
- parameter e is a residue constant so that the removal ratio would not go under the lower limit.
- For example e=0.2
Compute total biomass tonnages of a single forest stand.
parameter name | type | location in control.yaml | notes |
---|---|---|---|
model_set | int | operation_params | accepted values: 1 and 2 |
- model_set accepts following values 1, 2, 3 or 4
- if value is 1 wood, bark, living and dead branches, foliage, stumps and roots are collected with model set Y
- if value is 2 wood, bark, living and dead branches, foliage, stumps and roots are collected with model set X
Attributes of the BiomassData object
attribute name | type |
---|---|
stem_wood | float |
stem_bark | float |
stem_waste | float |
living_branches | float |
dead_branches | float |
foliage | float |
stumps | float |
roots | float |
Enables collecting the states of user-defined variables at the time of the operation call.
The parameters passed to the operation are the variables that the user wants to report. The parameters are key-value pairs, where the key defines the name of the variable, and the value defines how the variable is constructed.
The operation makes available a set of collections that can be used in the definition of desired variables. These collections are:
name | description | class whose attributes are available |
---|---|---|
state | forest stand | ForestStand |
reference_trees | stand's reference trees | ReferenceTrees |
felled_trees | trees that have been thinned/clearcut | CrossCuttableTree |
cross_cutting | results of cross cutting | CrossCutResult |
renewal | results of renewal operations | PriceableOperationInfo |
net_present_value | results of net present value calculations | NPVResult |
For example, to get the total stems per hectare in the years the operation is defined for, one would define the operation parameters of `report_state` as:
report_state:
- total_stems_per_ha: reference_trees.stems_per_ha
The stand's reference trees are stored under the name `reference_trees`, and the attributes defined for that name can be used to get values. The returned `total_stems_per_ha` is the sum of the `stems_per_ha` values of the stand's trees.
However, one often needs more detailed information about the state and therefore wants to include only certain trees. For example, to get the stems per hectare of pines:
report_state:
- total_stems_per_ha: reference_trees.stems_per_ha[reference_trees.species == 1]
or, to be even more fine-grained, get the stems_per_ha of pines that are not saplings:
report_state:
- total_stems_per_ha: reference_trees.stems_per_ha[(reference_trees.species == 1) & (reference_trees.sapling == False)]
Notice the parentheses around the filter conditions, when using multiple conditions.
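To make the meaning of the three expressions above concrete, the following sketch computes the same sums in plain Python. The tree records and the helper `total` are hypothetical stand-ins; the actual operation evaluates the expressions against the collections listed in the table above.

```python
# Hypothetical reference trees; species code 1 stands for pine as in the text above.
reference_trees = [
    {"species": 1, "stems_per_ha": 300.0, "sapling": False},
    {"species": 1, "stems_per_ha": 150.0, "sapling": True},
    {"species": 2, "stems_per_ha": 200.0, "sapling": False},
]


def total(attribute, condition=lambda tree: True):
    """Sum one attribute over the trees that satisfy the condition."""
    return sum(tree[attribute] for tree in reference_trees if condition(tree))


# reference_trees.stems_per_ha
all_stems = total("stems_per_ha")
# reference_trees.stems_per_ha[reference_trees.species == 1]
pine_stems = total("stems_per_ha", lambda tree: tree["species"] == 1)
# reference_trees.stems_per_ha[(reference_trees.species == 1) & (reference_trees.sapling == False)]
grown_pine_stems = total(
    "stems_per_ha", lambda tree: tree["species"] == 1 and not tree["sapling"]
)
```

The parentheses in the combined filter are needed because `&` binds more tightly than `==` in Python expressions.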
This operation makes it possible to collect sums of derived data collections between periods.
For example, in a 30-year simulation one could define three 10-year periods by defining report_period on time points 10, 20 and 30. As the simulation executes, the report_period operation collects user-defined derived data between years 0-9, 10-19 and 20-29.
See documentation of report_state operation for definition and usage.
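The period logic of the 30-year example can be sketched as follows. The function `sum_by_period` and the yearly values are hypothetical; the sketch only assumes that each report point collects the values from the previous report point (or year 0) up to, but not including, itself.

```python
def sum_by_period(values_by_year: dict[int, float], report_points: list[int]) -> dict[int, float]:
    """Sum yearly values into the periods that end at each report point."""
    totals = {}
    period_start = 0
    for point in sorted(report_points):
        totals[point] = sum(
            value for year, value in values_by_year.items()
            if period_start <= year < point
        )
        period_start = point
    return totals


# Hypothetical derived data, e.g. harvested volume by simulation year.
harvest_volume = {5: 40.0, 15: 55.0, 25: 60.0, 29: 10.0}
period_totals = sum_by_period(harvest_volume, [10, 20, 30])
```

Here the report at time point 30 covers years 20-29 and therefore sums the values recorded for years 25 and 29.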
Clears the stand's reference tree list and stores the trees as a list of `CrossCuttableTree` objects into derived data with the `felled_trees` keyword.
parameter name | type | description | location in control.yaml | notes |
---|---|---|---|---|
clearcutting_limits_ages | string | In the file, rows represent site type values, columns represent tree species and the values represent the smallest age at which the operation is enabled. | operation_file_params | example file data/parameter_files/renewal_ages_southernFI.txt |
clearcutting_limits_diameters | string | In the file, rows represent site type values, columns represent tree species and the values represent the smallest breast height diameter at which the operation is enabled. | operation_file_params | example file data/parameter_files/renewal_diameters_southernFI.txt |
minimum_time_interval | int | After the operation is executed it will not be executed again until the minimum_time_interval is reached. | run_constraints | |
Output object attributes:
attribute name | type | description or value |
---|---|---|
stems_per_ha | float | number of removed stems |
species | TreeSpecies | tree species of removed reference tree |
breast_height_diameter | float | tree's diameter at breast height |
height | float | tree's height |
source | string | value: harvested |
operation | string | value: clearcutting |
time_point | int | time point of operation execution |
cross_cut_done | bool | cross cut operation executed or not |
Calculates the volume and value of harvested trees using Annika Kangas' cross cutting algorithm. Whenever this operation is called, it cross-cuts all thinning and clearcutting results that have been produced before it, but have not yet been cross-cut. Given this, it is enough to call this operation once before cross cutting results are needed.
The `time_point` attribute of the resulting CrossCutResult objects is determined by the tree's harvest year, not the year when this operation is called.
parameter name | type | location in control.yaml | notes |
---|---|---|---|
timber_price_table | file (csv) | operation_file_params | timber grades must be given as integers |
implementation | str | operation_params | py and lupa (lua) implementations available |
Attributes of the CrossCutResult object
attribute name | type |
---|---|
species | TreeSpecies |
timber_grade | int |
volume_per_ha | float |
value_per_ha | float |
stand_area | float |
source | str (either "harvested" or "standing") |
operation | str (operation tag, or '' if source == "standing") |
time_point | int |
Calculates the volume and value of standing trees using Annika Kangas' cross cutting algorithm at the time of the operation call. This operation does not actually harvest the stand, but rather evaluates the volume and value of its trees if they were cross cut. Therefore, this operation is different from clearcutting.
parameter name | type | location in control.yaml | notes |
---|---|---|---|
timber_price_table | file (csv) | operation_file_params | timber grades must be given as integers |
implementation | str | operation_params | py and lupa (lua) implementations available |
Attributes of the CrossCutResult object
attribute name | type |
---|---|
species | TreeSpecies |
timber_grade | int |
volume_per_ha | float |
value_per_ha | float |
stand_area | float |
source | str (either "felled" or "standing") |
operation | str |
time_point | int |
Calculates the Net Present Value (NPV) of a given schedule.
parameter name | type | location in control.yaml | notes |
---|---|---|---|
interest_rates | list of int | operation_params | e.g. [3], where 3 stands for 3% |
land_values | file (json) | operation_file_params | |
renewal_costs | file (csv) | operation_file_params |
- This operation expects that `cross_cut_felled_trees` has been called previously to cross cut any previous thinning output.
- This operation expects that `cross_cut_standing_trees` has been called in the same time point, so that the present value of the standing trees can be evaluated correctly.
$$ NPV = \underbrace{\sum_{t=0}^T \frac{h_t a}{(1+r)^t}}_{\text{(1)}} + \underbrace{\frac{S_T a}{(1+r)^T}}_{\text{(2)}} - \underbrace{\sum_{t=0}^T \frac{c_t a}{(1+r)^t}}_{\text{(3)}} + \underbrace{LV}_{\text{(4)}}
$$
where:
- $h_t$ is the per-hectare harvest revenue from the stand at time $t$
- $a$ is the stand's area in hectares
- $r$ is the interest rate
- $S_T$ is the value of standing tree stock at the final time point $T$
- $c_t$ is the per-hectare costs of stand treatment at time $t$
- $LV$ is the bare land value of the stand, calculated for the interest rate $r$.
(1) harvest revenues originate from cross_cut_felled_trees
(2) stand value originates from cross_cut_standing_trees
(3) currently, costs originate only from renewal operations
(4) Bare land values are passed in as a file parameter for the NPV operation. These values are already discounted with the given interest rate, so no discounting happens here. See MELA 2016 reference manual p.175-176 for more information about this.
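The NPV formula can be checked numerically with a short sketch. The function name, its argument layout and all input figures are hypothetical; the sketch only follows the four terms as written above, with the interest rate given in percent as in the `interest_rates` parameter and the bare land value added without further discounting.

```python
def net_present_value(harvest_revenues, treatment_costs, standing_value,
                      area, rate_percent, final_time, land_value):
    """Evaluate terms (1) to (4) of the NPV formula for one stand.

    harvest_revenues and treatment_costs map time point t to a per-hectare
    amount (h_t and c_t). This is an illustration, not metsi's implementation.
    """
    r = rate_percent / 100
    revenues = sum(h * area / (1 + r) ** t for t, h in harvest_revenues.items())  # (1)
    standing = standing_value * area / (1 + r) ** final_time                      # (2)
    costs = sum(c * area / (1 + r) ** t for t, c in treatment_costs.items())      # (3)
    return revenues + standing - costs + land_value                               # (4) added as is


# Hypothetical stand: 2 ha, harvest at t=10, renewal cost at t=0, horizon T=20.
example = net_present_value(
    harvest_revenues={10: 1000.0},
    treatment_costs={0: 500.0},
    standing_value=2000.0,
    area=2.0,
    rate_percent=3,
    final_time=20,
    land_value=300.0,
)
```

With an interest rate of 0 the discount factors all equal 1, so the same inputs reduce to a plain sum of revenues, standing value and land value minus costs.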
To run unit test suites, run in the project root
python -m pytest
You can also use Python's built-in unittest module
python -m unittest <test suite module.class path>
A run is declared in the YAML file `control.yaml`.
- Application configuration in the `app_configuration` object. These may be overridden by equivalent command line arguments. Note that e.g. `state_format` with `fdm` below is written as `--state-format fdm` when given as a command line argument.
  - `state_format` specifies the data format of the input computational units.
    - `fdm` is the standard Forest Data Model.
    - `vmi12` and `vmi13` denote the VMI data format and container.
    - `forest_centre` denotes the Forest Centre XML data format and container.
    - `geo_package` denotes the Forest Centre GPKG data format and container.
  - `state_input_container` is the file type for the `fdm` data format. This may be `csv`, `pickle` or `json`.
  - `preprocessing_output_container` is the file type for outputting the `fdm` formatted state of computational units after preprocessing operations. This may be `csv`, `pickle` or `json`, or commented out for no output.
  - `state_output_container` is the file type for outputting the `fdm` formatted state of individual computational units during and after the simulation. This may be `csv`, `pickle` or `json`, or commented out for no output.
  - `derived_data_output_container` is the file type for outputting derived data during and after the simulation. This may be `pickle` or `json`, or commented out for no output.
  - `strategy` is the simulation event tree formation strategy. Can be `partial` or `full`.
  - `measured_trees` instructs the `vmi12` and `vmi13` data converters to choose reference trees from the source. `True` or `False`.
  - `strata` instructs the `vmi12` and `vmi13` data converters to choose strata from the source. `True` or `False`.
  - `strata_origin` instructs the `forest_centre` converter to choose only strata with a certain origin to the result. `1`, `2` or `3`.
  - `multiprocessing` instructs the application to parallelize the computation across the available CPU cores in the system. `True` or `False`.
- Operation run constraints in the object `run_constraints`.
- Operation parameters in the object `operation_params`. Operation parameters may be declared as a list of 1 or more parameter sets (objects). Operations within an `alternatives` block are expanded as further alternatives for each parameter set. Multiple parameter sets may not be declared for operations within any `sequence` block.
- List of `simulation_events`, where each object represents a single set of operations and a set of time points for those operations to be run.
  - `time_points` is a list of integers which assign this set of operations to simulation time points
  - `generators` is a list of chained generator functions (see section on simulation event generators)
    - `sequence` a list of operations to be executed as a chain
    - `alternatives` a list of operations which represent alternative branches
- Preprocessing operations can be passed as a list of strings under `preprocessing_operations`, and their (optional) arguments under `preprocessing_params` as key-value pairs.
- Operation parameters that exist in files can be passed in `operation_file_params` as demonstrated below:

      operation_file_params:
        first_thinning:
          thinning_limits: /path/to/file/thinning-limits.txt
        cross_cut_felled_trees:
          timber_price_table: /path/to/file/timber-prices.csv

  Note that it is the user's responsibility to provide the file in a valid format for each operation.
- Post-processing is controlled in the `post_processing` section of the file
  - `operation_params` section sets key-value pairs to be passed as parameters to named post-processing operations
  - `post_processing` section lists a non-branching list of post-processing operations to be run in sequence for the given data
- Export is controlled in the `export` section of the file (TODO: structure is in works)
The following example declares a simulation, which runs four event cycles at time points 0, 5, 10 and 15. Images below describe the simulation as an event tree, and further as the computation chains that are generated from the tree.
- At time point 0, `reporting` of the simulation state is done.
- At time point 5, the `grow` operation is done on the simulation state and the simulation is branched by 3. One branch does not modify the forest state data with `do_nothing`, another performs a `thinning` operation on the forest state data with parameter set 1, and another a `thinning` operation with parameter set 2.
- At time point 10, the 3 branches from time point 5 are extended separately with a `grow` operation, then branched again with `do_nothing` and `thinning` operations with two parameter sets, resulting in 9 branches.
- At time point 15, `reporting` is done on the 9 individual state branches.
# example of operation run constraints
# minimum time interval constraint between thinnings is 10 years
run_constraints:
  thinning:
    minimum_time_interval: 10
# example of operation parameters
# reporting operation gets one parameter set
# thinning operation gets two parameter sets
operation_params:
  reporting:
    - level: 1
  thinning:
    - thinning_factor: 0.7
      e: 0.2
    - thinning_factor: 0.9
      e: 0.1
# simulation_events are a collection of operations meant to be executed at
# the specified time_points
simulation_events:
  - time_points: [ 5, 10 ]
    generators:
      - sequence:
          - grow
      - alternatives:
          - do_nothing
          - thinning
  - time_points: [ 0, 15 ]
    generators:
      - sequence:
          - reporting
Event tree from declaration above
Operation chains from event tree above, as produced by the full tree formation strategy. See below for the partial tree strategy.
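The expansion of the example declaration into operation chains can be sketched in a few lines of Python. This is a simplified illustration of the full strategy only: the function names are hypothetical, the declaration is written out as Python data instead of being parsed from YAML, and `run_constraints` are ignored.

```python
from itertools import product

# Simplified mirror of the example declaration above.
operation_params = {
    "reporting": [{"level": 1}],
    "thinning": [
        {"thinning_factor": 0.7, "e": 0.2},
        {"thinning_factor": 0.9, "e": 0.1},
    ],
}
simulation_events = [
    {"time_points": [5, 10],
     "generators": [{"sequence": ["grow"]},
                    {"alternatives": ["do_nothing", "thinning"]}]},
    {"time_points": [0, 15],
     "generators": [{"sequence": ["reporting"]}]},
]


def branches_for(generator):
    """Return the alternative step lists that one generator contributes."""
    if "sequence" in generator:
        return [list(generator["sequence"])]
    branches = []
    for operation in generator["alternatives"]:
        parameter_sets = operation_params.get(operation, [{}])
        if len(parameter_sets) == 1:
            branches.append([operation])
        else:
            # One alternative branch per declared parameter set.
            branches.extend([f"{operation}#{n}"] for n in range(1, len(parameter_sets) + 1))
    return branches


def operation_chains(events):
    """Enumerate every root-to-leaf chain of the event tree."""
    generators_by_time = {}
    for event in events:
        for time_point in event["time_points"]:
            generators_by_time.setdefault(time_point, []).extend(event["generators"])
    choices = []
    for time_point in sorted(generators_by_time):
        for generator in generators_by_time[time_point]:
            choices.append([[f"{time_point}:{step}" for step in branch]
                            for branch in branches_for(generator)])
    return [[step for part in combination for step in part]
            for combination in product(*choices)]


chains = operation_chains(simulation_events)
```

The sketch yields 9 chains, one per combination of the three branches at time points 5 and 10, each starting and ending with `reporting`.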
The `report_collectives` operation and data export make use of collective variables. A collective variable is a Python expression that is evaluated on a `ForestStand`. For example, the expression `year` will collect `ForestStand.year`.
You can collect tree variables by typing `reference_trees.variable_name`. You can also slice the variable, e.g. `reference_trees.volume[reference_trees.species==1]` will collect the total volume of pine. The collective arrays are just numpy arrays, so all numpy slicing operations are supported.
For export data collection you can index the collection array by collection year, so `Vpine` would export the collective `Vpine` from all periods, and `Vpine[0,5]` would export it from years 0 and 5.
You can use the `filter` operation to control what data to keep or remove during pre-processing. It takes a dict of `action: expression` where `action` is of the form `select|remove[ stands|trees|strata]` and `expression` is a Python expression. `select` and `remove` are aliases for `select stands` and `remove stands`. `select` removes all objects for which `expression` evaluates to a falsy value. `remove` removes objects for which `expression` evaluates to a truthy value.
The following example removes all sapling trees, trees without stem count and stands without reference trees:
preprocessing_params:
  filter:
    - remove trees: sapling or stems_per_ha == 0
      remove: not reference_trees
Evaluation order is the order of parameters, so the example would first remove trees and then stands.
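The select and remove semantics can be sketched in plain Python. This is only an illustration under stated assumptions, not Metsi's implementation: the apply_filter helper and the dict-based tree records are hypothetical, and the expression is evaluated with eval using each record's fields as variables.

```python
def apply_filter(objects, action, expression):
    """Hypothetical helper illustrating select/remove semantics.

    objects: list of dicts standing in for trees or stands.
    action: "select" keeps objects for which the expression is truthy,
            "remove" drops objects for which the expression is truthy.
    """
    code = compile(expression, "<filter>", "eval")
    kept = []
    for obj in objects:
        # Evaluate the expression with the object's fields as local variables.
        result = bool(eval(code, {"__builtins__": {}}, dict(obj)))
        # select keeps truthy results, remove keeps falsy results.
        if (action == "select") == result:
            kept.append(obj)
    return kept


trees = [
    {"id": 1, "sapling": False, "stems_per_ha": 120.0},
    {"id": 2, "sapling": True, "stems_per_ha": 300.0},
    {"id": 3, "sapling": False, "stems_per_ha": 0},
]

# Same expression as in the declaration above: drop saplings and stemless trees.
remaining = apply_filter(trees, "remove", "sapling or stems_per_ha == 0")
# The complementary action: keep only trees that are not saplings.
selected = apply_filter(trees, "select", "not sapling")
```

Here remaining holds only tree 1, while selected holds trees 1 and 3.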
You can also reuse filters with named filters. A named filter is an expression given in the named parameter, and it can be used in other expressions (including other named filters). The following example is equivalent to the previous one:
preprocessing_params:
  filter:
    - named:
        nostems: stems_per_ha == 0
        notrees: not reference_trees
      remove trees: sapling or nostems
      remove: notrees
The four important concepts in the sim package are operations, processors, event trees and event generators.
An operation is a function whose responsibilities are 1) to trigger manipulation of simulation state and 2) to compute derived data about simulation state before and/or after state manipulation. For the purposes of the simulator, the operation is a partially applied function from the domain package (forestry) such that it will take only one argument. Operations are produced as lambda functions based on the control.yaml declaration.
As an example, a single operation such as grow would receive a single argument of type ForestStand upon which it operates and finally returns a ForestStand for the modified/new state.
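Partial application can be illustrated with functools.partial. The sketch below is not taken from the Metsi code base: the Stand class is a hypothetical stand-in for ForestStand, and the grow function with its step and rate parameters is a toy model invented for the example.

```python
from dataclasses import dataclass, replace
from functools import partial


@dataclass(frozen=True)
class Stand:
    """Hypothetical stand-in for ForestStand."""
    year: int
    volume: float


def grow(stand, step=5, rate=0.02):
    """Toy growth model; the parameter names are illustrative only."""
    return replace(
        stand,
        year=stand.year + step,
        volume=stand.volume * (1 + rate) ** step,
    )


# Bind the parameters up front so that the simulator only passes the state.
grow_operation = partial(grow, step=5, rate=0.02)

result = grow_operation(Stand(year=2020, volume=100.0))
```

The simulator can then treat grow_operation like any other one-argument function, without knowing which parameters were bound into it.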
The processor is a function wrapper which handles running a prepared operation (see above). The parameter is an OperationPayload instance. The OperationPayload object is the container for simulation state data, along with a record of simulation run history and operation run constraints. Responsibilities of the processor function are as follows:
- Determine if run constraints apply to the operation to be run. Abort and raise an exception if so.
- Execute the operation function with simulation state data.
- Create a record of the run in simulation run history.
- Pack results as a new OperationPayload and return it.
Processor functions are produced as lambda functions based on the control.yaml declaration.
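The four responsibilities listed above can be sketched as follows. This is a simplified illustration rather than the actual processor: the Payload class is a hypothetical stand-in for OperationPayload, and the minimum-interval rule is an assumed example of a run constraint.

```python
from dataclasses import dataclass, field


@dataclass
class Payload:
    """Hypothetical stand-in for OperationPayload."""
    state: dict
    history: list = field(default_factory=list)       # (tag, time_point) records
    min_interval: dict = field(default_factory=dict)  # assumed constraint format


def processor(payload, operation, tag, time_point):
    # 1. Check run constraints; abort with an exception if one applies.
    previous = [t for (name, t) in payload.history if name == tag]
    interval = payload.min_interval.get(tag)
    if interval is not None and previous and time_point - previous[-1] < interval:
        raise RuntimeError(f"{tag} is not allowed at time point {time_point}")
    # 2. Execute the operation with the simulation state.
    new_state = operation(payload.state)
    # 3. Record the run in the history.
    new_history = payload.history + [(tag, time_point)]
    # 4. Pack the results into a new payload.
    return Payload(new_state, new_history, payload.min_interval)


def thin(state):
    """Toy operation: halve the stem count."""
    return {**state, "stems": state["stems"] * 0.5}


start = Payload({"stems": 1000.0}, min_interval={"thinning": 10})
after = processor(start, thin, "thinning", 0)

try:
    processor(after, thin, "thinning", 5)  # only 5 time units after the last run
    blocked = False
except RuntimeError:
    blocked = True
```

The second call is rejected because the assumed constraint requires at least 10 time units between thinnings.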
The event tree is a tree data structure where each individual node represents a prepared simulation operation. It is generated based on the control.yaml declaration. Unique operation chains are generated based on the event tree for individual chain runs, or the event tree can be evaluated by depth-first walkthrough. This is controlled by the evaluation strategy.
sequence and alternatives are functions which produce EventTree instances for given input functions and as successors of previous EventTree instances. For simulation purposes, these input functions are the prepared processors (see above), but the simulator implementation is agnostic to what these functions are. Sequences are linear chains of events. Alternatives are branching events. EventTree instances are generated and bound to earlier EventTree leaf nodes as branches.
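The way sequences and alternatives extend a tree can be sketched with a minimal node class. The code below is an illustration only: this EventTree class and the signatures of sequence and alternatives are simplified assumptions and do not mirror the sim package. Operations are plain strings here, since the tree does not depend on what the nodes hold.

```python
class EventTree:
    """Hypothetical minimal tree node; not the sim package's EventTree."""

    def __init__(self, operation=None):
        self.operation = operation
        self.branches = []

    def chains(self):
        """List every root-to-leaf chain of operations."""
        head = [] if self.operation is None else [self.operation]
        if not self.branches:
            return [head]
        return [head + tail for b in self.branches for tail in b.chains()]


def sequence(parents, *operations):
    """Attach the operations as one linear chain under every parent leaf."""
    new_leaves = []
    for parent in parents:
        node = parent
        for operation in operations:
            child = EventTree(operation)
            node.branches.append(child)
            node = child
        new_leaves.append(node)
    return new_leaves


def alternatives(parents, *operations):
    """Attach each operation as its own branch under every parent leaf."""
    new_leaves = []
    for parent in parents:
        for operation in operations:
            child = EventTree(operation)
            parent.branches.append(child)
            new_leaves.append(child)
    return new_leaves


root = EventTree()
leaves = [root]
for _ in range(2):  # two time points with the same generators
    leaves = sequence(leaves, "grow")
    leaves = alternatives(leaves, "do_nothing", "thinning")

chains = root.chains()
```

Two time points with one two-way branch each give four chains of four operations.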
The generators are chainable and nestable such that they can expand the event tree in formation based on the results of a previous generator. The NestableGenerator represents a tree structure for nested generator declarations. It is constructed from the simulation_events structure given from a configuration source. A SimConfiguration structure, likewise populated from a configuration source, is used as a template for binding the created generator functions with prepared domain operation functions. The control source is the dict structure of an application's control.yaml file or another compatible source.
The compose_nested function executes the given NestableGenerator, which in turn utilizes its prepared sequence and alternatives calls to build a complete simulation event tree.
The forestry package contains the data structures necessary to represent the simulation state data and the operations acting upon that data.
The state is held by the class ForestStand and the ReferenceTree and TreeStratum instances it refers to. A single ForestStand instance fully represents a forestry simulation state.
Operations are functions which take two arguments:
- A tuple of a ForestStand instance and a CollectedData instance containing derived/computed data during the simulation run
- A Python dict containing parameters for this operation
By convention (since Python as a language does not allow us to properly enforce this), these functions must remain pure and not trigger side-effectful program logic. Operations may do in-place mutation of the argument tuple. Operations may not mutate the operation parameter dict.
The sim.runners module has two functions of interest:
- evaluate_sequence executes a prepared chain of functions (from the event tree), returning the final simulated state or raising an exception upon any failure.
- run_chains_iteratively is a simple iterator for given chains of prepared operations (from the event tree).
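The idea behind these two functions can be sketched with a fold over a list of functions. The names evaluate_chain and run_chains below are hypothetical analogues written for this example, and skipping a failed chain is an assumption about the behaviour rather than a description of sim.runners.

```python
from functools import reduce


def evaluate_chain(payload, *operations):
    """Feed the payload through each function in order; exceptions propagate."""
    return reduce(lambda state, operation: operation(state), operations, payload)


def run_chains(payload, chains):
    """Yield the final result of every chain that completes."""
    for chain in chains:
        try:
            yield evaluate_chain(payload, *chain)
        except Exception as error:  # assumed handling: skip the failed chain
            print(f"chain aborted: {error}")


def increment(x):
    return x + 1


def double(x):
    return x * 2


def fail(x):
    raise ValueError("constraint violated")


chains = [[increment, double], [increment, fail], [double, increment]]
results = list(run_chains(1, chains))
```

The second chain aborts, so results contains the outcomes of the first and third chains only.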
Note that this package is a simple run-testing implementation. In the future we wish to expand upon this to allow for distributed run scenarios using Dask.
The simulator strategies determine how the program traverses down the simulation tree. The full strategy executes run_full_tree_strategy, whereas the partial strategy executes run_partial_tree_strategy.
Note that while the full strategy is simpler to understand conceptually, it carries a significant memory and runtime overhead for large simulation trees. The partial strategy should therefore be preferred as the performant solution.
Your primary responsibility is to write the functionality that acts upon simulation state data based on parameters of your choosing. For the forestry case, the state data is a single instance of a ForestStand, which always has a member list of ReferenceTree instances.
You write an operation function, which is an entry point to your work.
- It shall take in a single argument, a tuple of (ForestStand, CollectedData).
- It shall return a tuple of (ForestStand, CollectedData), or it shall raise an Exception when the operation can not complete its work for any reason.
The operation function can internally be whatever you require it to be. Write out as many other functions as you need for the underlying scientific models. Consider developing these under the metsi-forestry library.
Keep your work functionally pure. This means that the implementations must never access input and output (API calls, file access, etc.). This also means that all data must be passed as function arguments, return values and exceptions. Do not use shared memory outside of the scope of the functions you write. Python does not allow us to be strict about this, so it is up to you! Being strict about it is crucial for producing safe, testable, provable and deterministic implementations.
Implement unit tests for functions that you write. Unit tests allow you to develop your functions completely independently from the rest of the system. You do not need to run the simulation to test your work; use a test to ensure that your function returns what is expected and behaves as intended.
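A pure operation and its unit tests might look like the sketch below. Everything in it is illustrative: Tree and Stand are hypothetical stand-ins for ReferenceTree and ForestStand, a plain dict stands in for CollectedData, and thin_evenly with its thinning_factor default is a toy operation invented for the example.

```python
import unittest
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Tree:
    """Hypothetical stand-in for ReferenceTree."""
    stems_per_ha: float


@dataclass(frozen=True)
class Stand:
    """Hypothetical stand-in for ForestStand."""
    reference_trees: tuple = ()


def thin_evenly(payload, thinning_factor=0.5):
    """Toy operation: scale every tree's stem count by thinning_factor."""
    stand, collected = payload
    if not stand.reference_trees:
        raise ValueError("cannot thin a stand without reference trees")
    trees = tuple(
        replace(t, stems_per_ha=t.stems_per_ha * thinning_factor)
        for t in stand.reference_trees
    )
    before = sum(t.stems_per_ha for t in stand.reference_trees)
    after = sum(t.stems_per_ha for t in trees)
    # Return new objects instead of touching anything outside the function.
    return replace(stand, reference_trees=trees), {**collected, "removed_stems": before - after}


class ThinEvenlyTest(unittest.TestCase):
    def test_scales_stem_counts(self):
        stand = Stand((Tree(100.0), Tree(300.0)))
        new_stand, collected = thin_evenly((stand, {}))
        self.assertEqual([t.stems_per_ha for t in new_stand.reference_trees], [50.0, 150.0])
        self.assertEqual(collected["removed_stems"], 200.0)
        # The input stand is left unchanged.
        self.assertEqual(stand.reference_trees[0].stems_per_ha, 100.0)

    def test_empty_stand_raises(self):
        with self.assertRaises(ValueError):
            thin_evenly((Stand(), {}))
```

Tests like these run with python -m unittest and need neither a control.yaml file nor a simulation run.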
- Coordinate with simulator developers when some need arises that you feel can not be addressed with the model described above. A solution that doesn't require breaking functional purity can most certainly be found by developing the simulator and operations interface structure.
- Coordinate with other developers when the work you do and models you write can be shared with other operations.
- Coordinate with simulator creators about the parameter names and structures that can be passed from the control.yaml file.
TODO: this section is no longer current and will need to be rewritten when a new implementation for R wrapping is made. report_volume no longer exists and the R codes are in metsi-forestry.
To run operations using the R functions you need R available in the local environment. You must additionally install the Python rpy2 package for the necessary API. For convenience, this can be installed via the optional-requirements.txt.
pip install --user -r optional-requirements.txt
The r_utils.py module contains functions and examples of how to bind into R functions. The r directory houses related R script and data files. An example implementation lmfor_volume exists for forest stand volume calculation. Currently, this can be taken into use with the report_volume operation function in the control.yaml, following the example below:
operation_params:
  report_volume:
    lmfor_volume: true
The project contains a DESCRIPTION file which must be used to declare R library dependencies for R scripts. This is not necessary for running the R scripts locally, but is required for dependency resolution in the GitHub Actions test pipeline. Local dependencies are handled by the script files, as exemplified by the beginning of lmfor_volume.R. It will install its dependencies on run if they are not found in the local environment's R libraries.
# Install any missing dependencies before loading them.
library_requirements <- c("lmfor")
if (!all(library_requirements %in% installed.packages()[, "Package"])) {
  install.packages(library_requirements, repos = "https://cran.r-project.org", dependencies = TRUE)
}
library(lmfor)
To understand the partial tree strategy better, consider a simulator instruction such as:
simulation_events:
  - time_points: [ 0, 5 ]
    generators:
      - sequence:
          - grow
      - alternatives:
          - do_nothing
          - thinning
which will produce a simulator tree as below:
The full strategy would create operation chains (one for each possible path in the tree) and run them independently from one another. In this case, we would have four separate chains, each chain having four operations.
On the other hand, the partial strategy would proceed as follows:
- create operation chains from the nodes in the first time point, and run them:
Here you'll notice that the first period's grow operation is executed twice, whereas the full tree strategy would have executed it four times.
- For all successful results from the first period, create operation chains from the nodes in the second period:
Now, let's assume that the second chain from time point 0 would not complete successfully, e.g. due to a constraint set on the thinning operation. At time point 5, the partial tree strategy would then only create operation chains for output 1, i.e. the two chains on the left in the above diagram. This logic reduces the unnecessary computation the simulator has to make compared to the full strategy, especially for large simulation trees.
"Partial" in the strategy name refers to the fact that the strategy creates trees from only the nodes (EventTrees) in one time point at a time (i.e. partial trees/subtrees) and traverses those partial trees (with the same post-order traversal algorithm) to create partial (or sub-) chains of operations. Therefore, the strategy can be thought to operate depth-first within time points, but breadth-first across time points.
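The difference in work between the two strategies can be made concrete by counting how often grow runs. The sketch below is a simplified model rather than the code behind run_full_tree_strategy or run_partial_tree_strategy: run_full, run_partial and the list-based state are hypothetical, and failure handling is left out.

```python
from itertools import product

calls = {"grow": 0}


def grow(state):
    calls["grow"] += 1  # instrumentation for this example only
    return state + ["grow"]


def do_nothing(state):
    return state + ["do_nothing"]


def thinning(state):
    return state + ["thinning"]


# One entry per time point: the sub-chains produced by that time point's subtree.
period_chains = [
    [[grow, do_nothing], [grow, thinning]],  # time point 0
    [[grow, do_nothing], [grow, thinning]],  # time point 5
]


def run_chain(state, chain):
    for operation in chain:
        state = operation(state)
    return state


def run_full(initial):
    """Build every root-to-leaf chain and run each one from the initial state."""
    results = []
    for combination in product(*period_chains):
        chain = [operation for part in combination for operation in part]
        results.append(run_chain(initial, chain))
    return results


def run_partial(initial):
    """Run one time point at a time, continuing from the previous results."""
    states = [initial]
    for chains in period_chains:
        states = [run_chain(state, chain) for state in states for chain in chains]
    return states


calls["grow"] = 0
full_results = run_full([])
full_grow_calls = calls["grow"]

calls["grow"] = 0
partial_results = run_partial([])
partial_grow_calls = calls["grow"]
```

Both strategies produce the same four end states, but grow runs 8 times under the full strategy and 6 times under the partial one. The difference of two is the first period's grow, which the partial strategy runs twice instead of four times.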
Use the control.yaml structure declaration to control the simulation structure, domain operations and parameters that are to be used.