
Memory Use

esseff edited this page Jun 19, 2022 · 26 revisions


This topic is a compendium of information about memory use, from the general to the specific.

Related topics

Topic contents

Introduction and Background

Content to follow.

[back to topic contents]

Trade-offs in model architecture

Content to follow.

[back to topic contents]

Bag of Tricks

This subtopic contains links to sections describing techniques which can reduce memory use. It is not always appropriate to apply these techniques. A technique may not be worth the effort or the additional model complexity, or it may involve a trade-off which is not worth making. The Entity Member Packing report can help identify which techniques will be fruitful.

Candidates for the BoT:

  • compute rather than store
  • use smaller C types, or range and classification
  • hook to self-scheduling events, e.g. self_scheduling_int(age)
  • be economical with events
  • be economical with tables (use tables_retain routinely, repeat a run to probe with detailed tables)
  • avoid ordinal statistics
  • use a unitary Ticker actor to push common characteristics to the population, e.g. year
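As an illustration of the first candidate, "compute rather than store", here is a minimal sketch in plain C++ rather than model code. The struct names and the members earnings, tax_rate and tax are hypothetical and serve only to show the idea: a value which can be derived from other members on demand does not need a member of its own.

```cpp
// Hypothetical stand-ins for an entity, written as plain C++ structs.

// Variant 1: the derived value is stored, costing 8 more bytes per entity.
struct PersonStored {
    double earnings;
    double tax_rate;
    double tax;        // must be kept in sync with earnings and tax_rate
};

// Variant 2: the derived value is computed when needed and occupies no storage.
struct PersonComputed {
    double earnings;
    double tax_rate;
    double tax() const { return earnings * tax_rate; }
};
```

The trade-off is extra computation each time the value is needed, so this is most attractive for values which are cheap to compute or rarely used.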

[back to topic contents]

Exploit the resource use report

Content to follow.

[back to BoT]
[back to topic contents]

Suppress table groups

Content to follow.

[back to BoT]
[back to topic contents]

Change time type to float

The Time type of a model can be changed from the default double to float by inserting the following statement in model code:

time_type float;

The Time type is ubiquitous in models. It is used in attributes, events, and internal entity members. By default, Time wraps the C++ type double, which is a double-precision floating point number stored in 8 bytes. The time_type statement allows changing the wrapped type to float, which is stored in 4 bytes. This can reduce memory use. For example, here is the summary report for the 1 million GMM run used to illustrate the Model Resource Use topic where time_type is double (the default value):

+---------------------------+
| Resource Use Summary      |
+-------------------+-------+
| Category          |    MB |
+-------------------+-------+
| Entities          |  1924 |
|   Doppelganger    |   552 |
|   Person          |  1372 |
|   Ticker          |     0 |
| Multilinks        |    10 |
| Events            |    80 |
| Sets              |   196 |
| Tables            |     0 |
+-------------------+-------+
| All               |  2213 |
+-------------------+-------+

Here is the report for the same 1 million run with time_type set to float:

+---------------------------+
| Resource Use Summary      |
+-------------------+-------+
| Category          |    MB |
+-------------------+-------+
| Entities          |  1654 |
|   Doppelganger    |   515 |
|   Person          |  1138 |
|   Ticker          |     0 |
| Multilinks        |    10 |
| Events            |    80 |
| Sets              |   196 |
| Tables            |     0 |
+-------------------+-------+
| All               |  1943 |
+-------------------+-------+

Memory use of the Person entity fell from 1372 MB to 1138 MB, a reduction of 17%. Total memory use fell from 2213 MB to 1943 MB, a reduction of 12%.

A float has a precision of about 7 significant decimal digits. If time is measured in years, that means that 2025.123 will be different from 2025.124, a resolution of ~8 hours.

Changing time_type to float may affect model results due to the reduced precision of Time values. However, such differences may be purely statistical.
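The precision of float near a given time value can be checked directly. The following is a minimal sketch in plain C++, assuming that time is measured in years and that a year has 365.25 days. The function names time_gap and years_to_hours are hypothetical and not part of OpenM++.

```cpp
#include <cmath>

// Gap between a float time value and the next representable float above it.
inline float time_gap(float t) {
    return std::nextafter(t, INFINITY) - t;
}

// Converts an interval in years to hours, assuming 365.25 days per year.
inline double years_to_hours(double years) {
    return years * 365.25 * 24.0;
}
```

For example, time_gap(2025.0f) is 2^-13, about 0.00012 years, which years_to_hours converts to roughly one hour. The 7-digit rule of thumb is therefore somewhat conservative for time values near 2025.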

[back to BoT]
[back to topic contents]

Use value_out with flash tables

Flash tables are entity tables which tabulate at instantaneous points in time. They do that using an attribute like trigger_changes(year) in the table filter, which becomes instantaneously true and then immediately false in a subsequent synthetic event. Because an increment to a flash table is instantaneous, it has identical 'in' and 'out' values. That means that a flash table using value_in will produce the same values as one using value_out. However, value_in in a table causes the compiler to create an additional member in the entity to hold the 'in' value of an increment. For flash tables, this memory cost can be avoided by using value_out instead of value_in.

[back to BoT]
[back to topic contents]

Enable entity packing

Members of entities can be packed more efficiently by turning on the entity_member_packing option, but there is a trade-off. For more information see Entity Member Packing.

[back to BoT]
[back to topic contents]

Use mutable real type

Floating point values can be declared in model code using the real type. By default, real is the same as the C++ type double, but it can be changed to the C++ type float by inserting the following statement in model code:

real_type float;

This single statement will change all uses of real from double to float, which will halve the storage requirements of real values.

A float has a precision of around 7 digits, so can represent a dollar amount of 12345.67 to an accuracy of 1 cent.

Because a single real_type statement changes the meaning of real throughout, it is easy to assess to what extent changing real from double to float affects results. This provides more flexibility than changing (say) double to float in code.
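Whether float is precise enough depends on the magnitude of the values being stored. The following is a minimal sketch in plain C++ for probing this. The function names spacing_at and to_cents are hypothetical and not part of OpenM++.

```cpp
#include <cmath>

// Gap between a float value and the next representable float above it.
inline float spacing_at(float x) {
    return std::nextafter(x, INFINITY) - x;
}

// Rounds a dollar amount held in a float to the nearest cent.
// The arithmetic is done in double so that only the storage step loses precision.
inline double to_cents(float amount) {
    return std::round(static_cast<double>(amount) * 100.0);
}
```

For 12345.67 the spacing is about 0.001, so the amount survives storage in a float to the cent. For an amount such as 1234567.89 the spacing is 0.125, so cent accuracy is lost. A model with large monetary values may want to check this before changing real_type.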

[back to BoT]
[back to topic contents]

Prefer range and classification to int

Values of type Range or Classification are automatically stored in the smallest C type which can represent all valid values. This can reduce memory use. For example, if YEAR is declared as

range YEAR  //EN Year
{
    0, 200
};

a member year

entity Person {
    YEAR year;
};

declared with type YEAR will be stored efficiently in a single byte. In contrast, if year were declared as int it would typically require 4 bytes.
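The idea of choosing the smallest storage type for a known set of values can be illustrated in plain C++. The following is a sketch only: smallest_int_t and YEAR_storage are hypothetical names, and the code does not show how the OpenM++ compiler makes this choice internally.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical helper: the smallest integer type able to hold every value in [Min, Max].
template <long long Min, long long Max>
using smallest_int_t =
    std::conditional_t<(Min >= 0 && Max <= UINT8_MAX), std::uint8_t,
    std::conditional_t<(Min >= INT8_MIN && Max <= INT8_MAX), std::int8_t,
    std::conditional_t<(Min >= 0 && Max <= UINT16_MAX), std::uint16_t,
    std::conditional_t<(Min >= INT16_MIN && Max <= INT16_MAX), std::int16_t,
    std::int32_t>>>>;

// A range of 0 to 200 fits in one unsigned byte.
using YEAR_storage = smallest_int_t<0, 200>;
static_assert(sizeof(YEAR_storage) == 1, "0..200 fits in a single byte");
```

With range and classification types, model code gets this saving without having to pick a C type by hand, and the storage adapts if the limits of the range are later changed.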

[back to BoT]
[back to topic contents]

Use bitset instead of bool array

The bool type takes one byte of storage, even though a byte contains 8 bits. Some models use large arrays of bool in entity members, e.g.

entity Person {
    bool was_primary_caregiver[56][12];
};

which records whether a Person was a primary caregiver in each month of 56 possible working years during the lifetime. The Model Resource Use report would show that the was_primary_caregiver array member of Person consumed 672 bytes of memory in each Person, a significant amount for a time-based model with a large population.

The same information could be stored in a foreign member of Person using the C++ standard type std::bitset. A code sketch follows:

#include <bitset>
...
typedef std::bitset<56*12> ym_bools; // flattened bit array of 56 years and 12 months
...
entity Person {
    ym_bools was_primary_caregiver;
};

// Map a (year, month) pair to its position in the flattened bit array
std::size_t ym_index(std::size_t year, std::size_t month) {
    return 12 * year + month;
}

Then model code like

ptCareGiver->was_primary_caregiver[nEarningIndex][nM] = true;

could be replaced by functionally equivalent code

ptCareGiver->was_primary_caregiver[ym_index(nEarningIndex,nM)] = true;

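In this example, replacing the array of bool by a std::bitset reduces storage requirements from 672 bytes to 84 bytes (672 bits), a significant saving for each Person entity. The actual size of a std::bitset can be slightly larger, because implementations typically round it up to a whole number of machine words.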
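The size difference can be checked in plain C++ outside a model. In the following sketch the structs PersonWithBoolArray and PersonWithBitset are hypothetical stand-ins for the two versions of the Person entity. They contain only the member under discussion.

```cpp
#include <bitset>
#include <cstddef>

// Stand-in for the original entity: one byte per month.
struct PersonWithBoolArray {
    bool was_primary_caregiver[56][12];
};

typedef std::bitset<56 * 12> ym_bools; // flattened bit array of 56 years and 12 months

// Stand-in for the revised entity: one bit per month.
struct PersonWithBitset {
    ym_bools was_primary_caregiver;
};

// Map a (year, month) pair to its position in the flattened bit array.
inline std::size_t ym_index(std::size_t year, std::size_t month) {
    return 12 * year + month;
}
```

Comparing sizeof(PersonWithBoolArray) with sizeof(PersonWithBitset) shows the saving on a given platform. Note that std::bitset::operator[] does not check its index, so ym_index must be called with a year below 56 and a month below 12.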

If the bool array was 1-dimensional rather than 2-dimensional as in the example above, the code would be simpler.

Possibly, a general wrapper class bitset2D could be added to OpenM++ runtime support to avoid changing model code at all, e.g.

#include "bitset2D.h"
...
typedef bitset2D<56,12> ym_bools; // 2-D bit array of 56 years and 12 months
...
entity Person {
    ym_bools was_primary_caregiver;
};

Then existing model code like

ptCareGiver->was_primary_caregiver[nEarningIndex][nM] = true;

would require no changes.
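One possible shape for such a wrapper is sketched below. This is an assumption about how it could be written, not an existing OpenM++ class: bitset2D, row_ref, test and count are all hypothetical names. The key point is that operator[] returns a small row proxy whose own operator[] yields a bit reference, so that array-style indexing with two subscripts keeps working.

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical 2-D bit array stored as a flattened std::bitset.
template <std::size_t Rows, std::size_t Cols>
class bitset2D {
    std::bitset<Rows * Cols> bits_;

public:
    // Proxy for one row, so that b[row][col] reads and writes a single bit.
    class row_ref {
        std::bitset<Rows * Cols>& bits_;
        std::size_t base_; // index of the first bit of the row
    public:
        row_ref(std::bitset<Rows * Cols>& bits, std::size_t base)
            : bits_(bits), base_(base) {}
        typename std::bitset<Rows * Cols>::reference operator[](std::size_t col) {
            return bits_[base_ + col];
        }
    };

    row_ref operator[](std::size_t row) { return row_ref(bits_, row * Cols); }

    // Read-only access; throws std::out_of_range if the flattened index is too large.
    bool test(std::size_t row, std::size_t col) const {
        return bits_.test(row * Cols + col);
    }

    // Number of bits currently set.
    std::size_t count() const { return bits_.count(); }
};
```

A real implementation would also need const indexing and would have to work as a foreign member of an entity, which this sketch does not address.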

[back to BoT]
[back to topic contents]

Purge available entity list

Depending on the model design, an entity type might be used only at a particular phase of the simulation. For example, an Observation entity might only be used during the creation of the starting population. OpenM++ maintains pools of entities which have exited the simulation which are available for potential reuse. If there will be no reuse of an entity type, the corresponding pool can be emptied and the memory reclaimed by a function call like

Observation::free_available();

[back to BoT]
[back to topic contents]
