Running $2.1\cdot 10^9$ simple cells blows the memory #1969

Open
thorstenhater opened this issue Sep 5, 2022 · 0 comments
Assignees
Labels
bug hpc Relevant primarily to HPC environments.

thorstenhater commented Sep 5, 2022

During a set of benchmarks last week, I ran into the following failures:

  • the process being killed by the OOM killer
  • a vector that cannot be (re)allocated because the requested size exceeds v.max_size()

Both happened on current master as of Sept 22 with GPU-enabled libarbor.
The example being run was ring.cpp from the busyring engine in NSuite:

  • GPU: cuda
  • 2.1 Billion cells of the complex variety
  • No spikes/samples stored
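
For reference, the second failure mode can be reproduced in isolation. The following is a minimal sketch, not taken from Arbor; the function name `reserve_beyond_max_size_throws` is made up for illustration. It relies only on the standard guarantee that `std::vector::reserve` throws `std::length_error` when the requested capacity exceeds `max_size()`.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper: returns true if asking for more than max_size()
// elements is rejected with std::length_error (as the standard requires),
// before any allocation is attempted.
inline bool reserve_beyond_max_size_throws() {
    std::vector<std::uint64_t> v;
    try {
        // One element more than the container can ever hold.
        v.reserve(v.max_size() + 1);
    }
    catch (const std::length_error&) {
        return true;
    }
    return false;
}
```

Note that this is distinct from the OOM kill: a request below `max_size()` can still fail with `std::bad_alloc`, or succeed and later be killed once the pages are touched.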

Investigation

Source Labels

Further experiments are needed, but I suspect the spike source label map consolidation
during simulation setup. It relies on mpi::gather_all, which builds a vector of all items
on every task. Each item is a (gid_t gid, std::string label, lid_t lo, lid_t hi), and in our example
each cell has one detector called 'detector', thus

  • gid: 8B
  • label: 8B
  • lo: 4B
  • hi: 4B

and in total $24\,\mathrm{B} \times 2.1\cdot 10^9 \simeq 50\,\mathrm{GB}$ on every task at the very minimum,
ignoring all extra infrastructure and overhead for

  • std::string: likely a pointer and capacity size_t, maybe less depending on SSO
  • std::unordered_map<gid, std::unordered_map<std::string, range>>, pointers, capacities, ...
  • std::vector: capacity of up to $1.5$ to $2\times$ the size, depending on the implementation's growth factor.
  • padding/alignment of all data structures
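
The estimate above can be written down as a small sketch. The names `label_item`, `min_item_bytes`, and `gathered_bytes` are hypothetical stand-ins chosen for illustration, not Arbor's actual types; the field widths follow the assumptions in the list above (8B gid, 4B lo/hi).

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for one gathered item; NOT Arbor's actual type.
struct label_item {
    std::uint64_t gid;  // 8B, as assumed in the estimate
    std::string label;  // size is implementation-defined (commonly 24B or 32B),
                        // plus heap storage if the text does not fit the SSO buffer
    std::uint32_t lo;   // 4B
    std::uint32_t hi;   // 4B
};

// Lower bound used in the issue: 8 + 8 + 4 + 4 bytes per item.
constexpr std::uint64_t min_item_bytes = 8 + 8 + 4 + 4;

// Bytes held on *each* task once every task has gathered all items.
constexpr std::uint64_t gathered_bytes(std::uint64_t n_items,
                                       std::uint64_t bytes_per_item) {
    return n_items*bytes_per_item;
}

// 2.1e9 items at 24B each: 50.4e9 bytes per task.
static_assert(gathered_bytes(2'100'000'000ULL, min_item_bytes) == 50'400'000'000ULL,
              "lower bound of the per-task footprint");
// On 64-bit platforms the real struct is at least as large as the lower bound.
static_assert(sizeof(label_item) >= min_item_bytes,
              "std::string is larger than the 8B assumed above");
```

Replacing `min_item_bytes` with `sizeof(label_item)` gives a figure closer to what the gathered vector actually occupies, still without counting the hash map and vector overheads listed above.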

thorstenhater added the bug and hpc labels Sep 5, 2022
thorstenhater self-assigned this Sep 5, 2022
thorstenhater changed the title from "Running $$2.1\cdot 10^9$$ simple cells blows the memory" to "Running $2.1\cdot 10^9$ simple cells blows the memory" Sep 5, 2022
thorstenhater added a commit that referenced this issue Nov 29, 2023
Use the simple but well-known FNV-1a hash function to map
`cell_tag_type` (aka `std::string`) to a `uint64_t`
for label resolution. The former type has a size of 32B or more and the
latter 8B, thus cutting the storage and bandwidth
requirements by at least 3/4.

The hash function is implemented from the reference given on the
authors' website and on Wikipedia, and is extremely
simple. If we ever experience issues, we might consider switching this
to something of higher quality via an
external library; candidates are `xxHash` and `Murmur3`.

https://github.com/Cyan4973/xxHash

Note: This should further relieve the memory pressure on larger-scale
simulations as described in #1969 and make
#2005 less urgent.

No performance impact was observed (at laptop scale), but the memory savings
are worth it.
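
For readers unfamiliar with FNV-1a, the 64-bit variant named in the commit message can be sketched as follows. This is an illustration based on the published FNV parameters, not the code from the commit; the names `fnv1a_64`, `fnv1a_offset_basis`, and `fnv1a_prime` are chosen here for illustration and assume C++17.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Published 64-bit FNV parameters.
constexpr std::uint64_t fnv1a_offset_basis = 0xcbf29ce484222325ULL;
constexpr std::uint64_t fnv1a_prime        = 0x100000001b3ULL;

// FNV-1a: for each byte, XOR it into the state first, then multiply by the prime.
// Unsigned overflow wraps, which is the intended arithmetic modulo 2^64.
constexpr std::uint64_t fnv1a_64(std::string_view s) {
    std::uint64_t h = fnv1a_offset_basis;
    for (char c: s) {
        h ^= static_cast<std::uint8_t>(c);
        h *= fnv1a_prime;
    }
    return h;
}

// Published test vectors: the empty string and "a".
static_assert(fnv1a_64("") == 0xcbf29ce484222325ULL, "empty input yields the offset basis");
static_assert(fnv1a_64("a") == 0xaf63dc4c8601ec8cULL, "test vector for \"a\"");
// The hashed label occupies 8B regardless of the label's length.
static_assert(sizeof(fnv1a_64("detector")) == 8, "hash is a fixed 8B value");
```

A 64-bit hash is not collision-free, so any scheme that stores only the hash has to tolerate or detect two distinct labels mapping to the same value; this is the kind of issue the commit message alludes to when it mentions switching to a higher-quality hash.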