Feature/reduce mem pathfinder #3325
base: develop
Conversation
…ager and large chunks of memory
 * @param[in] values Values in a std::vector
 * @param[in, out] ss ignored
 */
void operator()(const std::vector<double>& values, std::stringstream& ss) {
I don't think I like these overloads:
- They're not in the base class `writer`.
- Is there a reason we can't just have the buffer be a private member in the class, or something similar?
- The "Note" in the doc comment is wrong, since we make no attempt to make sure the settings of the stringstream are the same as the ostream. I think, among other things, this would break the way cmdstan sets the number of digits.

Separately, isn't `std::ostream` buffered internally anyway?
> Is there a reason we can't just have the buffer be a private member in the class, or something similar?

I didn't want to introduce state directly in the class.

> The "Note" in the doc comment is wrong, since we make no attempt to make sure the settings of the stringstream are the same as the ostream. I think, among other things, this would break the way cmdstan sets the number of digits.

I totally missed this. Yeah, my thought was that having a little buffer we reuse would save some memory, but I don't think it's worth the overhead of configuring it, etc. And we only really use it for the case of a std::vector.
Jenkins Console Log. Machine information: No LSB modules are available. Distributor ID: Ubuntu. Description: Ubuntu 20.04.3 LTS. Release: 20.04. Codename: focal. CPU: G++: Clang:
…pathfinder' into feature/reduce-mem-pathfinder
template <typename Writer>
struct concurrent_writer {
  std::reference_wrapper<Writer> writer;
  std::reference_wrapper<std::mutex> mut_;
  explicit concurrent_writer(std::mutex& mut, Writer& writer)
      : writer(writer), mut_(mut) {}
  template <typename T>
  void operator()(T&& t) {
    std::lock_guard<std::mutex> lock(mut_.get());
    writer.get()(t);
  }
  void operator()() {
    std::lock_guard<std::mutex> lock(mut_.get());
    writer.get()();
  }
};
@WardBrian do you think this should be in its own file, or is it fine to just put it here?

The other alternative to this writer is a writer on a separate thread that busy-waits watching an mpsc (multi-producer, single-consumer) queue.
If it's not for general use, it's probably fine to leave it here.

I haven't been following this PR that closely, though. Do you have numbers comparing this to a queue?
I have a simple benchmark I'm running here
https://github.com/SteveBronder/stan-perf/tree/parallel-writer
Preliminary results suggest that an mpsc queue on a separate thread is a good thing. The plot is faceted by the number of threads, and the lines distinguish mpsc vs. mutex as well as the amount of "work" each main thread does between sending results to the writer thread. "0" means no additional work is done: the worker thread just generates a random vector and pushes it to the writer. "16" means the worker thread runs a loop that multiplies that random vector with itself 16 times. One nice thing about the mpsc queue is that, since we push the write to disk off to another thread, the worker thread can keep doing work instead of waiting on the write to disk.
… ends do not get written once
…pathfinder' into feature/reduce-mem-pathfinder
Okay, running it a bit more, it looks like we definitely want the concurrent writer. The plot below shows the time to write to disk for mpsc vs. mutex. The x axis is on the log scale but the y axis is not. The code is on the same branch as the last comment. Even with a small number of loops there's a pretty huge difference in timing. Tbh, I wonder if we should do this concurrent writer for NUTS with parallel chains as well. Even though we write to different files in that case, we still only have one disk we are writing to. So just pushing the operations for converting the vectors to char and then writing them to disk off to another thread seems pretty useful.
Submission Checklist
closes #3323
closes #3322
./runTests.py src/test/unit
make cpplint
Summary
This removes the large memory footprint of writing to CSV for pathfinder, so large models should be easier on system memory. This may decrease performance a bit, but the memory savings for large models will be significant, so I think it is worth it.
When running single pathfinder from pathfinder, we now just return the pathfinder result at the best ELBO estimate instead of generating all of the constrained parameters within single pathfinder. The constrained parameters are now generated on the fly right before we send them to the parameter writer.
For pathfinder runs that do not do PSIS resampling, we write to the parameter writer from within each single pathfinder's thread. To avoid contention, we take a mutex before doing a bulk write for each pathfinder instance. I thought about writing each pathfinder's results to a separate file and then combining them at the end, but I think that would still have the same problems as multiple writes to a single file: we only have one disk, and it can only handle so many writes at a time, so I don't think splitting the writes across multiple files would noticeably save time. One possible alternative would be a thread that busy-spins checking a multi-producer single-consumer queue of values to write to the parameter writer.

Intended Effect
Have pathfinder use less memory when saving parameters to the writer
How to Verify
All previous tests pass
Side Effects
Reduce memory usage
Documentation
Added docs for the function that generates parameters and writes them.
Copyright and Licensing
Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company): Simons Foundation
By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses: