
feat: Multithreading improvements #182

Merged (4 commits) Oct 10, 2023

Conversation

aborgna-q (Collaborator) commented Oct 10, 2023

The main feature of this PR is reducing the memory usage of the multithreaded priority channel: a shared Arc<RwLock<Option<P>>> tracks the maximum circuit cost currently in the queue, so the workers don't fill the channels with candidates that would be discarded anyway.

This bounds memory usage when multiple threads are involved. I can now run barenco_tof_10 on 10 threads with ~4GB of RAM, where it previously exceeded 10GB (and caused an OOM).

I also included multiple small changes extracted from #175:

  • CircuitCost::as_usize instead of proxy operations, so we can share cost limits as atomics and run other primitive operations directly.
  • Print the actual number of circuits processed in the log (not only the seen count, which includes elements still in the queue).
  • Replace eprintln!s with log calls.
  • Print the max queue capacity at the beginning of the execution.
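The shared-maximum-cost idea can be sketched roughly as follows. This is a minimal illustration of the technique, not the PR's actual API: `MaxCostGate`, `set`, and `worth_sending` are hypothetical names. Workers consult a shared bound on the worst cost currently held by the priority queue, and skip sending candidates that could not beat it.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical sketch: a shared bound on the worst cost in the queue.
/// Cloning shares the same underlying Arc, so workers and the queue
/// owner see the same value.
#[derive(Clone)]
pub struct MaxCostGate<P: Ord + Copy> {
    max_cost: Arc<RwLock<Option<P>>>,
}

impl<P: Ord + Copy> MaxCostGate<P> {
    pub fn new() -> Self {
        Self {
            max_cost: Arc::new(RwLock::new(None)),
        }
    }

    /// Called by the queue owner when the queue's maximum cost changes
    /// (e.g. after the queue fills up and drops its worst elements).
    pub fn set(&self, cost: Option<P>) {
        *self.max_cost.write().unwrap() = cost;
    }

    /// Called by workers before sending: only enqueue candidates that
    /// beat the current worst cost. `None` means no bound is active yet.
    pub fn worth_sending(&self, cost: P) -> bool {
        match *self.max_cost.read().unwrap() {
            Some(max) => cost < max,
            None => true,
        }
    }
}

fn main() {
    let gate: MaxCostGate<usize> = MaxCostGate::new();
    // No bound yet: everything is worth sending.
    assert!(gate.worth_sending(42));
    // Once a bound is set, costlier candidates are dropped at the worker.
    gate.set(Some(10));
    assert!(gate.worth_sending(5));
    assert!(!gate.worth_sending(12));
}
```

The point of the gate is that rejected candidates never enter the channel at all, which is what keeps the channel buffers (and hence memory) bounded.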

@aborgna-q aborgna-q requested a review from lmondada October 10, 2023 14:37
Comment on lines 262 to 269
self.circ_cnt += 1;
self.log
    .send(PriorityChannelLog::CircuitCount {
        processed_count: self.circ_cnt,
        seen_count: self.seen_hashes.len(),
        queue_length: self.pq.len(),
    })
    .unwrap();
Contributor:
Is it on purpose that you are sending a log at every circ_cnt increment?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This generally doesn't happen that frequently. I guess I could add a timeout.
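The timeout idea mentioned above could look roughly like this. It is a hypothetical sketch, not code from the PR; `ThrottledLogger` and `should_log` are illustrative names. The log message is only sent when a minimum interval has elapsed since the last one, regardless of how often the counter increments.

```rust
use std::time::{Duration, Instant};

/// Hypothetical sketch: rate-limit log sends by wall-clock time.
pub struct ThrottledLogger {
    last_sent: Option<Instant>,
    min_interval: Duration,
}

impl ThrottledLogger {
    pub fn new(min_interval: Duration) -> Self {
        Self {
            last_sent: None,
            min_interval,
        }
    }

    /// Returns true (and records the send time) if enough time has
    /// passed since the last emitted message; false otherwise.
    pub fn should_log(&mut self) -> bool {
        let now = Instant::now();
        match self.last_sent {
            Some(t) if now.duration_since(t) < self.min_interval => false,
            _ => {
                self.last_sent = Some(now);
                true
            }
        }
    }
}

fn main() {
    let mut logger = ThrottledLogger::new(Duration::from_secs(60));
    // First call always fires; an immediate second call is suppressed.
    assert!(logger.should_log());
    assert!(!logger.should_log());
}
```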

Comment on lines +20 to 24
pub struct Entry<C, P, H> {
    pub circ: C,
    pub cost: P,
    pub hash: H,
}
Contributor:
Could/should this type be the same as type Work in hugr_pchannel?
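One way the two types could be unified (a hypothetical sketch, not necessarily how the PR resolved it) is to keep the named-field `Entry` struct as the single definition and make `Work` a type alias of it, with `u64` filling the hash parameter:

```rust
/// The priority-channel entry: a circuit together with its cost and hash.
pub struct Entry<C, P, H> {
    pub circ: C,
    pub cost: P,
    pub hash: H,
}

/// Hypothetical: instead of a separate tuple `type Work<P> = (P, u64, Hugr)`,
/// reuse `Entry` so both modules share one definition. `C` would be `Hugr`
/// in the real crate; it stays generic here to keep the sketch self-contained.
pub type Work<C, P> = Entry<C, P, u64>;

fn main() {
    let w: Work<&str, usize> = Entry {
        circ: "circuit placeholder",
        cost: 3,
        hash: 7,
    };
    // Named fields replace positional tuple access (w.0, w.1, w.2).
    assert_eq!(w.cost, 3);
    assert_eq!(w.hash, 7);
}
```

A named struct also avoids the field-ordering bugs that positional tuples invite.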

    workqueue_len: Option<usize>,
    seen_hashes: usize,
) {
-   if circ_cnt % 1000 == 0 {
-       self.progress(format!("{circ_cnt} circuits..."));
+   if circuits_processed > self.last_circ_processed && circuits_processed % 1000 == 0 {
Contributor:
I think this can be if circuits_processed >= self.last_circ_processed + 1000
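Worth noting that the two conditions are not equivalent: the modulus form only fires when the count is an exact multiple of 1000, so it can skip a milestone if the count jumps past a multiple between checks, while the suggested form fires once at least 1000 new circuits have been processed since the last report. A small comparison sketch (function names are illustrative):

```rust
/// The PR's condition: only fires on exact multiples of 1000.
fn should_report_mod(processed: usize, last_reported: usize) -> bool {
    processed > last_reported && processed % 1000 == 0
}

/// The reviewer's suggestion: fires once 1000 new circuits accumulate,
/// regardless of alignment to a multiple of 1000.
fn should_report_delta(processed: usize, last_reported: usize) -> bool {
    processed >= last_reported + 1000
}

fn main() {
    // Both fire when the count lands exactly on a multiple.
    assert!(should_report_mod(2000, 1500));
    assert!(should_report_delta(2500, 1500));
    // The modulus form misses non-multiples even after 1000 new circuits.
    assert!(!should_report_mod(2500, 1500));
    // The delta form waits until a full 1000 have accumulated.
    assert!(!should_report_delta(2400, 1500));
}
```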

Comment on lines 14 to 16
/// A unit of work for a worker, consisting of a circuit to process, along
/// with its hash and cost.
pub type Work<P> = (P, u64, Hugr);
Contributor:
You've defined this twice.

@aborgna-q aborgna-q enabled auto-merge October 10, 2023 15:23
@aborgna-q aborgna-q added this pull request to the merge queue Oct 10, 2023
Merged via the queue into main with commit 6f65beb Oct 10, 2023
7 checks passed
@aborgna-q aborgna-q deleted the feat/taso-multithreding-improvements branch October 10, 2023 15:33