
Refactor bookkeeping #105

Merged: 9 commits into main on Jul 11, 2023

Conversation

@loreloc (Member) commented Jul 9, 2023

  • Refactor bookkeeping to work with folded layers having an arbitrary number of inputs (which I call arity). In this way, every layer is treated uniformly, i.e., we get a nice layer-agnostic loop:
        for layer, (should_pad, in_layer_ids, fold_idx) in zip(self.inner_layers, self.bookkeeping):
            # (fold_1 + ... + fold_n, units, batch_size)
            if len(in_layer_ids) == 1:
                (in_layer_id,) = in_layer_ids
                inputs = layer_outputs[in_layer_id]
            else:
                inputs = torch.cat([layer_outputs[i] for i in in_layer_ids], dim=0)
            if should_pad:
                if isinstance(layer, SumProductLayer):  # type: ignore[misc]
                    pad_value = 0.0
                else:
                    pad_value = -np.inf
                inputs = F.pad(inputs, [0, 0, 0, 0, 1, 0], value=pad_value)
            # inputs: (fold, arity, units, batch_size)
            inputs = inputs[fold_idx]
            # outputs: (fold, units, batch_size)
            outputs = layer(inputs)

Highlighted features:

  • Unified input and output tensor layout: inputs: (fold, arity, units, batch_size), and outputs: (fold, units, batch_size).
  • No padding is applied if the layers to be folded have the same number of inputs.
  • No concatenation is performed when it is not needed, i.e., if all the inputs of a layer can be found in the folds of another layer, a view is sufficient (see the sketch after this list).
  • Removed the legacy mixing layer's parameter mask. Eventually, padding will take care of masking parameters. However, the pad values should be chosen based on (i) the layer and (ii) the computation domain, but for now they are hard-coded for computations in log-space.
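
A minimal sketch of the indexing mechanics described above (the shapes, values, and fold_idx entries below are made up for illustration and are not taken from the code base):

    import torch

    num_units, batch_size = 4, 2
    # Outputs of two previously computed layers, each with 3 folds:
    layer_outputs = [torch.randn(3, num_units, batch_size) for _ in range(2)]

    # Case 1: all inputs come from a single layer's output, so indexing it with
    # a (fold, arity) index tensor is enough and no torch.cat is needed.
    fold_idx = torch.tensor([[0, 1], [1, 2]])   # 2 folds, arity 2
    inputs = layer_outputs[0][fold_idx]         # (fold=2, arity=2, units, batch_size)

    # Case 2: inputs come from several layers, so they are first concatenated
    # along the fold axis and then indexed with positions in the concatenation.
    cat = torch.cat(layer_outputs, dim=0)       # (3 + 3, units, batch_size)
    fold_idx = torch.tensor([[0, 3], [2, 5]])
    inputs = cat[fold_idx]                      # (fold=2, arity=2, units, batch_size)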

Note: I had to permute the einsum indices in the mixing layer (see #103) to make it compliant with the layouts above.

Performance is similar to what we had after merging #103 into main, and it is probably the best we have achieved so far.
Time/Memory: ~0.104-0.110s / 5GiB

...
t/m: 106.83161926269531 5020.56884765625
t/m: 105.59600067138672 5020.56884765625
t/m: 107.38585662841797 5020.56884765625
t/m: 107.3458251953125 5020.56884765625
t/m: 110.02127838134766 5020.56884765625
t/m: 104.13385772705078 5020.56884765625
t/m: 107.49929809570312 5020.56884765625
t/m: 104.31513977050781 5020.56884765625
t/m: 104.40985870361328 5020.56884765625
train LL: 3833.79560546875

Fixes (but does not close) #101.
Closes #96.

@loreloc added the "enhancement" label Jul 9, 2023
@loreloc requested review from lkct and arranger1044 and removed the request for lkct and arranger1044 on July 9, 2023 09:47
@lkct (Member) commented Jul 9, 2023

  1. Please use rebase for a simpler git history: first commit the revert of the revert on top of main, and then rebase what's left onto that revert commit.
  2. What is the motivation for the FCKB layout? It leads to non-contiguous tensors and is not good for the mixing layer, whose performance is not yet benchmarked.

PS: I would split the change of the layer interface to a single input and the refactoring of the bookkeeping into two PRs, because the former is self-contained and should be benchmarked separately.

PPS: If you are to switch to FC..., there should be one standalone commit for that, to enable further benchmarking. (I mean, first implement CF... to be consistent with what exists, and then use a single commit to switch to FC..., so we can benchmark and compare.)


> Time/Memory: ~0.104-0.110s / 5GiB

Better take an average, and there is no need to paste all those lines.
And where is the gain? (Or is there a gain?)

@loreloc (Member, Author) commented Jul 9, 2023

I added an assert outputs.is_contiguous() to the snippet above and the tests pass. Which tensor might not be contiguous exactly? I used the FC... layout instead of the CF... one in order to unify the input shape format across layers.

For the benchmark of the mixing layer, can we upload a non-structured-decomposable quad-tree region file and use that instead?

The performance gain is implicit from my point of view, in the sense that the code is more extensible without impacting performance. Moreover, the two einsum expressions in the CP layer can now be merged into one (see the sketch below). I have not tried to implement it yet, but it can only improve performance.
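
A hypothetical illustration of what merging two einsum calls into one could look like (the names, shapes, and subscripts below are made up and do not correspond to the actual CP-layer code, which also works in log-space):

    import torch

    # Made-up sizes: folds, units, batch.
    f, k, b = 2, 4, 3
    w_left, w_right = torch.randn(f, k, k), torch.randn(f, k, k)
    left, right = torch.rand(f, k, b), torch.rand(f, k, b)

    # Two separate einsums followed by an element-wise product of their results.
    out_two = torch.einsum("fik,fib->fkb", w_left, left) \
            * torch.einsum("fjk,fjb->fkb", w_right, right)

    # A single fused einsum computing the same quantity, since the sums over the
    # i and j indices factorize.
    out_one = torch.einsum("fik,fib,fjk,fjb->fkb", w_left, left, w_right, right)

    assert torch.allclose(out_two, out_one, atol=1e-5)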

@lkct (Member) commented Jul 9, 2023

> Which tensor might not be contiguous exactly?

See below.

> I used the FC... layout instead of the CF... one in order to unify the input shape format across layers.

Yes, we should unify, but I was asking: why not unify to CF...?

> For the benchmark of the mixing layer, can we upload a non-structured-decomposable quad-tree region file and use that instead?

We should. And we should add a benchmark runner to get all results at once (similar to the pyjuice benchmark). We should also go back and see what the baseline is for the non-structured-decomposable case.
For choices, we can have 28x28 PD or QG. Is 28x28 PD too large?

> The performance gain is implicit from my point of view,

I just wonder if 0.104~0.110 and 0.106 are the same (just random fluctuations).

@lkct (Member) left a review comment: non-contiguous tensors

  def _forward_linear(self, x: Tensor) -> Tensor:
-     return torch.einsum("cfk,cfkb->fkb", self.params, x)
+     return torch.einsum("fck,fckb->fkb", self.params, x)

@lkct (Member) commented Jul 9, 2023 on the hunk above: I was wrong about this.

        :return: result of the left operations, in log-space.
        """
        log_left, log_right = inputs[:, 0], inputs[:, 1]

@lkct (Member) commented on the hunk above: These two are non-contiguous inputs for the following operations. However, there seems to be no major performance impact for CP? Yet we do not know whether this will harm other layers, so if there is no performance difference, we should use contiguous tensors.
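
For reference, a minimal sketch (with made-up shapes, not taken from the code base) of the contiguity point being made here:

    import torch

    # FC-style layout: (fold, arity, units, batch). Slicing the second (arity)
    # axis yields a non-contiguous view, because the elements of a fold are no
    # longer adjacent in memory.
    x_fc = torch.randn(8, 2, 16, 32)
    print(x_fc[:, 0].is_contiguous())   # False

    # CF-style layout: (arity, fold, units, batch). Slicing the leading axis
    # keeps the result contiguous.
    x_cf = torch.randn(2, 8, 16, 32)
    print(x_cf[0].is_contiguous())      # True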

@arranger1044 (Member) commented:

> For choices, we can have 28x28 PD or QG. Is 28x28 PD too large?

We can use PD with delta [7, 28], which first splits the image into regions of $7\times 7$ pixels, and then splits these into $1\times 1$ pixel regions. This is a good compromise.

@arranger1044 (Member) commented:

> Please use rebase for a simpler git history: first commit the revert of the revert on top of main, and then rebase what's left onto that revert commit.

> There should be one standalone commit for that, to enable further benchmarking.

At this stage, let's be flexible with commits in PRs, otherwise we might add too much friction to development. When we have many more commits (and developers), we can come up with some good guidelines.

@lkct (Member) commented Jul 9, 2023

> At this stage, let's be flexible with commits in PRs, otherwise we might add too much friction to development.

That's why I'm just using "comment" instead of "request changes". However, I cannot be convinced that we should use FC instead of CF based on what's given.
Also, if I were to ask for a perfect PR, there could be more to change, e.g. I cannot see why the numbers in the tests should be changed, and the way they were changed can make the tests invalid.

What I really want to say is this: the price will only become explicit when we need to debug and benchmark, and we may suffer a lot at that time. Yes, we can say that rigor should give way to dev speed in this PR, and we can even say it in every PR: there is no clear boundary for when we should start to prefer rigor, and we can always put it off.
On the other hand, it's important to form better coding habits. It can be hard to rework an existing PR into better shape, but it takes little effort to keep a PR good from the first commit. I can only add review comments once a PR is opened, which means it can be too late to rework what's already there, and that will lead to another "let's be flexible" decision.
Please DO try to start from better practices instead of being "flexible" when we want to merge.

@arranger1044 (Member) commented:

Let's discuss in the next meeting what a good balance between rigor and dev speed could be.

@loreloc (Member, Author) commented Jul 11, 2023

I have run some benchmarks on this branch.
See #110 for the results on the current main branch.

For PCs without mixing layers it is basically the same.
It seems, though, that it is slightly slower when using mixing layers (~7% for PD and ~3% for QT).
Probably this is the price to pay to have "non-specialized" tensor layouts. As we discussed, in the future it might be worth exploring how to compile layers with the "best" layout.

I believe it is OK performance-wise overall, considering we already gained ~2x recently thanks to #103. Let me know what you think.

PD 28x28 delta=7 and delta=5

Number of parameters: 103151628
Time (ms): 74.650+-1.225
Memory (MiB): 5005.899+-0.000

Number of parameters: 105730068
Time (ms): 132.045+-2.598
Memory (MiB): 5248.470+-0.733

QT 28x28 structured-decomposable and non-structured-decomposable

Number of parameters: 103962625
Time (ms): 108.840+-3.757
Memory (MiB): 5020.569+-0.000

Number of parameters: 105419780
Time (ms): 146.681+-1.638
Memory (MiB): 5043.930+-0.000

RBT 784 depth=4 repetitions=16

Number of parameters: 103120928
Time (ms): 67.594+-0.756
Memory (MiB): 5006.148+-0.000

@lkct (Member) commented Jul 11, 2023

> (~7% for PD and ~3% for QT)

7% is already a lot in my opinion. In #103, ...BK and ...KB only had a 10% difference.
I would say FC... and CF... can still make a difference.

However, the performance requires deeper investigation. We can merge now with the current benchmark results and investigate further later, as communicated in the Zulip DM.

@loreloc (Member, Author) commented Jul 11, 2023

OK then, I think we can merge this. We should investigate the performance of using CF... for both the CP layer and the mixing layer.

Before merging, I will clean up unused attributes.

@loreloc merged commit 16ace43 into main on Jul 11, 2023
@loreloc deleted the refactor-bookkeeping branch on July 11, 2023 11:46