Refactor bookkeeping #105
Conversation
PS: I would split the change of the layer interface to one input and the refactoring of bookkeeping into two PRs, because the former is self-contained and should be benchmarked separately. PPS: If you're to switch to
better take an average, and there's no need to paste all those lines.
I added an
For the benchmark of the mixing layer, can we upload a non-structured-decomposable quad-tree region file and use that instead? The performance gain is implicit from my point of view, in the sense that the code is more extensible without impacting performance. Moreover, the two einsum expressions in the CP layer can now be merged into one (see the sketch below). I did not try to implement it yet, but it can only improve performance.
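A minimal sketch of what merging the two einsum expressions of the CP layer into one could look like, in linear space. The shapes and names (`params_in`, `params_out`, folds/hidden/input/output units) are assumptions for illustration only; the actual layer layout in the repository may differ.

```python
import torch

# Hypothetical shapes: F folds, H hidden units, K input units, O output units, B batch.
#   params_in: (F, H, K)   params_out: (F, O, H)   x: (F, H, K, B)
def cp_two_einsums(params_in, params_out, x):
    hidden = torch.einsum("fhk,fhkb->fhb", params_in, x)      # contract over K
    return torch.einsum("foh,fhb->fob", params_out, hidden)   # contract over H

def cp_one_einsum(params_in, params_out, x):
    # Single einsum contracting over both K and H at once.
    return torch.einsum("fhk,foh,fhkb->fob", params_in, params_out, x)

F, H, K, O, B = 2, 3, 4, 5, 7
p_in, p_out, x = torch.rand(F, H, K), torch.rand(F, O, H), torch.rand(F, H, K, B)
assert torch.allclose(cp_two_einsums(p_in, p_out, x), cp_one_einsum(p_in, p_out, x), atol=1e-5)
```

Whether the merged form is actually faster depends on how `torch.einsum` orders the contractions, so it should be benchmarked as suggested above.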
See below
Yes, we should unify, but I was asking: why not unify to
We should. And we should add a benchmark runner to get all results at once (similar to the pyjuice benchmark). We should also go back and check what the baseline is for non-stdec.
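A minimal sketch of what such a benchmark runner could look like; averaging over repeated runs also addresses the "take an average" point above. The `configs` iterable and the model call are placeholders, not the repository's actual API.

```python
import time
import torch

def time_forward(fn, *args, warmup=3, repeats=10):
    """Average forward-pass time over several runs, after a short warm-up."""
    for _ in range(warmup):
        fn(*args)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / repeats

# Hypothetical usage: loop over the configurations we care about
# (e.g. structured-decomposable vs. non-stdec region graphs) and report all results at once.
# for name, model, batch in configs:
#     print(f"{name}: {time_forward(model, batch):.3f}s")
```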
I just wonder if
non-contiguous tensors
  def _forward_linear(self, x: Tensor) -> Tensor:
-     return torch.einsum("cfk,cfkb->fkb", self.params, x)
+     return torch.einsum("fck,fckb->fkb", self.params, x)
I was wrong about this
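For reference, a small sketch checking that the permuted index order computes the same result once the parameter and input layouts are permuted accordingly; the dimension names (components, folds, units, batch) are assumptions about what c, f, k, b stand for.

```python
import torch

C, F, K, B = 2, 3, 4, 5          # assumed: components, folds, units, batch
params_cfk = torch.rand(C, F, K)
x_cfkb = torch.rand(C, F, K, B)

out_old = torch.einsum("cfk,cfkb->fkb", params_cfk, x_cfkb)

# Fold-first layout: same values, different memory layout.
params_fck = params_cfk.permute(1, 0, 2).contiguous()
x_fckb = x_cfkb.permute(1, 0, 2, 3).contiguous()
out_new = torch.einsum("fck,fckb->fkb", params_fck, x_fckb)

assert torch.allclose(out_old, out_new, atol=1e-6)
```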
    :return: result of the left operations, in log-space.
    """
    log_left, log_right = inputs[:, 0], inputs[:, 1]
These two are non-contiguous tensors used as inputs for the operations that follow.
However, there seems to be no major performance impact for CP? We don't yet know whether this will harm other layers, so if there is no performance difference we should use contiguous tensors.
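A small sketch of why that slicing yields non-contiguous views and how they could be made contiguous; the shape is hypothetical, chosen only to mirror the `inputs[:, 0]` / `inputs[:, 1]` pattern above.

```python
import torch

inputs = torch.rand(3, 2, 4, 5)   # hypothetical (folds, 2 operands, units, batch)

log_left, log_right = inputs[:, 0], inputs[:, 1]
print(log_left.is_contiguous())   # False: slicing dim 1 returns a strided view

# If downstream kernels prefer contiguous memory, materialize copies explicitly:
log_left, log_right = log_left.contiguous(), log_right.contiguous()
print(log_left.is_contiguous())   # True
```

The `.contiguous()` calls copy data, so whether they pay off is exactly the benchmarking question raised here.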
We can use PD with delta.
At this stage, let's be flexible with commits in PRs, otherwise we might add too much friction to development. When we have many more commits (and developers) we can come up with some good guidelines.
That's why I'm just using "comment" instead of "request change". However, I cannot be convinced that we should use
What I really want to say is that the price will only become explicit when we need to debug and benchmark, and we may suffer a lot at that time. Yes, we can say that rigor should give way to dev speed in this PR, and we can even say this in each PR -- there's no clear boundary for when we should start to prefer rigor, and we can always put it off.
Let's discuss in the next meeting what a good balance between rigor and dev speed could be.
I have run some benchmarks on this branch. For PCs without mixing layers it is basically the same. I believe it is OK performance-wise overall, considering we already gained ~2x recently thanks to #103. Let me know what you think.
7% is already a lot in my opinion. As in #103, however, the performance requires deeper investigation; we can merge now with the current benchmark results and investigate further later, as communicated in the Zulip DM.
Ok then, I think we can merge this. We should investigate the performance if we use
Before merging I will clean up unused attributes.
Highlighted features:
Note: I had to permute the einsum indices in the mixing layer (see #103) to make it compliant with the layouts above. Performance is similar to after merging #103 into main, and it is probably the best achieved so far.
Time/Memory: ~0.104-0.110s / 5GiB
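A rough sketch of how figures like these could be reproduced; `model` and `batch` in the usage comment are placeholders rather than the repository's actual entry points.

```python
import time
import torch

def measure(model, batch):
    """Forward-pass wall time (s) and peak GPU memory (GiB) for one batch."""
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.synchronize()
    start = time.perf_counter()
    model(batch)
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    peak_gib = torch.cuda.max_memory_allocated() / 2**30
    return elapsed, peak_gib

# e.g. print("%.3fs / %.1fGiB" % measure(pc, test_batch))  # pc / test_batch are hypothetical
```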
Fixes (but does not close) #101.
Closes #96.