Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Design to allow multiple versions of primitive VSA operators #73

Closed
rgayler opened this issue Jun 4, 2022 · 9 comments · Fixed by #91
Closed

Design to allow multiple versions of primitive VSA operators #73

rgayler opened this issue Jun 4, 2022 · 9 comments · Fixed by #91
Labels
enhancement New feature or request

Comments

@rgayler
Copy link

rgayler commented Jun 4, 2022

Allow for multiple, different versions of the primitive VSA operators (bind, bundle, permute, and maybe others).

This is needed, for example to introduce HRR binding, and randsel bundling (#75).

This is related to #72 and #25, which appear to imply dispatching on hypervector type. However, people will want to introduce and compare different operators applied to the same hypervectors, so this issue is orthogonal to hypervector type.

@mikeheddes mikeheddes added the enhancement New feature or request label Jun 7, 2022
@mikeheddes mikeheddes changed the title [enhancement] Design to allow multiple versions of primitive VSA operators Design to allow multiple versions of primitive VSA operators Jun 7, 2022
@mikeheddes
Copy link
Member

mikeheddes commented Jun 20, 2022

How about an API such as this (not an exhaustive list but just an illustration):

import torchhd

# binding operations
torchhd.multiply(a, b)  # works for all data types (implements XOR for boolean tensors)
torchhd.convolution(a, b)
...

# bundling operations
torchhd.sum(a, b) 
torchhd.mean(a, b) # normalizes the result of sum
torchhd.randsel(a, b)
...

With this design our current torchhd.bind and torchhd.bundle will simply be replaced with torchhd.multiply and torchhd.sum.

We should add sections in the documentation to group the different binding and bundling implementations so it's clear for people what they are meant to do.

@rgayler
Copy link
Author

rgayler commented Jun 21, 2022

Sorry - I am so totally not a pythonista that I can't tell what that does.

If it's any help, my motivation for the request is:

  1. The space of VSA primitive operations is broadly indexed by hypervector type and operator algorithms (which are roughly orthogonal).
  2. Some VSA researchers will want to compare the same abstract VSA circuit across differing definitions of VSA primitives.
  3. Researchers doing that would generally want consistent VSA primitive definitions throught the abstract VSA circuit (i.e. not using multiple different implementation in the one VSA circuit).
  4. Ideally, they would define the abstract VSA circuit once and then be able to specifiy the different VSA primitive definitions without rewriting the circuit code (thus avoiding a possible source of error).

This is also the motivation for #79. If the encoding functions are defined in terms of abstract bind and bundle then the impact of different instantiations of bind and bundle can be investigated.

@mikeheddes
Copy link
Member

In my comment above I tried to present a sketch of the functions that Torchhd could expose/provide to allow users to switch between multiple methods for the primitive operations. So besides the current default (sum for bundling and product for binding) we would also expose circular convolution for binding and randsel for bundling, among many other possible methods.

I think to make this design very ergonomic for the user we should still expose generic bind, bundle and permute functions but allow users to override them. For this API design I was thinking we can take inspiration from multiprocessing.set_start_method.

My idea is a design as follows:

import torchhd

A, B = torchhd.random_hv(2, 10000)

torchhd.bundle(A, B)  # addition by default
torchhd.bind(A, B)  # product by default

# here comes the new part
torchhd.set_bundle_method(torchhd.randsel)
# torchhd.set_bind_method(...)
# torchhd.set_permute_method(...)
# from now on any call to torchhd.bundle will execute randsel
torchhd.bundle(A, B)
# you could equivalently do
torchhd.randsel(A, B)

A benefit of this design is that our data structures such as hash tables and graphs internally use the bundle and bind functions which means that their behavior will automatically update when torchhd.set_bundle_method is called. Another benefit is that users could specify their own bundling function which takes two hypervector tensors and produces a third and can set torchhd to use this in all the code, like so:

import torchhd

def my_custom_bundle(a, b):
    return a + b

torchhd.set_bundle_method(my_custom_bundle)

A, B = torchhd.random_hv(2, 10000)
torchhd.bundle(A, B)  # uses my_custom_bundle

I think this could be very useful for research purposes.

By default the code as users write it with the current version stays the same but more flexibility becomes available because the primitives can be changed.

@rgayler could you perhaps help me compile or point to an overview of all the various versions of bind, bundle and permute that we should implement? I am looking for general descriptions of the methods such that we can implement them for the broadest range of hypervector types. For example, multiply binding is equivalent to XOR for binary hypervectors so I would like to keep these as one function whose execution depends on the hypervector type, instead of providing both torchhd.multiply and torchhd.xor.

@mikeheddes
Copy link
Member

One potential problem I thought of it that currently we also provide multibundle and multibind which are more efficient implementations when bundling or binding many hypervectors. Although they are mathematically equivalent to a for loop over bundle/bind the speed up of performing this in a single call can be pretty significant.

One way to combat this is to allow the user to override both the single and multi version of each primitive. When a user only provides the implementation for a single we can fallback to a for loop in the multi-methods implementation. And for the build-in methods the user could specify them as a string so that we can set the single and multi versions correctly behind the scene. Here are some examples that will hopefully make it more clear:

import torchhd, torch

# assuming we have randsel implemented in the library 
torchhd.set_bundle_method("randsel")

hypervectors = torchhd.random_hv(2, 10000)
A, B = hypervectors

torchhd.bundle(A, B)  # randsel 
torchhd.functional.multibundle(hypervectors)  # efficient multi-randsel

def my_custom_bundle(a, b):
    return torch.add(a, b)

# user only specifies custom single bundle method
torchhd.set_bundle_method(my_custom_bundle)
torchhd.bundle(A, B)  # uses  torch.add
torchhd.functional.multibundle(hypervectors)  # will use a for-loop with torch.add

def my_custom_multibundle(a):
    return torch.sum(a, dim=-2)

# user specifies both a custom single and multi bundle method
torchhd.set_bundle_method(my_custom_bundle, my_custom_multibundle)
torchhd.bundle(A, B)  # uses  torch.add
torchhd.functional.multibundle(hypervectors)  # uses torch.sum

@mikeheddes
Copy link
Member

I'm working in the feat-primitive-override branch and coded the basic functionality to be able to override the bind, multibind, bundle, multibundle, and permute methods. I still have to make it more robust, figure out how inverse bind fits into this design and update documentation and tests.

@rgayler
Copy link
Author

rgayler commented Aug 28, 2022

could you perhaps help me compile or point to an overview of all the various versions of bind, bundle and permute that we should implement? I am looking for general descriptions of the methods such that we can implement them for the broadest range of hypervector types. For example, multiply binding is equivalent to XOR for binary hypervectors so I would like to keep these as one function whose execution depends on the hypervector type

@mikeheddes there are a few survey papers around (I'll get pointers to the ones I know) that list multiple instantiations of the generic operators. From memory, these don't make a point of highlighting which instantiations are special cases of others. FWIW binding with XOR and bipolar multiplicative binding can both be seen as equivalent to complex multiplicative binding where pahse angle has been quantised to two levels.

I understand why it's attractive to minimise the number of function names and dispatch on argument type, but if it turns out that's too hard I don't think it would be a tragedy to have a bunch of related instantioations of primitive operators and just note the relationships in the documentation.

@rgayler
Copy link
Author

rgayler commented Aug 28, 2022

These are the obvious recent surveys:
https://doi.org/10.1145/3538531
https://doi.org/10.1007/s10462-021-10110-3
But you already know about them because you've cited them in the TorchHD paper.

@mikeheddes
Copy link
Member

@rgayler thank you for the papers, I wanted to make sure there are no important ones that we were not aware of yet. I will try to make a table where I group special cases of methods. Then we can refine that before we implement them.

@mikeheddes
Copy link
Member

I can foresee that once we have a broader range of VSA/HDC models implemented we could add a method like set_model_primitives("MAP") which would internally call set_bundle_method, set_bind_method etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants