Add future work documentation #87
@@ -1,3 +1,5 @@
---
title: Use Models with Less Points
I hate to be that guy, but "Fewer" is also an option here.
Haha fair point, thanks
Looks great, I'm excited to get to work on these!
Very nice! Thanks a lot for outlining all those :)
I left a bunch of detailed comments, some are definitely open to discussion. I just thought this would be a good place to have those discussions and nail down what exactly we mean by these items.
Overall, reading all these made me excited to jump into research again!
@@ -1,4 +1,9 @@
---
title: Add Associative Connections
For this one I was actually thinking of associative connections like between the vision model of a car and the sound a car makes, and the word "car" etc. I was thinking these would be analogous to lateral voting connections. What you describe here would go under "Add Top-Down Connections".
Ah ok that makes sense, I was confused by the potential duplication (I think because I focused on the term "hierarchy" in the cmp-hierarchy grouping).
With that cleared up, I wonder if it's a bit of a duplication of "Generalize Voting to Associative Connections" --> my temptation would be to keep that one, and add the point that this should enable associating e.g. sound objects with physical objects (i.e. where their models may not both be 3D), and get rid of "Add Associative Connections" under cmp-hierarchy. What do you think @vkakerbeck ?
wow yes! That only now clicked for me that those two are basically the same. It's kind of cool that we can solve both of these with the same solution. I think I had added this one under hierarchy because the first time I thought about these was in the context of modeling language and grounding it in physical models of objects. But I think we should just remove this one and expand on the one under voting like you suggest. Maybe add the "abstract" or "num_steps" label to it.
Ok nice, yeah sounds good!
@@ -1,3 +1,5 @@
---
title: Add Top-Down Connections
---

One of the main roles of top-down connections is the associative recall and prediction outlined in [Associative Connections](add-associative-connections.md). However, top-down projections can also support decomposing goal-states into specific sub-goals, as discussed in [Decomposing Goal States](../motor-system-improvements/decompose-goals-into-subgoals-communicate.md).
Related to the comment above, I would use the description you wrote out for the previous topic here. I wouldn't think of the goal states as the top-down connections. Those belong in the motor section, specifically "Decompose Goals into Subgoals & Communicate"
As we introduce hierarchy and leverage more unsupervised learning, representations will emerge at different levels of the system that may not correspond to any labels present in our datasets. For example, handles, or the head of a spoon, may emerge as object representations in low-level LMs, even though the dataset only recognizes labels like "mug" and "spoon".

One approach to measuring the "correctness" of representations in this setting might be how well a predicted representation aligns with the outside world. For example, while LMs are not designed to be used as generative models, we could visualize how well an inferred object graph maps onto the object actually present in the world. Quantifying such alignment might leverage measures such as differences between point clouds. This would provide some evidence of how well the learned decomposition of objects corresponds to the actual objects present in the world.
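As one illustration of a "difference in point clouds", the sketch below computes a symmetric Chamfer distance between points sampled from an inferred object graph and points sampled from the ground-truth object. Chamfer distance is our choice of measure here, not something the text prescribes, and `chamfer_distance` is a hypothetical helper rather than an existing Monty function. It assumes both clouds are already expressed in the same reference frame.

```python
import numpy as np

def chamfer_distance(pred: np.ndarray, target: np.ndarray) -> float:
    """Symmetric Chamfer distance between point clouds of shape (N, 3) and (M, 3)."""
    # Pairwise Euclidean distances between every predicted and target point, shape (N, M).
    dists = np.linalg.norm(pred[:, None, :] - target[None, :, :], axis=-1)
    # Mean nearest-neighbor distance in each direction, summed.
    return float(dists.min(axis=1).mean() + dists.min(axis=0).mean())

# Hypothetical data: points from the "true" object, and an inferred
# reconstruction that is the same cloud shifted by 0.01 along x.
rng = np.random.default_rng(0)
target = rng.uniform(size=(100, 3))
pred = target + np.array([0.01, 0.0, 0.0])
print(chamfer_distance(pred, target))  # small: at most ~0.02 (the shift, counted in both directions)
```

The pairwise-distance matrix is quadratic in memory, which is fine for a few thousand points; larger clouds would call for a k-d tree nearest-neighbor query instead.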
I'm not sure we actually need to measure this. If we model and recognize compositional objects I would assume that just the outputs of the highest level LMs would be enough to judge how well the system does on those compositional datasets. Maybe we would want to measure additional things like number of graphs learned at lower levels etc (which we already do). We can leave it here as an additional suggestion but I think when we start taking a crack at the compositional dataset this wouldn't be the first thing I would start with.
Another point would be that in our compositional dataset we know what the sub-objects are (forks, knives, spoons,...) and we know the compositional objects (set dinner table,...). Somehow we want the system to learn these. That's what I meant with "Figure out supervision". So for instance, should we show the sub-objects first and give labels for those to all LMs, then show the compositional scenes and give labels to all? What is the desired outcome? Do we want lower-level LMs to learn rough models of the scenes? Do we want higher-level LMs to learn models of the cutlery as well? I would add a lot more around that in this section.
Good point, I've added those items to the start.
@@ -1,3 +1,11 @@
---
title: Send Similarity Encoding Object ID to Next Level & Test
---

We have implemented the ability to encode object IDs using sparse-distributed representations (SDRs), and in particular can use this as a way of capturing similarity and dissimilarity between objects. Using such encodings in learned [Associative Connections](add-associative-connections.md), we should observe a degree of natural generalization when recognizing compositional objects.
I'm not sure we are interpreting the term "associative connections" in Monty the same way. When I wrote that I meant associations between object IDs that co-occur (basically voting), not hierarchical connections. Since those are spatially a lot more constrained I wouldn't think of them the same way. Why would we need learned associative connections to see the effect of similarity encodings?
Yeah I've changed this to Hierarchical Connections, per the earlier discussion.
For example, assume a Monty system learns a dinner table setting with normal cutlery and plates. Separately, the system learns about medieval instances of cutlery and plates, but never sees them arranged in a dinner table setting. Based on the similarity of the medieval cutlery objects to their modern counterparts, the objects should have considerable overlap in their SDR encodings.

If the system were to then see a medieval dinner table setting for the first time, it should be able to recognize the arrangement as a dinner table setting with reasonable confidence, even if the constituent objects are somewhat different from those present when the compositional object was first learned.
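A minimal sketch of why SDR overlap should carry over to the compositional level, assuming binary SDRs of a fixed width with a fixed number of active bits. The widths, the bit assignments, and the helpers `make_sdr`, `overlap_score`, and `setting_match` are all hypothetical illustrations, not Monty's actual encoding or matching code; the setting-level score here is simply the mean sub-object overlap, assuming sub-objects are compared in corresponding positions.

```python
import numpy as np

N_BITS = 2048   # SDR width (assumed for illustration)
N_ACTIVE = 40   # active bits per SDR (assumed for illustration)

def make_sdr(active_indices) -> np.ndarray:
    """Build a binary SDR with the given bits switched on."""
    sdr = np.zeros(N_BITS, dtype=bool)
    sdr[list(active_indices)] = True
    return sdr

def overlap_score(a: np.ndarray, b: np.ndarray) -> float:
    """Fraction of active bits two SDRs share (1.0 = identical, 0.0 = disjoint)."""
    return np.count_nonzero(a & b) / N_ACTIVE

def setting_match(stored: list, observed: list) -> float:
    """Mean sub-object overlap between a stored and an observed arrangement (same order assumed)."""
    return float(np.mean([overlap_score(s, o) for s, o in zip(stored, observed)]))

# Hypothetical encodings: medieval cutlery shares most bits with modern cutlery.
modern_fork, modern_plate = make_sdr(range(0, 40)), make_sdr(range(100, 140))
medieval_fork, medieval_plate = make_sdr(range(10, 50)), make_sdr(range(110, 150))

print(overlap_score(modern_fork, medieval_fork))  # 0.75 (30 of 40 bits shared)
print(setting_match([modern_fork, modern_plate], [medieval_fork, medieval_plate]))  # 0.75
```

A never-seen arrangement of similar sub-objects thus scores high but below 1.0, which is the "reasonable confidence" behavior described above.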
Could be nice to include images of these two scenes here for better visualization
Good point! Adding
@@ -1,3 +1,9 @@
---
title: Detect Local and Global Flow
---

Our general view is that there are two sources of flow processed by cortical columns. A sensor with a larger receptive field helps to estimate global flow, which will be particularly pronounced if the whole object is moving or if the sensor itself is moving. A sensor patch with a small receptive field corresponds to the channel through which the primary sensory features (e.g. point-normal, color) arrive. If flow is detected here, but not in the more global channel, then it is likely that just part of the object is moving.
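The two-channel logic can be sketched as follows, assuming each channel already provides 2D flow vectors (e.g. from an optical-flow step we do not show). Treating local flow as the patch's median flow minus the wide channel's median flow, the fixed `threshold`, and the name `classify_flow` are our own assumptions for illustration. Without extra information, such as the motor command that moved the sensor, global flow alone cannot separate sensor movement from whole-object movement, so the sketch returns a combined label for that case.

```python
import numpy as np

def classify_flow(patch_flow: np.ndarray, wide_flow: np.ndarray, threshold: float = 0.5) -> str:
    """Coarse guess at what is moving, from 2D flow vectors of shape (K, 2).

    patch_flow: flow vectors inside the small sensor patch.
    wide_flow:  flow vectors across the larger receptive field.
    """
    # Global component: the motion shared across the wide receptive field.
    global_motion = np.median(wide_flow, axis=0)
    # Local component: what the patch does beyond the shared global motion.
    local_motion = np.median(patch_flow, axis=0) - global_motion
    if np.linalg.norm(local_motion) > threshold:
        return "part of object moving"
    if np.linalg.norm(global_motion) > threshold:
        return "sensor or whole object moving"
    return "static"

still = np.zeros((50, 2))
shifted = np.tile([1.0, 0.0], (50, 1))
print(classify_flow(patch_flow=shifted[:5], wide_flow=still))    # part of object moving
print(classify_flow(patch_flow=shifted[:5], wide_flow=shifted))  # sensor or whole object moving
```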
I would make the distinction clearer:
- local flow: object is moving
- global flow: sensor is moving

We should also mention that these may not be detectable with the same sensor (a small patch can't distinguish between object and sensor movement, since for the patch both would appear as global flow).
Ok yeah that's what I was trying to get at (the uncertainty depending on the size), will try rewording it.
In the short term, we would like to extract richer features, for example using HTM's spatial pooler or Local Binary Patterns for visual features, or processing depth information within a patch to approximate tactile texture.
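To make the Local Binary Patterns option concrete, here is a minimal sketch of a rotation-invariant LBP histogram over a grayscale patch: each pixel gets an 8-bit code recording which neighbors are at least as bright as it, and each code is mapped to the minimum over its circular bit rotations so that rotating the patch does not change the feature. This is the textbook technique, not Monty code; the helper names are hypothetical, and because we use the plain 3x3 neighborhood without interpolation, the invariance is exact only for rotations by multiples of 90 degrees.

```python
import numpy as np

# The 8 neighbors of a pixel, ordered clockwise starting at the top-left.
NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                    (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_code(patch: np.ndarray, r: int, c: int) -> int:
    """8-bit LBP code for pixel (r, c): bit i is set if neighbor i >= center."""
    center = patch[r, c]
    code = 0
    for i, (dr, dc) in enumerate(NEIGHBOR_OFFSETS):
        if patch[r + dr, c + dc] >= center:
            code |= 1 << i
    return code

def rotate_bits(code: int, k: int) -> int:
    """Circularly rotate an 8-bit code right by k positions."""
    return ((code >> k) | (code << (8 - k))) & 0xFF

def rotation_invariant(code: int) -> int:
    """Map a code to the minimum over all of its circular bit rotations."""
    return min(rotate_bits(code, k) for k in range(8))

def lbp_histogram(patch: np.ndarray) -> np.ndarray:
    """256-bin histogram of rotation-invariant LBP codes over the patch interior."""
    hist = np.zeros(256, dtype=int)
    for r in range(1, patch.shape[0] - 1):
        for c in range(1, patch.shape[1] - 1):
            hist[rotation_invariant(lbp_code(patch, r, c))] += 1
    return hist

img = np.random.default_rng(1).integers(0, 5, size=(6, 7))
print(np.array_equal(lbp_histogram(img), lbp_histogram(np.rot90(img))))  # True
```

A 90 degree rotation of the image shifts every neighbor two places around the circle, which only rotates each code's bits, so the minimum-rotation code is unchanged.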
In the longer term, given the "sub-cortical" nature of this sensory processing, we might also consider neural-network-based feature extraction, such as shallow convolutional neural networks; however, please see [our FAQ on why Monty does not currently use deep learning](../../how-monty-works/faq-monty.md#why-does-monty-not-make-use-of-deep-learning).
It would also be worth mentioning that extracted features should be rotation invariant. So if we look at the same location on an object from different angles, the extracted feature should be the same. This is not a given with neural networks or many other approaches.
However, a more general formulation might be to use displacements as the core spatial information in the CMP, such that a specific location (in body-centric coordinates or otherwise) is never communicated outside of an LM or sensor module.
Such an approach might align well with adding information about flow (see [Detect Local and Global Flow](../sensor-module-improvements/detect-local-and-global-flow.md)), modeling moving objects (see [Deal With Moving Objects](../learning-module-improvements/deal-with-moving-objects.md)), and supporting abstract movements like the transition from grandchild to grandparent.
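A small sketch of what "displacement instead of location" means for a single module: the displacement between two successive sensor locations does not depend on where the sensor is, but for it to mean anything to another module it must be expressed in some shared frame, which here requires the sensor's orientation. We assume that orientation is known as a rotation matrix; `rotation_z` and `displacement_in_shared_frame` are hypothetical helpers, not part of the CMP.

```python
import numpy as np

def rotation_z(theta: float) -> np.ndarray:
    """3x3 rotation matrix about the z-axis by angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def displacement_in_shared_frame(loc_prev, loc_curr, sensor_to_shared: np.ndarray) -> np.ndarray:
    """Displacement between two successive sensor-frame locations, rotated into a shared frame."""
    step = np.asarray(loc_curr, dtype=float) - np.asarray(loc_prev, dtype=float)
    return sensor_to_shared @ step

R = rotation_z(np.pi / 2)  # hypothetical sensor orientation: rotated 90 degrees about z
# The same movement made at two different absolute locations...
d1 = displacement_in_shared_frame([0, 0, 0], [1, 0, 0], R)
d2 = displacement_in_shared_frame([5, 5, 5], [6, 5, 5], R)
print(np.allclose(d1, d2))  # True: no absolute location is needed
print(np.round(d1, 6))      # [0. 1. 0.]: the direction depends on the sensor orientation
```

This only covers a single module interpreting its own movements; it says nothing about how several modules would relate their displacements to one another.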
This approach will be very tricky since if we don't get the location of the sensor relative to the body it is almost impossible to output anything in a common reference frame and therefore vote or send other outputs. We could do basic voting on object ID and rely on colocation of receptive fields in the hierarchy, but motor commands will also be pretty much impossible this way. We could mention that this could allude to the difference between the where and what pathways, but also that this is not something we plan to implement, merely a possibility we want to investigate further.
Currently, voting relies on all learning modules sharing the same object ID for any given object, as a form of supervised learning signal. Thanks to this, they can vote on this particular ID when communicating with one another.

However, in the setting of unsupervised learning, the object ID that is associated with any given model is unique to the parent LM. As such, we need to organically learn the mapping between the object IDs that occur together across different LMs, such that voting can function without any supervised learning signal.
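One simple way to "organically learn the mapping" is Hebbian-style co-occurrence counting: whenever this LM and another LM settle on IDs at the same time, strengthen the link between the two IDs, and later translate an incoming vote into a distribution over this LM's own IDs. The class `AssociativeVotingMap` and its methods are hypothetical, and plain counting is only one possible learning rule; a real implementation would also need forgetting and some notion of confidence.

```python
from collections import defaultdict

class AssociativeVotingMap:
    """Hypothetical sketch: learn which IDs in other LMs co-occur with this LM's own IDs."""

    def __init__(self):
        # counts[(other_lm, other_id)][own_id] = number of co-occurrences seen so far.
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, own_id, other_lm, other_id):
        """Strengthen the association when both LMs settle on an ID at the same time."""
        self.counts[(other_lm, other_id)][own_id] += 1

    def translate_vote(self, other_lm, other_id):
        """Turn an incoming vote into a normalized distribution over this LM's own IDs."""
        assoc = self.counts.get((other_lm, other_id))
        if not assoc:
            return {}  # never seen together: the vote carries no information yet
        total = sum(assoc.values())
        return {own_id: n / total for own_id, n in assoc.items()}

vote_map = AssociativeVotingMap()
# Hypothetical episodes: "lm_1" uses its own ID "new_object_3" for an object this LM calls "obj_7".
for _ in range(3):
    vote_map.observe("obj_7", "lm_1", "new_object_3")
vote_map.observe("obj_2", "lm_1", "new_object_3")
print(vote_map.translate_vote("lm_1", "new_object_3"))  # {'obj_7': 0.75, 'obj_2': 0.25}
```

Neither LM ever needs to know what the other's IDs "mean"; only their co-occurrence statistics are used.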
You could also mention that the brain has to do the same thing. It could not use a globally consistent SDR representation for each object. The neurons just do associative learning and a cortical column has no idea what the incoming spikes mean.
Wish I could give a 👍 like we are requesting features, haha. 🤣 This one would be very useful!
Thanks so much Niels! I'm really amazed how much you have added explanations to all future directions, it's mind boggling. 🤯
LGTM on PR. Will the next step be trying to prioritize these somehow?
We have all of them listed out here https://docs.google.com/spreadsheets/d/10b0FR9YdFYqfhIiGMpZsjmN2OAbNAjp4m_hLBCV161I/edit?gid=0#gid=0 and some are already grouped into our next milestones (below the table). We will likely create 1-2 new, intermediate milestones to prepare for the Heterarchy pt. 2 milestone (and eventually hierarchical goal policies).
…bservations.md Co-authored-by: vclay <[email protected]>
…bservations.md Co-authored-by: vclay <[email protected]>
Co-authored-by: vclay <[email protected]>
…-saccades-driven-by-model-free-and-model-based-signals.md Co-authored-by: vclay <[email protected]>
…' into Update-future-work-documentation
Thanks for the helpful comments @vkakerbeck and @hlee9212! Those should all be addressed now but let me know if there are any further changes you want. I'll also now double check the overview spreadsheet and make sure all the cells are updated to match here.
@vkakerbeck I've also updated some of the hashtags where I felt some were missing.
Lastly I've gone through and made sure the names for future-work sections are consistent across the individual articles, the header .md files, and the overview sheet.
Nice, thanks for the updates! Just two minor comments but overall looks great!
In Monty systems, low-level LMs project to high-level LMs, where this projection occurs if their sensory receptive fields are co-aligned. Hierarchical connections should be able to learn a mapping between objects represented at these low-level LMs, and objects represented in the high-level LMs that frequently co-occur. Such learning would be similar to that required for [Generalizing Voting To Associative Connections](../voting-improvements/generalize-voting-to-associative-connections.md).
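Following the mechanism spelled out in the review comment on this paragraph, a rough sketch: bottom-up, the low-level LM's object ID is stored as a feature at a location in the high-level LM's model of the compositional object; top-down, the high-level LM can then predict which sub-object the low-level LM should encounter at a given location. `CompositionalModel`, the nearest-neighbor lookup, and the `max_dist` tolerance are hypothetical simplifications; actual Monty LMs store richer graphs with features and poses.

```python
import numpy as np

class CompositionalModel:
    """Hypothetical high-level LM model: sub-object IDs stored as features at locations."""

    def __init__(self, name: str):
        self.name = name
        self.locations = []       # locations in the compositional object's reference frame
        self.sub_object_ids = []  # object IDs received bottom-up from a low-level LM

    def learn(self, location, sub_object_id):
        """Bottom-up: the low-level LM's object ID becomes a feature at this location."""
        self.locations.append(np.asarray(location, dtype=float))
        self.sub_object_ids.append(sub_object_id)

    def predict_sub_object(self, location, max_dist: float = 0.05):
        """Top-down: predict which sub-object the low-level LM should see at a location."""
        if not self.locations:
            return None
        query = np.asarray(location, dtype=float)
        dists = [np.linalg.norm(loc - query) for loc in self.locations]
        i = int(np.argmin(dists))
        return self.sub_object_ids[i] if dists[i] <= max_dist else None

table = CompositionalModel("dinner_table_setting")
table.learn([0.0, 0.0, 0.0], "plate")
table.learn([0.2, 0.0, 0.0], "fork")
print(table.predict_sub_object([0.21, 0.0, 0.0]))  # fork
print(table.predict_sub_object([1.0, 1.0, 1.0]))   # None
```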
I wouldn't compare it to the associative voting connections. The voting connections are between LMs that are modeling the same object. The hierarchical connections are between LMs that model a compositional object and its sub-components. So in the bottom-up connection the lower LM's object ID becomes a feature in the higher-level LM. In the top-down connection, we associate a feature in the higher-level LM with an object in a lower-level LM (+ a location). I understand where you are coming from in the sense that this is also just a learned association, but maybe it's still a bit confusing to read for someone who doesn't have all this background. Maybe we could just remove the reference to the voting plan since it won't be the exact same mechanism.
Yeah fair point, thanks
Such an approach might align well with adding information about flow (see [Detect Local and Global Flow](../sensor-module-improvements/detect-local-and-global-flow.md)), modeling moving objects (see [Deal With Moving Objects](../learning-module-improvements/deal-with-moving-objects.md)), and supporting abstract movements like the transition from grandchild to grandparent. It would also result in a reformulation of "goal-states" to "goal-displacements".

Note that whatever approach is taken, we would need to have information about the location or displacement of the sensor to ensure that communicated displacements are still in a shared coordinate system. This may relate to the division of "what" and "where" pathways in the brain, although this is not yet clear and requires further investigation.
I don't see how displacements could be in a shared coordinate system. If we only communicate displacements, there is no way of knowing where the movements are relative to each other. I still think some information about location in a common reference frame needs to be present, maybe, like you mention, related to the difference between the where and what pathways.
Doesn't this work as long as the rotation and scale is in some shared coordinate system (e.g. body-centric)? In the sense that we don't really care about a location, we care about a displacement, but we need to know what direction it's pointing (rotation), and by how much (scaling, if relevant).
yeah for a single LM recognizing objects that works but I don't see how that works when multiple LMs start communicating or when we want to send goal states. How would you do voting or goal states without a shared reference frame of where sensors are relative to some common reference point?
Hmm yeah that's a good point. Now that I think about it more, it does feel like location is required to some degree; I've updated the wording.
Adds descriptions for many of the outstanding "future-work" areas of our documentation. This includes a few new sections:
I've also added these to the overview document.
"Bottom-up distant agent policies" was removed because it was a duplicate, while a few others appear "removed" because of changes to their names/fixes to typos.
A few of the ones that I haven't done as I was a bit unsure what we had previously discussed, are:
@scottcanoe and @hlee9212 tagging you for any thoughts you want to add and as this also gives an overview of some of the things we can work on soon.