diff --git a/docs/figures/future-work/dinner_medieval.png b/docs/figures/future-work/dinner_medieval.png new file mode 100644 index 00000000..af49854b Binary files /dev/null and b/docs/figures/future-work/dinner_medieval.png differ diff --git a/docs/figures/future-work/dinner_standard.png b/docs/figures/future-work/dinner_standard.png new file mode 100644 index 00000000..f060f991 Binary files /dev/null and b/docs/figures/future-work/dinner_standard.png differ diff --git a/docs/figures/future-work/dinner_variations_medieval.png b/docs/figures/future-work/dinner_variations_medieval.png new file mode 100644 index 00000000..4e071c03 Binary files /dev/null and b/docs/figures/future-work/dinner_variations_medieval.png differ diff --git a/docs/figures/future-work/dinner_variations_standard.png b/docs/figures/future-work/dinner_variations_standard.png new file mode 100644 index 00000000..f5d59fe3 Binary files /dev/null and b/docs/figures/future-work/dinner_variations_standard.png differ diff --git a/docs/future-work/cmp-hierarchy-improvements.md b/docs/future-work/cmp-hierarchy-improvements.md index 7006acbc..ce4e4d65 100644 --- a/docs/future-work/cmp-hierarchy-improvements.md +++ b/docs/future-work/cmp-hierarchy-improvements.md @@ -4,9 +4,8 @@ description: Improvements we would like to add to the CMP or hierarchical inform --- These are the things we would like to implement: -- [Figure out performance measure and supervision in heterarchy](cmp-hierarchy-improvements/figure-out-performance-measure-and-supervision-in-heterarchy.md) #infrastructure -- [Add top-down connections](cmp-hierarchy-improvements/add-top-down-connections.md) #numsteps -- [Add associative connections](cmp-hierarchy-improvements/add-associative-connections.md) #abstract #numsteps +- [Figure out performance measures and supervision in heterarchy](cmp-hierarchy-improvements/figure-out-performance-measure-and-supervision-in-heterarchy.md) #infrastructure #compositional +- [Add top-down 
connections](cmp-hierarchy-improvements/add-top-down-connections.md) #numsteps #multiobj #compositional - [Run & Analyze experiments with >2LMs in heterarchy testbed](cmp-hierarchy-improvements/run-analyze-experiments-with-2lms-in-heterarchy-testbed.md) #compositional - [Run & Analyze experiments in multiobject environment looking at scene graphs](cmp-hierarchy-improvements/run-analyze-experiments-in-multiobject-environment-looking-at-scene-graphs.md) #multiobj - [Test learning at different speeds depending on level in hierarchy](cmp-hierarchy-improvements/test-learning-at-different-speeds-depending-on-level-in-hierarchy.md) #learning #generalization diff --git a/docs/future-work/cmp-hierarchy-improvements/add-associative-connections.md b/docs/future-work/cmp-hierarchy-improvements/add-associative-connections.md deleted file mode 100644 index 743b9b03..00000000 --- a/docs/future-work/cmp-hierarchy-improvements/add-associative-connections.md +++ /dev/null @@ -1,4 +0,0 @@ ---- -title: Add Associative Connections ---- -Are those the same as voting connections? I think so. \ No newline at end of file diff --git a/docs/future-work/cmp-hierarchy-improvements/add-top-down-connections.md b/docs/future-work/cmp-hierarchy-improvements/add-top-down-connections.md index af2faca8..da699795 100644 --- a/docs/future-work/cmp-hierarchy-improvements/add-top-down-connections.md +++ b/docs/future-work/cmp-hierarchy-improvements/add-top-down-connections.md @@ -1,3 +1,9 @@ --- title: Add Top-Down Connections --- + +In Monty systems, low-level LMs project to high-level LMs, where this projection occurs if their sensory receptive fields are co-aligned. Hierarchical connections should be able to learn a mapping between objects represented at these low-level LMs, and objects represented in the high-level LMs that frequently co-occur. + +For example, a high-level LM of a dinner-set might have learned that the fork is present at a particular location in its internal reference frame. 
When at that location, it would therefore predict that the low-level LM should be sensing a fork, enabling the perception of a fork in the low-level LM even when there is a degree of noise or other source of uncertainty in the low-level LM's representation. + +In the brain, these top-down projections correspond to L6 to L1 connections, where the synapses at L1 would support predictions about object ID. However, these projections also form local synapses en-route through the L6 layer of the lower-level cortical column. In a Monty LM, this would correspond to the top-down connection predicting not just the object that the low-level LM should be sensing, but also the specific location that it should be sensing it at. This could be complemented with predicting a particular pose of the low-level object (see [Use Better Priors for Hypothesis Initialization](../learning-module-improvements/use-better-hypothesis-priors.md)). \ No newline at end of file diff --git a/docs/future-work/cmp-hierarchy-improvements/figure-out-performance-measure-and-supervision-in-heterarchy.md b/docs/future-work/cmp-hierarchy-improvements/figure-out-performance-measure-and-supervision-in-heterarchy.md index d89625c8..559c61b7 100644 --- a/docs/future-work/cmp-hierarchy-improvements/figure-out-performance-measure-and-supervision-in-heterarchy.md +++ b/docs/future-work/cmp-hierarchy-improvements/figure-out-performance-measure-and-supervision-in-heterarchy.md @@ -1,3 +1,10 @@ --- -title: Figure out Performance Measure and Supervision in Heterarchy +title: Figure out Performance Measures and Supervision in Heterarchy --- +As we introduce hierarchy and compositional objects, such as a dinner-table setting, we need to figure out both how to measure the performance of the system, and how to supervise the learning. For the latter, we might choose to train the system on component objects in isolation (a fork, a knife, etc.) 
before then showing Monty the full compositional object (the dinner-table setting). When evaluating performance, we might then see how well the system retrieves representations at different levels of the hierarchy. However, in the more core setting of unsupervised learning, representations of the sub-objects would likely also emerge at the high level (a coarse knife representation, etc.), while we may also find some representations of the dinner scene in low-level LMs. Deciding then how we measure performance will be more difficult. + +When we move to objects with less obvious composition (i.e. where the sub-objects must be disentangled in a fully unsupervised manner), representations will emerge at different levels of the system that may not correspond to any labels present in our datasets. For example, handles, or the head of a spoon, may emerge as object-representations in low-level LMs, even though the dataset only recognizes labels like "mug" and "spoon". + +This is less clear, but one approach to measure the "correctness" of representations in this setting might be how well a predicted representation aligns with the outside world. For example, while LMs are not designed to be used as generative models, we could visualize how well an inferred object graph maps onto the object actually present in the world. Quantifying such alignment might leverage measures such as differences in point-clouds. This would provide some evidence of how well the learned decomposition of objects corresponds to the actual objects present in the world. + +See also [Make Dataset to Test Compositional Objects](../environment-improvements/make-dataset-to-test-compositional-objects.md) and [Metrics to Evaluate Categories and Generalization](../environment-improvements/create-dataset-and-metrics-to-evaluate-categories-and-generalization.md). 
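As a rough illustration of the point-cloud comparison mentioned above, the following is a minimal sketch of a symmetric Chamfer distance between an inferred object model and the points actually sensed in the world. The function name and the toy point clouds are hypothetical and are not part of Monty; a real evaluation would also need to align the two clouds in a common reference frame first.

```python
import math


def chamfer_distance(cloud_a, cloud_b):
    """Symmetric Chamfer distance between two non-empty lists of 3D points.

    Hypothetical helper for illustration only; not part of Monty.
    """
    if not cloud_a or not cloud_b:
        raise ValueError("both point clouds must be non-empty")

    def mean_nearest(src, dst):
        # Average distance from each source point to its nearest target point.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    # Sum both directions so that missing and spurious points are both penalized.
    return mean_nearest(cloud_a, cloud_b) + mean_nearest(cloud_b, cloud_a)


# Toy example: an inferred model vs. the points sensed in the world.
inferred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
sensed = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.1)]
distance = chamfer_distance(inferred, sensed)
```

A lower value would suggest that the inferred representation lines up more closely with the object in the world. This brute-force version is quadratic in the number of points, so larger clouds would likely call for a vectorized or tree-based nearest-neighbor search.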
\ No newline at end of file diff --git a/docs/future-work/cmp-hierarchy-improvements/send-similarity-encoding-object-id-to-next-level-test.md b/docs/future-work/cmp-hierarchy-improvements/send-similarity-encoding-object-id-to-next-level-test.md index 6143af06..22b39a08 100644 --- a/docs/future-work/cmp-hierarchy-improvements/send-similarity-encoding-object-id-to-next-level-test.md +++ b/docs/future-work/cmp-hierarchy-improvements/send-similarity-encoding-object-id-to-next-level-test.md @@ -1,3 +1,17 @@ --- title: Send Similarity Encoding Object ID to Next Level & Test --- + +We have implemented the ability to encode object IDs using sparse distributed representations (SDRs), and in particular can use this as a way of capturing similarity and dissimilarity between objects. Using such encodings in learned [Hierarchical Connections](add-top-down-connections.md), we should observe a degree of natural generalization when recognizing compositional objects. + +For example, assume a Monty system learns a dinner table setting with normal cutlery and plates (see examples below). Separately, the system learns about medieval instances of cutlery and plates, but never sees them arranged in a dinner table setting. Based on the similarity of the medieval cutlery objects to their modern counterparts, the objects should have considerable overlap in their SDR encodings. + +If the system were to then see a medieval dinner table setting for the first time, it should be able to recognize the arrangement as a dinner-table setting with reasonable confidence, even if the constituent objects are somewhat different from those present when the compositional object was first learned. + +We should note that we are still determining whether overlapping bits between SDRs is the best way to encode object similarity. 
As such, we are also open to exploring this task with alternative approaches, such as directly making use of values in the evidence-similarity matrix (from which SDRs are currently derived). + +![Standard dinner table setting](../../figures/future-work/dinner_standard.png) +*Example of a standard dinner table setting with modern cutlery and plates that the system could learn from.* + +![Medieval dinner table setting](../../figures/future-work/dinner_medieval.png) +*Example of a medieval dinner table setting with medieval cutlery and plates that the system could be evaluated on, after having observed the individual objects in isolation.* diff --git a/docs/future-work/cmp-hierarchy-improvements/test-learning-at-different-speeds-depending-on-level-in-hierarchy.md b/docs/future-work/cmp-hierarchy-improvements/test-learning-at-different-speeds-depending-on-level-in-hierarchy.md index 17fa7605..6c55bf01 100644 --- a/docs/future-work/cmp-hierarchy-improvements/test-learning-at-different-speeds-depending-on-level-in-hierarchy.md +++ b/docs/future-work/cmp-hierarchy-improvements/test-learning-at-different-speeds-depending-on-level-in-hierarchy.md @@ -1,3 +1,13 @@ --- title: Test Learning at Different Speeds Depending on Level in Hierarchy --- + +Our general view is that episodic memory and working memory in the brain leverage similar representations to those in learning modules, i.e. structured reference frames of discrete objects. + +For example, the brain has a specialized region for episodic memory (the hippocampal complex), due to the large number of synapses required to rapidly form novel binding associations. However, we believe the core algorithms of the hippocampal complex follow the same principles of a cortical column (and therefore a learning module), with learning simply occurring on a faster time scale. 
+ +As such, we would like to explore adding forms of episodic and working memory by introducing high-level learning modules that learn information on extremely fast time scales relative to lower-level LMs. These should be particularly valuable in settings such as recognizing multi-object arrangements in a scene, and providing memory when a Monty system is performing a multi-step task. Note that because of the overlap in the core algorithms, LMs can be used largely as-is for these memory systems, with the only change being the learning rate. + +It is worth noting that the `GridObjectModel` would be particularly well suited for introducing a learning-rate parameter, due to its constraints on the amount of information that can be stored. + +As a final note, varying the learning rate across learning modules will likely play an important role in dealing with representational drift, and the impact it can have on continual learning. For example, we expect that low-level LMs, which partly form the representations in higher-level LMs, will change their representations more slowly. \ No newline at end of file diff --git a/docs/future-work/environment-improvements.md b/docs/future-work/environment-improvements.md index f3c663cb..7a942385 100644 --- a/docs/future-work/environment-improvements.md +++ b/docs/future-work/environment-improvements.md @@ -5,7 +5,7 @@ description: New environments and benchmark experiments we would like to add. 
These are the things we would like to implement: -- [Make dataset to test compositional objects](environment-improvements/make-dataset-to-test-compositional-objects.md) #compositional +- [Make dataset to test compositional objects](environment-improvements/make-dataset-to-test-compositional-objects.md) #compositional #multiobj - [Set up Environment that allows for object manipulation](environment-improvements/set-up-environment-that-allows-for-object-manipulation.md) #goalpolicy - [Set up object manipulation benchmark tasks and evaluation measures](environment-improvements/set-up-object-manipulation-benchmark-tasks-and-evaluation-measures.md) #goalpolicy - [Create dataset and metrics to evaluate categories and generalization](environment-improvements/create-dataset-and-metrics-to-evaluate-categories-and-generalization.md) #generalization diff --git a/docs/future-work/environment-improvements/create-dataset-and-metrics-to-evaluate-categories-and-generalization.md b/docs/future-work/environment-improvements/create-dataset-and-metrics-to-evaluate-categories-and-generalization.md index 728391c4..bee1027c 100644 --- a/docs/future-work/environment-improvements/create-dataset-and-metrics-to-evaluate-categories-and-generalization.md +++ b/docs/future-work/environment-improvements/create-dataset-and-metrics-to-evaluate-categories-and-generalization.md @@ -1,3 +1,11 @@ --- title: Create Dataset and Metrics to Evaluate Categories and Generalization --- + +Datasets do not typically capture the flexibility of object labels based on whether an object belongs to a broad class (e.g. cans), vs. a specific instance of a class (e.g. a can of tomato soup). + +Labeling a dataset with "hierarchical" labels, such that an object might be both a "can", as well as a "can of tomato soup" would be one approach to capturing this flexibility. Once available, classification accuracy could be assessed both at the level of individual object instances, as well as at the level of categories. 
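To make the two levels of assessment concrete, here is a minimal sketch of scoring predictions against hierarchical labels. The label names, the `category_of` mapping, and the function itself are hypothetical and only illustrate the idea; they do not reflect an existing Monty dataset or API.

```python
def hierarchical_accuracy(predictions, targets, category_of):
    """Return (instance_accuracy, category_accuracy) for paired label lists.

    `category_of` maps each instance label to its broad category.
    Hypothetical helper for illustration only.
    """
    if len(predictions) != len(targets) or not targets:
        raise ValueError("predictions and targets must be non-empty and equal in length")

    pairs = list(zip(predictions, targets))
    # Instance level: the exact object must match.
    instance_hits = sum(pred == true for pred, true in pairs)
    # Category level: only the broad class must match.
    category_hits = sum(category_of[pred] == category_of[true] for pred, true in pairs)
    return instance_hits / len(pairs), category_hits / len(pairs)


# Hypothetical labels: two kinds of soup can and a mug.
category_of = {
    "tomato_soup_can": "can",
    "chicken_soup_can": "can",
    "mug": "cup",
}
predictions = ["tomato_soup_can", "chicken_soup_can", "mug", "mug"]
targets = ["tomato_soup_can", "tomato_soup_can", "mug", "tomato_soup_can"]
instance_acc, category_acc = hierarchical_accuracy(predictions, targets, category_of)
```

In this toy case the second prediction is wrong at the instance level but right at the category level, so the category accuracy is higher than the instance accuracy. Reporting both numbers would separate confusions within a class from confusions between classes.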
+ +We might leverage crowd-sourced labels to ensure that this labeling is reflective of human perception, and not biased by our beliefs as designers of Monty. This also relates to the general problem of [Multi-Label Classification](https://paperswithcode.com/task/multi-label-classification), and so there may be off-the-shelf solutions that we can explore. + +Initially such labels should focus on morphology, as this is the current focus of Monty's recognition system. However, we would eventually want to also account for affordances, such as an object that is a chair, a vessel, etc. Being able to classify objects based on their affordances would be an experimental stepping stone to the true measure of the system's representations, which would be how well affordances are used to manipulate the world. \ No newline at end of file diff --git a/docs/future-work/environment-improvements/make-dataset-to-test-compositional-objects.md b/docs/future-work/environment-improvements/make-dataset-to-test-compositional-objects.md index abaf96e5..fbb4612f 100644 --- a/docs/future-work/environment-improvements/make-dataset-to-test-compositional-objects.md +++ b/docs/future-work/environment-improvements/make-dataset-to-test-compositional-objects.md @@ -1,3 +1,15 @@ --- title: Make Dataset to Test Compositional Objects --- + +We have developed an initial dataset based on recognizing a variety of dinner table sets with different arrangements of plates and cutlery. For example, the objects can be arranged in a normal setting, or aligned in a row (i.e. not a typical dinner-table setting). Similarly, the component objects can be those of a modern dining table, or those from a "medieval" time-period. As such, this dataset can be used to test the ability of Monty systems to recognize compositional objects based on the specific arrangement of objects, and to test generalization to novel compositions. 
+ +By using explicit objects to compose multi-part objects, this dataset has the advantage that we can learn on the component objects in isolation, using supervised learning signals if necessary. It's worth noting that this is often how learning of complex compositional objects takes place in humans. For example, when learning to read, children begin by learning individual letters, which are themselves composed of a variety of strokes. Only when letters are learned can they learn to combine them into words. More generally, disentangling an object from other objects is difficult without the ability to interact with it, or see it in a sufficient range of contexts that its separation from other objects becomes clear. + +However, we would eventually expect compositional objects to be learned in an unsupervised manner. When this is consistently possible, we can consider more diverse datasets where the component objects may not be as explicit. At that time, the challenges described in [Figure out Performance Measures and Supervision in Heterarchy](../cmp-hierarchy-improvements/figure-out-performance-measure-and-supervision-in-heterarchy.md) will become more relevant. 
+ +![Dinner table set](../../figures/future-work/dinner_variations_standard.png) +*Example of compositional objects made up of modern cutlery and plates.* + +![Dinner table set](../../figures/future-work/dinner_variations_medieval.png) +*Example of compositional objects made up of medieval cutlery and plates.* diff --git a/docs/future-work/environment-improvements/set-up-environment-that-allows-for-object-manipulation.md b/docs/future-work/environment-improvements/set-up-environment-that-allows-for-object-manipulation.md index ce5979b1..a149978e 100644 --- a/docs/future-work/environment-improvements/set-up-environment-that-allows-for-object-manipulation.md +++ b/docs/future-work/environment-improvements/set-up-environment-that-allows-for-object-manipulation.md @@ -1,3 +1,7 @@ --- title: Set up Environment that Allows for Object Manipulation --- + +See [Decompose Goals Into Subgoals & Communicate](../motor-system-improvements/decompose-goals-into-subgoals-communicate.md) for a discussion of the kind of tasks we are considering for early object-manipulation experiments. An even simpler task that we have recently considered is pressing a switch to turn a lamp on or off. We will provide further details on what these tasks might look like soon. + +Beyond the specifics of any given task, an important part of this future-work component is to identify a good simulator for such settings. For example, we would like to have a setup where objects are subject to gravity, but are prevented from falling into a void by a table or floor. Other elements of physics such as friction should also be simulated, while it should be straightforward to reset an environment, and specify the arrangement of objects (for example using 3D modelling software). 
\ No newline at end of file diff --git a/docs/future-work/framework-improvements.md b/docs/future-work/framework-improvements.md index c1c6b7a3..dc779945 100644 --- a/docs/future-work/framework-improvements.md +++ b/docs/future-work/framework-improvements.md @@ -4,7 +4,7 @@ description: Improvements we would like to make on the general code framework. --- These are the things we would like to implement: -- [Add infrastructure for multiple agents that move independently](framework-improvements/add-infrastructure-for-multiple-agents-that-move-independently.md) #numsteps #infrastructure +- [Add infrastructure for multiple agents that move independently](framework-improvements/add-infrastructure-for-multiple-agents-that-move-independently.md) #numsteps #infrastructure #goalpolicy - [Automate benchmark experiments & analysis](framework-improvements/automate-benchmark-experiments-analysis.md) #infrastructure - [Add more wandb logging for learning from unsupervised](framework-improvements/add-more-wandb-logging-for-learning-unsupervised.md) #learning - [Add GPU support for Monty](framework-improvements/add-gpu-support-for-monty.md) #speed diff --git a/docs/future-work/framework-improvements/add-infrastructure-for-multiple-agents-that-move-independently.md b/docs/future-work/framework-improvements/add-infrastructure-for-multiple-agents-that-move-independently.md index e06fd204..c23db399 100644 --- a/docs/future-work/framework-improvements/add-infrastructure-for-multiple-agents-that-move-independently.md +++ b/docs/future-work/framework-improvements/add-infrastructure-for-multiple-agents-that-move-independently.md @@ -1,3 +1,11 @@ --- title: Add Infrastructure for Multiple Agents that Move Independently --- + +Currently, Monty's infrastructure only supports a single agent that moves around the scene, where that agent can be associated with a plurality of sensors and LMs. We would like to add support for multiple agents that move independently. 
+ +For example, a hand-like surface-agent might explore the surface of an object, where each one of its "fingers" can move in a semi-independent manner. At the same time, a distant-agent might observe the object, saccading across its surface independently of the surface-agent. At other times they might coordinate, such that they perceive the same location on an object at the same time, which would be useful while voting connections are still being learned (see [Generalize Voting to Associative Connections](../voting-improvements/generalize-voting-to-associative-connections.md)). + +An example of a first task that could make use of this infrastructure is [Implement a Simple Cross-Modal Policy for Sensory Guidance](../motor-system-improvements/simple-cross-modal-policy.md). + +It's also worth noting that we would like to move towards the concept of "motor modules" in the code-base, i.e. a plurality of motor modules that convert from CMP-compliant goal states to non-CMP actuator changes. This would be a shift from the singular "motor system" that we currently have. 
\ No newline at end of file diff --git a/docs/future-work/learning-module-improvements.md b/docs/future-work/learning-module-improvements.md index 78177acf..d6fb8d4f 100644 --- a/docs/future-work/learning-module-improvements.md +++ b/docs/future-work/learning-module-improvements.md @@ -9,13 +9,16 @@ These are the things we would like to implement: - [Use off-object observations](learning-module-improvements/use-off-object-observations.md) #numsteps #multiobj - [Reinitialize hypotheses when starting to recognize a new object](learning-module-improvements/reinitialize-hypotheses-when-starting-to-recognize-a-new-object.md) #multiobj - [Improve bounded evidence performance](learning-module-improvements/improve-bounded-evidence-performance.md) #multiobj -- [Use models with less points](learning-module-improvements/use-models-with-less-points.md) #speed #generalization +- [Use models with fewer points](learning-module-improvements/use-models-with-fewer-points.md) #speed #generalization - [Make it possible to store multiple feature maps on one graph](learning-module-improvements/make-it-possible-to-store-multiple-feature-maps-on-one-graph.md) #featsandmorph - [Test particle-filter-like resampling of hypothesis space](learning-module-improvements/test-particle-filter-like-resampling-of-hypothesis-space.md) #accuracy #speed -- [Re-anchor hypotheses](learning-module-improvements/re-anchor-hypotheses.md) #deformations #noise #generalization +- [Re-anchor hypotheses for robustness to noise and distortions](learning-module-improvements/re-anchor-hypotheses.md) #deformations #noise #generalization - [Less dependency on first observation](learning-module-improvements/less-dependency-on-first-observation.md) #noise #multiobj - [Deal with incomplete models](learning-module-improvements/deal-with-incomplete-models.md) #learning - [Implement & test GNNs to model object behaviors & states](learning-module-improvements/implement-test-gnns-to-model-object-behaviors-states.md) #dynamic - 
[Deal with moving objects](learning-module-improvements/deal-with-moving-objects.md) #dynamic #realworld +- [Support scale invariance](learning-module-improvements/support-scale-invariance.md) #scale +- [Improve handling of symmetry](learning-module-improvements/improve-handling-of-symmetry.md) #pose +- [Use Better Priors for Hypothesis Initialization](learning-module-improvements/use-better-hypothesis-priors.md) #numsteps #pose #scale Please see the [instructions here](project-roadmap.md#how-you-can-contribute) if you would like to tackle one of these tasks. \ No newline at end of file diff --git a/docs/future-work/learning-module-improvements/deal-with-moving-objects.md b/docs/future-work/learning-module-improvements/deal-with-moving-objects.md index d76ac210..d122c5f5 100644 --- a/docs/future-work/learning-module-improvements/deal-with-moving-objects.md +++ b/docs/future-work/learning-module-improvements/deal-with-moving-objects.md @@ -1,3 +1,9 @@ --- title: Deal with Moving Objects --- + +This work relates to first being able to [Detect Local and Global Flow](../../future-work/sensor-module-improvements/detect-local-and-global-flow.md). + +Our current idea is to then use this information to model the state of the object, such that beyond its current pose, we also capture how it is moving as a function of time. This information can then be made available to other learning modules for voting and hierarchical processing. + +This work also relates to [Modeling Object Behaviors and States](../../future-work/learning-module-improvements/implement-test-gnns-to-model-object-behaviors-states.md), as an object state might be quite simple (the object is moving in a straight line at a constant velocity), or more complex (e.g. in a "spinning" or "dancing" state). To pass such information via the Cortical Messaging Protocol, the former would likely be treated similar to pose (i.e. 
specific information shared, but limited in scope), while the latter would be shared in a manner more similar to object ID, i.e. via a summary representation that can be learned via association. diff --git a/docs/future-work/learning-module-improvements/implement-test-gnns-to-model-object-behaviors-states.md b/docs/future-work/learning-module-improvements/implement-test-gnns-to-model-object-behaviors-states.md index 41686b88..c16c7dd5 100644 --- a/docs/future-work/learning-module-improvements/implement-test-gnns-to-model-object-behaviors-states.md +++ b/docs/future-work/learning-module-improvements/implement-test-gnns-to-model-object-behaviors-states.md @@ -1,3 +1,11 @@ --- title: Implement & Test GNNs to Model Object Behaviors & States --- + +We would like to test using local functions between nodes of an LM's graph to model object behaviors. In particular, we would like to model how an object evolves over time due to external and internal influences, by learning how nodes within the object impact one another based on these factors. This relates to graph neural networks, and [graph networks more generally](https://arxiv.org/pdf/1806.01261); however, learning should rely on sensory and motor information local to the LM. Ideally learned relations will generalize across different edges, e.g. the understanding that two nodes are connected by a rigid edge vs. a spring. + +As noted, all learning should happen locally within the graph, so although gradient descent can be used, we should not back-propagate error signals through other LMs. Please see our related policy on [using NumPy rather than PyTorch for contributions](../../contributing/style-guide#numpy-preferred-over-pytorch). For further reading, see our discussion on [Modeling Object Behavior Using Graph Message Passing](https://github.com/thousandbrainsproject/monty_lab/tree/main/object_behaviors#implementation-routes-for-the-relational-inference-model) in the Monty Labs repository. 
+ +We have a dataset that should be useful for testing approaches to this task, which can be found in [Monty Labs](https://github.com/thousandbrainsproject/monty_lab/tree/main/object_behaviors). + +At a broader level, we are also investigating alternative methods for modeling object behaviors, including sequence-based methods similar to HTM; however, we believe it is worth exploring graph network approaches as one (potentially complementary) approach. In particular, we may find that such learned edges are useful for frequently encountered node interactions like basic physics, while sequence-based methods are best suited for idiosyncratic behaviors. \ No newline at end of file diff --git a/docs/future-work/learning-module-improvements/improve-handling-of-symmetry.md b/docs/future-work/learning-module-improvements/improve-handling-of-symmetry.md new file mode 100644 index 00000000..366011f3 --- /dev/null +++ b/docs/future-work/learning-module-improvements/improve-handling-of-symmetry.md @@ -0,0 +1,11 @@ +--- +title: Improve Handling of Symmetry +--- + +LMs currently recognize symmetry by making multiple observations in a row that are all consistent with a set of multiple poses. That is, if new observations of an object do not eliminate any of a set of poses, then it is likely that these poses are equivalent/symmetric. + +To make this more efficient and robust, we might store symmetric poses in long-term memory, updating them over time. In particular: +- Whenever symmetry is detected, the poses associated with the state could be stored for that object. +- Over time, we can reduce or expand this list of symmetric poses, enabling the LM to establish with reasonable confidence that an object is in a symmetric pose as soon as the hypothesized poses fall within the list. + +By developing an established list of symmetric poses, we might also improve voting on such symmetric poses - see [Using Pose for Voting](../voting-improvements/use-pose-for-voting.md). 
\ No newline at end of file diff --git a/docs/future-work/learning-module-improvements/re-anchor-hypotheses.md b/docs/future-work/learning-module-improvements/re-anchor-hypotheses.md index 321ccd14..03dfc0c7 100644 --- a/docs/future-work/learning-module-improvements/re-anchor-hypotheses.md +++ b/docs/future-work/learning-module-improvements/re-anchor-hypotheses.md @@ -1,3 +1,9 @@ --- -title: Re-Anchor Hypotheses +title: Re-Anchor Hypotheses for Robustness to Noise and Distortions --- + +One aspect that we believe may contribute to dealing with object distortions, such as perceiving Dali's melted clocks for the first time, or being robust to the way a logo follows the surface of a mug, is through re-anchoring of hypotheses. More concretely, as the system moves over the object and path-integrates, the estimate of where the sensor is in space might lend greater weight to sensory landmarks, resulting in a re-assessment of the current location. Such re-anchoring is required even without distortions, due to the fact that path integration in the real world is imperfect. + +Such an approach would likely be further supported by hierarchical, top-down connections (see also [Add Top-Down Connections](../cmp-hierarchy-improvements/add-top-down-connections.md)). This will be relevant where the system has previously learned how a low-level object is associated with a high-level object at multiple locations, and where the low-level object is in some way distorted. In this instance, the system can re-instate where it is on the low-level object, based on where it is on the high-level object. Depending on the degree of distortion of the object, we would expect more such location-location associations to be learned in order to capture the relationship between the two. For example, a logo on a flat surface with a single 90-degree bend in it might just need two location associations to be learned and represented, while a heavily distorted logo would require more. 
+ +It's worth emphasizing that this approach would also help reduce the reliance on the first observation. In particular, the first observation initializes the hypothesis space, so if that observation is noisy or doesn't resemble any of the points in the model, it has an overly-large impact on performance. \ No newline at end of file diff --git a/docs/future-work/learning-module-improvements/reinitialize-hypotheses-when-starting-to-recognize-a-new-object.md b/docs/future-work/learning-module-improvements/reinitialize-hypotheses-when-starting-to-recognize-a-new-object.md index 164a852c..09cbcd69 100644 --- a/docs/future-work/learning-module-improvements/reinitialize-hypotheses-when-starting-to-recognize-a-new-object.md +++ b/docs/future-work/learning-module-improvements/reinitialize-hypotheses-when-starting-to-recognize-a-new-object.md @@ -1,3 +1,7 @@ --- title: Reinitialize Hypotheses When Starting to Recognize a new Object --- + +We have recently implemented a method to detect when we have moved on to a new object based on significant changes in the accumulated evidence values for hypotheses. Integrating this method into the LMs is still in progress, but once complete, we would like to complement it with a process to reinitialize the evidence scores within the learning module. That way, when the LM detects it is on a new object, it can cleanly estimate what this new object might be. + +Eventually this could be complemented with top-down feedback from a higher-level LM modeling a scene or compositional object. In this case, the high-level LM biases the evidence values initialized in the low-level LM, based on what object should be present there according to the higher-level LM's model. Improvements here could also interact with the tasks of [Re-Anchor Hypotheses](../learning-module-improvements/re-anchor-hypotheses.md), and [Use Better Priors for Hypothesis Initialization](../learning-module-improvements/use-better-hypothesis-priors.md). 
diff --git a/docs/future-work/learning-module-improvements/support-scale-invariance.md b/docs/future-work/learning-module-improvements/support-scale-invariance.md new file mode 100644 index 00000000..bc7aaaea --- /dev/null +++ b/docs/future-work/learning-module-improvements/support-scale-invariance.md @@ -0,0 +1,18 @@ +--- +title: Support Scale Invariance +--- + +It remains unclear how scale invariance would be implemented at a neural level, although we have discussed the possibility that the frequency of oscillatory activity in neurons is scaled. This could in turn modulate how movements are accounted for during path integration. + +Regardless of the precise implementation, it is reasonable to assume that a given learning module will have a range of scales that it is able to represent, adjusting path integration in the reference frame according to the hypothesized scale. This scale invariance would likely have the following properties: +- Heuristics based on low-level sensory input (e.g. inferred distance) that are used to rapidly propose the most probable scales. +- Testing of different scales in parallel, similar to how we test different poses of an object. +- Storing the most commonly experienced scales in long-term memory, using these to preferentially bias initialized hypotheses, related to [Use Better Priors for Hypothesis Initialization](../learning-module-improvements/use-better-hypothesis-priors.md). + +These scales would represent a small sub-sampling of all possible scales, similar to how we test a subset of possible rotations, and consistent with the fact that human estimates of scale and rotation are imperfect and tend to align with common values. + +For example, if an enormous coffee mug was on display in an art installation, the inferred distance from perceived depth, together with the size of eye movements, could suggest that - whatever the object - features are separated on the scale of meters. 
This low-level information would inform testing objects on a large scale, enabling recognition of the object (albeit potentially with a small delay). If a mug was seen at a more typical scale, then it would likely be recognized faster, similar to how humans recognize objects in their more typical orientations more quickly. + +Thus, infrastructure for testing multiple scales (i.e. adjusted path integration in reference frames), or bottom-up heuristics to estimate scale, would be useful additions to the learning module. + +In addition to the above scale invariance within a single LM, we believe that different LMs in the hierarchy will have a preference for different scales, proportional to the receptive field sizes of their direct sensory input. This would serve a complementary purpose to the above scale invariance, constraining the space of hypotheses that each LM needs to test. For example, low-level LMs might be particularly adept at reading lettering/text. More generally, one can think of low-level LMs as being well suited to modeling small, detailed objects, while high-level LMs are better at modeling larger objects at a coarser level of granularity. Once again, this will result in objects that are of typical scales being recognized more quickly.
\ No newline at end of file diff --git a/docs/future-work/learning-module-improvements/test-particle-filter-like-resampling-of-hypothesis-space.md b/docs/future-work/learning-module-improvements/test-particle-filter-like-resampling-of-hypothesis-space.md index d347f306..96b36e48 100644 --- a/docs/future-work/learning-module-improvements/test-particle-filter-like-resampling-of-hypothesis-space.md +++ b/docs/future-work/learning-module-improvements/test-particle-filter-like-resampling-of-hypothesis-space.md @@ -1,3 +1,9 @@ --- title: Test Particle-Filter-Like Resampling of Hypothesis Space --- + +In order to make better use of the available computational resources, we might begin by sampling a "coarse" subset of possible hypotheses across objects at the start of an episode. As the episode progresses, we could re-sample regions that have high probability, in order to explore hypotheses there in finer detail. This would serve the purpose of enabling us to have broad hypotheses initially, without unacceptably large computational costs. At the same time, we could still develop a refined hypothesis of the location and pose of the object, given the additional sampling of high-probability regions. + +Furthermore, when the evidence values for a point in an LM's graph falls below a certain threshold, we generally stop testing it. Related to this, the initial feature pose detected when the object was first sensed determines the pose hypotheses that are initialized. We could therefore implement a method to randomly initialize a subset of rejected hypotheses, and then test these. This relates to [Less Dependency on First Observation](less-dependency-on-first-observation.md). + +This work could also tie in with the ability to [Use Better Priors for Hypothesis Initialization](../learning-module-improvements/use-better-hypothesis-priors.md), as these common poses could be resampled more frequently. 
diff --git a/docs/future-work/learning-module-improvements/use-better-hypothesis-priors.md b/docs/future-work/learning-module-improvements/use-better-hypothesis-priors.md new file mode 100644 index 00000000..9878ac17 --- /dev/null +++ b/docs/future-work/learning-module-improvements/use-better-hypothesis-priors.md @@ -0,0 +1,9 @@ +--- +title: Use Better Priors for Hypothesis Initialization +--- + +Currently all object poses are equally likely, because stimuli exist in a void and are typically rotated randomly at test time. However, as we move towards compositional and scene-like datasets where certain object poses are more common, we would like to account for this information in our hypothesis testing. + +A simple way to do this is to store in long-term memory the frequently encountered object poses, and bias these with more evidence during initialization. A consequence of this is that objects should be recognized more quickly when they are in a typical pose, consistent with human behavior (see e.g. [Lawson et al, 2003](https://www.sciencedirect.com/science/article/abs/pii/S0001691802000999?via%3Dihub)). + +In terms of implementation, this could be done either relative to a body-centric coordinate, through a hierarchical biasing, or both. With the former, the object would have an inherent bias towards a pose relative to the observer, or some more abstract reference-frame like gravity (e.g. right-side up coffee mug). With the latter, the pose would be biased with respect to a higher-level, compositional object. For example, in a dinner table setup, the orientation of the fork and knife would be biased relative to the plate, even though in and of themselves, the fork and knife do not have any inherent bias in their pose. This information would be stored in the compositional dinner-set object in the higher level LM, and the bias in pose implemented by top-down feedback to the low-level LM.
Such feedback could bias both specific poses, as well as specific locations of the child object relative to the parent object, or specific scales of an object (see also [Support Scale Invariance](../learning-module-improvements/support-scale-invariance.md)). \ No newline at end of file diff --git a/docs/future-work/learning-module-improvements/use-models-with-fewer-points.md b/docs/future-work/learning-module-improvements/use-models-with-fewer-points.md new file mode 100644 index 00000000..3327e1c1 --- /dev/null +++ b/docs/future-work/learning-module-improvements/use-models-with-fewer-points.md @@ -0,0 +1,5 @@ +--- +title: Use Models with Fewer Points +--- + +This task relates to the on-going implementation of hierarchically arranged LMs. As these become available, it should become possible to decompose objects into simpler sub-object components, which in turn will enable LMs to model objects with significantly fewer points than the ~2,000 per object currently used. \ No newline at end of file diff --git a/docs/future-work/learning-module-improvements/use-models-with-less-points.md b/docs/future-work/learning-module-improvements/use-models-with-less-points.md deleted file mode 100644 index 7f0084ef..00000000 --- a/docs/future-work/learning-module-improvements/use-models-with-less-points.md +++ /dev/null @@ -1,3 +0,0 @@ ---- -title: Use Models with Less Points ---- diff --git a/docs/future-work/learning-module-improvements/use-off-object-observations.md b/docs/future-work/learning-module-improvements/use-off-object-observations.md index 1c869f02..fecc8671 100644 --- a/docs/future-work/learning-module-improvements/use-off-object-observations.md +++ b/docs/future-work/learning-module-improvements/use-off-object-observations.md @@ -1,3 +1,11 @@ --- title: Use Off-Object Observations --- + +There are a variety of instances where a Monty system has a hypothesis about the current object, and then moves off the hypothesis-space of that object, either sensing nothing/empty 
space, or another object. For example, this can occur due to a model-free driven action like a saccade moving off the object, or the surface agent leaving the surface of the object. Similarly, a model-based action like the hypothesis-testing "jump" can move an agent to a location where the object doesn't exist if the hypothesis it tested was false. + +Currently we have methods to move the sensor back on to the object, but we do not make use of the information that the object was absent at the perceived location. This is valuable information, as the absence of the object at a location will be consistent with some object and pose hypotheses, but not others. + +For example, if the most-likely hypothesis is a coffee mug and the system performs a saccade that results in the nearest feature being very far away (such as the distant surface of a table), then any hypotheses about the pose of the mug that predicted there would be mug-parts should receive evidence against them. On the other hand, a hypothesis about the mug's pose that is *consistent* with the absence of the mug at that location should receive positive evidence. The same would apply if the surface agent leaves the surface of the mug, where the absence of the mug is consistent with a subset of hypotheses. + +A more nuanced instance arises if there is something at the expected location, but it is a feature of a different object. For example, when moving to where the handle of the coffee mug is believed to be, we might sense a glass of water. Again, however, as the sensed features (transparent, smooth glass) are different from those predicted by hypotheses that there was a mug handle there, then these hypotheses should receive negative evidence. On the other hand, this observation of unusual features is entirely consistent with the hypotheses that predicted that the mug would not be there.
As such, it should still be possible to adjust evidence values based on how observations match the predictions of each hypothesis. \ No newline at end of file diff --git a/docs/future-work/motor-system-improvements.md b/docs/future-work/motor-system-improvements.md index 0b8deb3e..bb9651a4 100644 --- a/docs/future-work/motor-system-improvements.md +++ b/docs/future-work/motor-system-improvements.md @@ -5,13 +5,14 @@ description: Improvements we would like to add to the motor system. These are the things we would like to implement: -- [Use a different policy for learning than for inference](motor-system-improvements/use-a-different-policy-for-learning-than-for-inference.md) #learning -- [Bottom-Up exploration policy for surface agent](motor-system-improvements/bottom-up-exploration-policy-for-surface-agent.md) #learning +- [Interpret goal states in motor system & switch policies](motor-system-improvements/interpret-goal-states-in-motor-system-switch-policies.md) #goalpolicy +- [Implement switching between learning and inference-focused policies](motor-system-improvements/implement-policy-switching-learning-vs-inference.md) #learning +- [Bottom-up exploration policy for surface agent](motor-system-improvements/bottom-up-exploration-policy-for-surface-agent.md) #learning - [Top-down exploration policy](motor-system-improvements/top-down-exploration-policy.md) #learning #numsteps -- [Bottom up policies for distant agent](motor-system-improvements/bottom-up-policies-for-distant-agent.md) #numsteps -- [Calculate saliency map in view finder and use it to saccade to region of interest ](motor-system-improvements/calculate-saliency-map-in-view-finder-and-use-it-to-saccade-to-region-of-interest.md)#numsteps #multiobj -- [Learn policy using RL](motor-system-improvements/learn-policy-using-rl.md) (using simplified action space) #numsteps +- [Implement efficient saccades driven by model-free and model-based 
signals](motor-system-improvements/implement-efficient-saccades-driven-by-model-free-and-model-based-signals.md) #numsteps #multiobj +- [Learn policy using RL and simplified action space](motor-system-improvements/learn-policy-using-rl.md) #numsteps #speed - [Decompose goals into subgoals & communicate](motor-system-improvements/decompose-goals-into-subgoals-communicate.md) #goalpolicy -- [Interprete goal states in motor system & switch policies](motor-system-improvements/interprete-goal-states-in-motor-system-switch-policies.md) #goalpolicy +- [Reuse hypothesis testing policy target points](motor-system-improvements/reuse-hypothesis-testing-points.md) #goalpolicy #numsteps +- [Implement a simple cross-modal policy](motor-system-improvements/simple-cross-modal-policy.md) #learning #multiobj #goalpolicy #numsteps Please see the [instructions here](project-roadmap.md#how-you-can-contribute) if you would like to tackle one of these tasks. \ No newline at end of file diff --git a/docs/future-work/motor-system-improvements/bottom-up-exploration-policy-for-surface-agent.md b/docs/future-work/motor-system-improvements/bottom-up-exploration-policy-for-surface-agent.md index b23a1856..001dbb62 100644 --- a/docs/future-work/motor-system-improvements/bottom-up-exploration-policy-for-surface-agent.md +++ b/docs/future-work/motor-system-improvements/bottom-up-exploration-policy-for-surface-agent.md @@ -1,3 +1,7 @@ --- title: Bottom-Up Exploration Policy for Surface Agent --- + +For the distant agent, we have a policy specifically tailored to learning, the naive scan policy, which systematically explores the visible surface of an object. We would like a similar policy for the surface agent that systematically spirals or scans across the surface of an object, at least in a local area. + +This would likely be complemented by [Top-Down Exploration Policies](top-down-exploration-policy.md). 
\ No newline at end of file diff --git a/docs/future-work/motor-system-improvements/bottom-up-policies-for-distant-agent.md b/docs/future-work/motor-system-improvements/bottom-up-policies-for-distant-agent.md deleted file mode 100644 index e23eff98..00000000 --- a/docs/future-work/motor-system-improvements/bottom-up-policies-for-distant-agent.md +++ /dev/null @@ -1,3 +0,0 @@ ---- -title: Bottom-Up Policies for Distant Agent ---- diff --git a/docs/future-work/motor-system-improvements/calculate-saliency-map-in-view-finder-and-use-it-to-saccade-to-region-of-interest.md b/docs/future-work/motor-system-improvements/calculate-saliency-map-in-view-finder-and-use-it-to-saccade-to-region-of-interest.md deleted file mode 100644 index 0686fcdb..00000000 --- a/docs/future-work/motor-system-improvements/calculate-saliency-map-in-view-finder-and-use-it-to-saccade-to-region-of-interest.md +++ /dev/null @@ -1,3 +0,0 @@ ---- -title: Calculate Saliency Map in View Finder and use it to Saccade to Region of Interest ---- diff --git a/docs/future-work/motor-system-improvements/decompose-goals-into-subgoals-communicate.md b/docs/future-work/motor-system-improvements/decompose-goals-into-subgoals-communicate.md index 84c3ded3..0a795810 100644 --- a/docs/future-work/motor-system-improvements/decompose-goals-into-subgoals-communicate.md +++ b/docs/future-work/motor-system-improvements/decompose-goals-into-subgoals-communicate.md @@ -1,3 +1,17 @@ --- title: Decompose Goals into Subgoals & Communicate --- + +This will be most relevant when we begin implementing policies that change the state of the world, rather than just those that support efficient sensing and inference. + +One example task we imagine is setting a dinner table. At the higher level of the system, a learning module that models dinner-tables would receive the goal-state to have the table in the "set for eating" state. 
This might be a vision-based LM that can use its direct motor-output to saccade around the scene, and infer whether the table is set, but cannot actually move objects. + +It might perceive, for example, that the fork is not in the correct location for the learned model of a set dinner table. As such, it could pass a goal-state to the LM that models forks to be in the required pose in body-centric coordinates. The fork modeling-LM, which has information about the morphology of the fork, could then send goal-states directly to the motor-system, or to an LM that controls actuators like a hand. In either case, the ultimate aim is to apply pressure to the fork such that it achieves the desired goal state of being in the correct location. + +To set the entire dinner table, the higher-level LM would send out the sub-goal of the fork in the correct position, before moving on to other components of the table object, such as setting the position of the knife. + +In the above example, neither the dinner-table, fork, nor hand LMs have sufficient knowledge to complete the task on their own. Instead, it must be decomposed into a series of sub-goal states. + +How exactly we define the goal-states that carry out the practical process of applying pressure to move the fork is still a point of discussion, and so an early implementation might assume that a sub-cortical policy is already known that can move objects around the scene, based on a received goal-state. Alternatively, we might begin with a simpler task such as pressing a button or key, where the motor policy simply needs to apply force at a specific location. + +Actually learning the causal relationships between states in low-level objects and high-level objects is also an aspect we are still developing ideas for. However, we know that these will be formed via hierarchical connections between LMs, similar to the [Top Down Connections Used for Sensory Prediction](../cmp-hierarchy-improvements/add-top-down-connections.md).
\ No newline at end of file diff --git a/docs/future-work/motor-system-improvements/implement-efficient-saccades-driven-by-model-free-and-model-based-signals.md b/docs/future-work/motor-system-improvements/implement-efficient-saccades-driven-by-model-free-and-model-based-signals.md new file mode 100644 index 00000000..53fe4914 --- /dev/null +++ b/docs/future-work/motor-system-improvements/implement-efficient-saccades-driven-by-model-free-and-model-based-signals.md @@ -0,0 +1,11 @@ +--- +title: Implement Efficient Saccades Driven by Model-Free and Model-Based Signals +--- + +Currently the main way that the distant agent moves is by performing small, random, saccade-like movements. In addition, the entire agent can teleport to a received goal-state in order to e.g. test a hypothesis. We would like to implement the ability to perform larger saccades that are driven by both model-free and model-based signals, depending on the situation. + +In the model-free case, salient information available in the view-finder could drive the agent to saccade to a particular location. This could rely on a variety of computer-vision methods to extract a coarse saliency map of the scene. This is analogous to the sub-cortical processing performed by the superior colliculus (see e.g. [Basso and May, 2017](https://www.annualreviews.org/content/journals/10.1146/annurev-vision-102016-061234)). + +In the model-based case, two primary settings should be considered: +- A single LM has determined that the agent should move to a particular location in order to test a hypothesis, and it sends a goal-state that can be satisfied with a saccade, rather than the entire agent jumping/teleporting to a new location. For example, saccading to where the handle of a mug is believed to be will refute or confirm the current hypothesis. This is the more important/immediate use case. +- Multiple LMs are present, including a smaller subset of more peripheral LMs. 
If one of these peripheral LMs observes something of interest, it can direct a goal-state to the motor system to perform a saccade such that a dense sub-set of LMs are able to visualize the object. This is analogous to cortical feedback bringing the fovea to an area of interest. diff --git a/docs/future-work/motor-system-improvements/implement-policy-switching-learning-vs-inference.md b/docs/future-work/motor-system-improvements/implement-policy-switching-learning-vs-inference.md new file mode 100644 index 00000000..acc0c976 --- /dev/null +++ b/docs/future-work/motor-system-improvements/implement-policy-switching-learning-vs-inference.md @@ -0,0 +1,9 @@ +--- +title: Implement Switching Between Learning and Inference-Focused Policies +--- + +Currently, a Monty system cannot flexibly switch between a learning-focused policy (such as the naive scan policy) and an inference-focused policy. Enabling LMs to guide such a switch based on their internal models, and whether they are in a matching or exploration state, would be a useful improvement. + +This would be a specific example of a more general mechanism for switching between different policies, as discussed in [Switching Policies via Goal States](interpret-goal-states-in-motor-system-switch-policies.md). + +Similarly, an LM should be able to determine the most appropriate *model-based* policies to initialize, such as the hypothesis-testing policy vs. a [top-down exploration policy](top-down-exploration-policy.md). 
\ No newline at end of file diff --git a/docs/future-work/motor-system-improvements/interpret-goal-states-in-motor-system-switch-policies.md b/docs/future-work/motor-system-improvements/interpret-goal-states-in-motor-system-switch-policies.md new file mode 100644 index 00000000..f001fc88 --- /dev/null +++ b/docs/future-work/motor-system-improvements/interpret-goal-states-in-motor-system-switch-policies.md @@ -0,0 +1,9 @@ +--- +title: Interpret Goal States in Motor System & Switch Policies +--- + +We would like to implement a state-switching mechanism where an LM (or multiple LMs) can pass a goal-state to the motor system to switch the model-free policies that it is executing. + +For example, we might like to perform a thorough, random walk in a small region if the observations are noisy and we would like to sample them densely. Alternatively, we might like to move quickly across the surface of an object, spending little time in a given region. + +This task also relates to [Enable Switching Between Learning and Inference-Focused Policies](./implement-policy-switching-learning-vs-inference.md). 
\ No newline at end of file diff --git a/docs/future-work/motor-system-improvements/interprete-goal-states-in-motor-system-switch-policies.md b/docs/future-work/motor-system-improvements/interprete-goal-states-in-motor-system-switch-policies.md deleted file mode 100644 index 067a5824..00000000 --- a/docs/future-work/motor-system-improvements/interprete-goal-states-in-motor-system-switch-policies.md +++ /dev/null @@ -1,3 +0,0 @@ ---- -title: Interprete Goal States in Motor System & Switch Policies ---- diff --git a/docs/future-work/motor-system-improvements/learn-policy-using-rl.md b/docs/future-work/motor-system-improvements/learn-policy-using-rl.md index c313b5f4..15340f33 100644 --- a/docs/future-work/motor-system-improvements/learn-policy-using-rl.md +++ b/docs/future-work/motor-system-improvements/learn-policy-using-rl.md @@ -1,3 +1,11 @@ --- -title: Learn Policy using RL +title: Learn Policy using RL and Simplified Action Space --- + +Learning policies through rewards will become important when we begin implementing complex policies that change the state of the world. However, these could also be relevant for inference and learning, for example by learning when to switch policies instead of adhering to a single heuristic like in the curvature following policy. + +In general, we envision that we would use slow, deliberate model-based policies to perform a complex new task, such as one that involves coordinating multiple actuators. Initially, the action would always be performed in this slow, model-based manner. However, with each execution of the task, these sequences of movements provide samples for training a model-free policy to efficiently coordinate relevant actuators *in parallel*, and without the expensive sampling cost of model-based policies. 
+ +For example, learning to oppose the finger and thumb in order to make a pinch grasp might initially involve moving one digit until it meets the surface of the object or the other digit, and then applying force with the other. Over time, a model-free policy could learn to move both digits together, with this "pinch policy" recruited by top-down goal-states as necessary. + +In addition to supporting efficient, parallel execution of actions, learned model-free policies will be important for more refined movements. For example, the movement required to press a very small button or balance an object might be coarsely guided by a model-based policy, but the fine motor control required to do so would be adjusted via a model-free policy. \ No newline at end of file diff --git a/docs/future-work/motor-system-improvements/reuse-hypothesis-testing-points.md b/docs/future-work/motor-system-improvements/reuse-hypothesis-testing-points.md new file mode 100644 index 00000000..f0d701d5 --- /dev/null +++ b/docs/future-work/motor-system-improvements/reuse-hypothesis-testing-points.md @@ -0,0 +1,9 @@ +--- +title: Reuse Hypothesis-Testing Policy Target Points +--- + +The hypothesis-testing policy is able to generate candidate points on an object that, when observed, should rapidly disambiguate between similar objects, or between similar poses of the same object. + +Generating these points requires a model-based policy that simulates the overlap in the graphs between the two most likely objects (or the two most likely poses of the same object). This is a relatively expensive operation, and so one approach would be to store these points in long-term memory, reusing them in future episodes. + +For example, when we first learn about the concept of a mug, we might need to deliberately think about the fact that its handle is what distinguishes it from many other cylindrical objects.
However, once we have experienced recognizing mugs a few times, we could quickly recall that testing the handle is a good way to confirm whether we are sensing a mug, or some other object. Related to this, an LM can track how sensing different regions of an object affects its evidence values for the collective hypotheses - those areas that have a disproportionate effect on a top hypothesis are likely to be good candidates for testing in future episodes. \ No newline at end of file diff --git a/docs/future-work/motor-system-improvements/simple-cross-modal-policy.md b/docs/future-work/motor-system-improvements/simple-cross-modal-policy.md new file mode 100644 index 00000000..56545c73 --- /dev/null +++ b/docs/future-work/motor-system-improvements/simple-cross-modal-policy.md @@ -0,0 +1,12 @@ +--- +title: Implement a Simple Cross-Modal Policy for Sensory Guidance +--- + +Once we have infrastructure support for multiple agents that move independently (see [Add Infrastructure for Multiple Agents that Move Independently](../framework-improvements/add-infrastructure-for-multiple-agents-that-move-independently.md)), we would like to implement a simple cross-modal policy for sensory guidance. + +In particular, we can imagine a distant-agent rapidly saccading across a scene, observing objects of interest (see also [Implement Efficient Saccades](implement-efficient-saccades-driven-by-model-free-and-model-based-signals.md)). When an object is observed, the LM associated with the distant-agent could send a goal-state (either directly or via an actuator-modeling LM) that results in the surface agent moving to that object and then beginning to explore it in detail. + +Such a task would be relatively simple, while serving as a verification of a variety of components in the Cortical Messaging Protocol, such as: +- Recruiting agents that are not directly associated with the current LM, using goal-states (e.g. here we are recruiting the surface agent, rather than the distant agent).
+- Coordination of multiple agents (the surface agent and distant agent might each inform areas of interest for the other to explore). +- Multi-modal voting (due to limited policies, voting has so far been limited to within-modality settings, although it supports cross-modal communication). \ No newline at end of file diff --git a/docs/future-work/motor-system-improvements/top-down-exploration-policy.md b/docs/future-work/motor-system-improvements/top-down-exploration-policy.md index 0b9a5cde..314f44ff 100644 --- a/docs/future-work/motor-system-improvements/top-down-exploration-policy.md +++ b/docs/future-work/motor-system-improvements/top-down-exploration-policy.md @@ -1,4 +1,9 @@ --- title: Top-Down Exploration Policy --- -Actively move to points on the object that are badly represented in the model. \ No newline at end of file +During exploration/learning-focused movement, we do not make use of any model-based, top-down policies driven by LMs. Two approaches we would like to implement are: +- A model-based policy that moves the sensors to areas that potentially represent the explored limits of an object. For example, if we've explored the surface of an object but not the entirety of it, then there will be points on the edge of the learned model with few neighboring points. Exploring in these regions is likely to efficiently uncover novel observations. Note that a "false-positive" for this heuristic is that thin objects like a wire or piece of paper will have such regions naturally at their edges, so it should only represent a bias in exploration, not a hard rule. +- A model-based policy that spends more time exploring regions associated with high-frequency feature changes, or discriminative object-parts. For example, the spoon and fork in YCB are easy to confuse if the models of their heads are not sufficiently detailed.
Two heuristics to support greater exploration in this area could include: + - High-frequency changes in low-level features mean we need a more detailed model of that part of the object. For example, the point-normals change frequently at the head of the fork, and so we likely need to explore it in more detail to develop a sufficiently descriptive model. The feature-change sensor-module is helpful for ensuring these observations are processed by the learning-module, but a modified policy would actually encourage more exploration in these regions. + - Locations that are useful for distinguishing objects, such as the fork vs. spoon heads, are worth knowing in detail, because they define the differences between similar objects. These points correspond to those that are frequently tested by the hypothesis-testing policy (see [Reuse Hypothesis-Testing Policy Target Points](../motor-system-improvements/reuse-hypothesis-testing-points.md)), and such stored locations can be leveraged to guide exploration. + - As we introduce hierarchy, it may be possible to unify these concepts under a single policy, i.e. where frequently changing features can either be at the sensory (low-level) input, or at the more abstract level of incoming sub-objects.
\ No newline at end of file diff --git a/docs/future-work/motor-system-improvements/use-a-different-policy-for-learning-than-for-inference.md b/docs/future-work/motor-system-improvements/use-a-different-policy-for-learning-than-for-inference.md deleted file mode 100644 index 7000a045..00000000 --- a/docs/future-work/motor-system-improvements/use-a-different-policy-for-learning-than-for-inference.md +++ /dev/null @@ -1,3 +0,0 @@ ---- -title: Use a Different Policy for Learning than for Inference ---- diff --git a/docs/future-work/sensor-module-improvements/detect-local-and-global-flow.md b/docs/future-work/sensor-module-improvements/detect-local-and-global-flow.md index c54e146e..c9747bd2 100644 --- a/docs/future-work/sensor-module-improvements/detect-local-and-global-flow.md +++ b/docs/future-work/sensor-module-improvements/detect-local-and-global-flow.md @@ -1,3 +1,14 @@ --- title: Detect Local and Global Flow --- + +Our general view is that there are two sources of flow processed by cortical columns. These should correspond to: +- Local flow: detected in a small receptive field, and indicates that the *object is moving*. +- Global flow: detected in a larger receptive field, and indicates that the *sensor is moving*. +Note however that depending on the receptive field sizes, it may not be possible for a particular learning module to always distinguish these. For example, if an object is larger than the global-flow receptive field, then from that LM's perspective, it cannot distinguish between the object moving and the sensor moving. + +Note that flow can be either optical or based on sensed texture changes for a blind surface agent. + +Implementing methods so that we can estimate these two sources of flow and pass them to the LM will be an important step towards modeling objects with complex behaviors, as well as accounting for noise in the motor-system's estimates of self-motion. 
+ +Eventually, similar techniques might be used to detect "flow" in how low-level LM representations are changing. This could correspond to movements in non-physical spaces, and enable more abstract representations in higher-level LMs. See also [Can We Change the CMP to Use Displacements Instead of Locations?](../voting-improvements/can-we-change-the-cmp-to-use-displacements-instead-of-locations.md) \ No newline at end of file diff --git a/docs/future-work/sensor-module-improvements/extract-better-features.md b/docs/future-work/sensor-module-improvements/extract-better-features.md index 4f86f70d..98fe379b 100644 --- a/docs/future-work/sensor-module-improvements/extract-better-features.md +++ b/docs/future-work/sensor-module-improvements/extract-better-features.md @@ -1,3 +1,11 @@ --- title: Extract Better Features --- + +Currently, non-morphological features are very simple, such as extracting the RGB or hue value at the center of the sensor patch. + +In the short term, we would like to extract richer features, such as using HTM's spatial-pooler or Local Binary Patterns for visual features, or processing depth information within a patch to approximate tactile texture. + +In the longer term, given the "sub-cortical" nature of this sensory processing, we might also consider neural-network based feature extraction, such as shallow convolutional neural networks; however, please see [our FAQ on why Monty does not currently use deep learning](../../how-monty-works/faq-monty.md#why-does-monty-not-make-use-of-deep-learning). + +Note that regardless of the approach taken, features should be rotation invariant. For example, a textured pattern should be detected regardless of the sensor's orientation, and the representation of that texture should not be affected by the sensor's orientation.
\ No newline at end of file diff --git a/docs/future-work/voting-improvements/can-we-change-the-cmp-to-use-displacements-instead-of-locations.md b/docs/future-work/voting-improvements/can-we-change-the-cmp-to-use-displacements-instead-of-locations.md index 8002309d..9520127c 100644 --- a/docs/future-work/voting-improvements/can-we-change-the-cmp-to-use-displacements-instead-of-locations.md +++ b/docs/future-work/voting-improvements/can-we-change-the-cmp-to-use-displacements-instead-of-locations.md @@ -1,3 +1,11 @@ --- title: Can we change the CMP to use displacements instead of locations? --- + +Movement is core to how LMs process and model the world. Currently, an LM receives an observation encoded with a body-centric location, and then infers a displacement in object-centric coordinates. Similarly, goal-states are specified as a target location in body-centric coordinates, which are then acted upon. + +However, a more general formulation might be to use displacements as the core spatial information in the CMP, such that a specific location (in body-centric coordinates or otherwise) is not the primary form of communication outside of an LM or sensor module. + +Such an approach might align well with adding information about flow (see [Detect Local and Global Flow](../sensor-module-improvements/detect-local-and-global-flow.md)), modeling moving objects (see [Deal With Moving Objects](../learning-module-improvements/deal-with-moving-objects.md)), and supporting abstract movements like the transition from grandchild to grandparent. It would also result in a reformulation of "goal-states" to "goal-displacements". + +Note that whatever approach is taken, we would still need to have some information about shared location representations at some level of the system in order to enable coordination and voting between LMs. This may relate to the division of "what" and "where" pathways in the brain, although this is not yet clear and requires further investigation. 
\ No newline at end of file diff --git a/docs/future-work/voting-improvements/generalize-voting-to-associative-connections.md b/docs/future-work/voting-improvements/generalize-voting-to-associative-connections.md index 096ee182..9d8216b3 100644 --- a/docs/future-work/voting-improvements/generalize-voting-to-associative-connections.md +++ b/docs/future-work/voting-improvements/generalize-voting-to-associative-connections.md @@ -1,4 +1,10 @@ --- title: Generalize Voting to Associative Connections --- -Needed when each LM learns different representations of the same object (object_id in the code) which is the case in the brain. \ No newline at end of file +Currently, voting relies on all learning modules sharing the same object ID for any given object, as a form of supervised learning signal. Thanks to this, they can vote on this particular ID when communicating with one another. + +However, in the setting of unsupervised learning, the object ID that is associated with any given model is unique to the parent LM. As such, we need to organically learn the mapping between the object IDs that occur together across different LMs, such that voting can function without any supervised learning signal. This is the same issue faced by the brain, where a neural encoding in one cortical column (e.g. an SDR) needs to be associated with the different SDRs found in other cortical columns. + +Initially, such voting would be explored within modality (two different vision-based LMs learning the same object), or across modalities with similar object structures (e.g. the 3D objects of vision and touch). However, this same approach should unlock important properties, such as associating models that may be structurally very different, like the vision-based object of a cow, and the auditory object of "moo" sounds. Furthermore, this should eventually enable associating learned words with grounded objects, laying the foundations for language.
+ +Finally, this challenge relates to [Use Pose for Voting](./use-pose-for-voting.md), where we would like to vote on the poses of objects, since the learned poses are also going to be unique to each LM. \ No newline at end of file diff --git a/docs/future-work/voting-improvements/outline-routing-protocol-attention.md b/docs/future-work/voting-improvements/outline-routing-protocol-attention.md index 066b83c1..9f6a87b1 100644 --- a/docs/future-work/voting-improvements/outline-routing-protocol-attention.md +++ b/docs/future-work/voting-improvements/outline-routing-protocol-attention.md @@ -1,3 +1,9 @@ --- title: Outline Routing Protocol/Attention --- + +As we create Monty systems with more LMs, it will become increasingly important to be able to emphasize the representations in certain LMs over others, as a form of "covert" attention. This will complement the current ability to explicitly attend to a point in space through motor actions. + +For example, in human children, learning new language concepts significantly benefits from shared attention with adults ("Look at the -"). A combination of attending to a point in space (overt attention), alongside narrowing the scope of active representations, is likely to be important for efficient associative learning. + +Implementation-wise, this will likely consist of a mixture of top-down feedback and lateral competition. \ No newline at end of file diff --git a/docs/future-work/voting-improvements/use-pose-for-voting.md b/docs/future-work/voting-improvements/use-pose-for-voting.md index 3faaa03e..1907390c 100644 --- a/docs/future-work/voting-improvements/use-pose-for-voting.md +++ b/docs/future-work/voting-improvements/use-pose-for-voting.md @@ -1,3 +1,9 @@ --- title: Use Pose for Voting --- + +Currently, we do not send out pose hypotheses when we are voting; however, we believe it will be an important signal to use.
One complication is that the poses stored for any given LM's object models are arbitrary with respect to other LMs' models, as each uses an object-centric coordinate system. + +This relates to [Generalize Voting To Associative Connections](./generalize-voting-to-associative-connections.md), which faces a similar challenge. + +To make this more efficient, it would also be useful to improve the way we represent symmetry in our object models (see [Improve Handling of Symmetry](../learning-module-improvements/improve-handling-of-symmetry.md)), as this will significantly reduce the number of associative connections that need to be learned for robust generalization. \ No newline at end of file diff --git a/docs/hierarchy.md b/docs/hierarchy.md index d8088156..16ebebb7 100644 --- a/docs/hierarchy.md +++ b/docs/hierarchy.md @@ -80,7 +80,7 @@ - [use-off-object-observations](future-work/learning-module-improvements/use-off-object-observations.md) - [reinitialize-hypotheses-when-starting-to-recognize-a-new-object](future-work/learning-module-improvements/reinitialize-hypotheses-when-starting-to-recognize-a-new-object.md) - [improve-bounded-evidence-performance](future-work/learning-module-improvements/improve-bounded-evidence-performance.md) - - [use-models-with-less-points](future-work/learning-module-improvements/use-models-with-less-points.md) + - [use-models-with-fewer-points](future-work/learning-module-improvements/use-models-with-fewer-points.md) - [make-it-possible-to-store-multiple-feature-maps-on-one-graph](future-work/learning-module-improvements/make-it-possible-to-store-multiple-feature-maps-on-one-graph.md) - [test-particle-filter-like-resampling-of-hypothesis-space](future-work/learning-module-improvements/test-particle-filter-like-resampling-of-hypothesis-space.md) - [re-anchor-hypotheses](future-work/learning-module-improvements/re-anchor-hypotheses.md) @@ -89,14 +89,13 @@ -
[implement-test-gnns-to-model-object-behaviors-states](future-work/learning-module-improvements/implement-test-gnns-to-model-object-behaviors-states.md) - [deal-with-moving-objects](future-work/learning-module-improvements/deal-with-moving-objects.md) - [motor-system-improvements](future-work/motor-system-improvements.md) - - [use-a-different-policy-for-learning-than-for-inference](future-work/motor-system-improvements/use-a-different-policy-for-learning-than-for-inference.md) + - [implement-policy-switching-learning-vs-inference](future-work/motor-system-improvements/implement-policy-switching-learning-vs-inference.md) - [bottom-up-exploration-policy-for-surface-agent](future-work/motor-system-improvements/bottom-up-exploration-policy-for-surface-agent.md) - [top-down-exploration-policy](future-work/motor-system-improvements/top-down-exploration-policy.md) - - [bottom-up-policies-for-distant-agent](future-work/motor-system-improvements/bottom-up-policies-for-distant-agent.md) - - [calculate-saliency-map-in-view-finder-and-use-it-to-saccade-to-region-of-interest](future-work/motor-system-improvements/calculate-saliency-map-in-view-finder-and-use-it-to-saccade-to-region-of-interest.md) + - [implement-efficient-saccades-driven-by-model-free-and-model-based-signals](future-work/motor-system-improvements/implement-efficient-saccades-driven-by-model-free-and-model-based-signals.md) - [learn-policy-using-rl](future-work/motor-system-improvements/learn-policy-using-rl.md) - [decompose-goals-into-subgoals-communicate](future-work/motor-system-improvements/decompose-goals-into-subgoals-communicate.md) - - [interprete-goal-states-in-motor-system-switch-policies](future-work/motor-system-improvements/interprete-goal-states-in-motor-system-switch-policies.md) + - [interpret-goal-states-in-motor-system-switch-policies](future-work/motor-system-improvements/interpret-goal-states-in-motor-system-switch-policies.md) - [voting-improvements](future-work/voting-improvements.md) - 
[use-pose-for-voting](future-work/voting-improvements/use-pose-for-voting.md) - [outline-routing-protocol-attention](future-work/voting-improvements/outline-routing-protocol-attention.md) @@ -105,7 +104,6 @@ - [cmp-hierarchy-improvements](future-work/cmp-hierarchy-improvements.md) - [figure-out-performance-measure-and-supervision-in-heterarchy](future-work/cmp-hierarchy-improvements/figure-out-performance-measure-and-supervision-in-heterarchy.md) - [add-top-down-connections](future-work/cmp-hierarchy-improvements/add-top-down-connections.md) - - [add-associative-connections](future-work/cmp-hierarchy-improvements/add-associative-connections.md) - [run-analyze-experiments-with-2lms-in-heterarchy-testbed](future-work/cmp-hierarchy-improvements/run-analyze-experiments-with-2lms-in-heterarchy-testbed.md) - [run-analyze-experiments-in-multiobject-environment-looking-at-scene-graphs](future-work/cmp-hierarchy-improvements/run-analyze-experiments-in-multiobject-environment-looking-at-scene-graphs.md) - [test-learning-at-different-speeds-depending-on-level-in-hierarchy](future-work/cmp-hierarchy-improvements/test-learning-at-different-speeds-depending-on-level-in-hierarchy.md) diff --git a/docs/how-monty-works/open-questions.md b/docs/how-monty-works/open-questions.md index 51c9e026..39be388e 100644 --- a/docs/how-monty-works/open-questions.md +++ b/docs/how-monty-works/open-questions.md @@ -1,21 +1,23 @@ --- title: Open Questions -description: I'm just starting to collect a new list of open questions here. Still a WIP +description: Below is a simple outline of some of the open questions that we are currently exploring. --- +For more details, we also encourage checking out our [Future Work Roadmap](../future-work/project-roadmap.md) and related sections, where we go into some possible approaches to these questions. + # Learning Modules/Modeling ## Object Behaviors - How are object behaviors represented? -- How are the recognized? +- How are they recognized? 
## Object Models - Where do we have a model of general physics? Can every LM learn the basic physics necessary for the objects it models (e.g. fluid-like behavior in some, cloth-like behavior in others)? Or are some LMs more specialized for this? ## Object Transformations -- How are the represented? +- How are they represented? - How are they recognized? ## Scale