Discussion
(Matt Paisner)
The first thing to note is that there are two relevant questions here. The first is: "what will the architecture do by itself under each of these conditions?" The answer to that is always: "cycle through the phases without doing anything." The second is: "what will the basic phase implementations, intended as baseline behavior for predicate worlds, do under each condition?" That is the question I answer below.
Note: the demo script I showed in the last meeting uses these baseline behaviors. Note 2: MIDCA can now be run either in interactive mode (which is identical to what we have used in the past) or programmatically, meaning the user can signal MIDCA to iterate through phases and cycles from code. Where the distinction matters, the answers below assume interactive mode.
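As a rough sketch of that distinction, programmatic mode amounts to something like the loop below (the class, method, and phase-stepping names here are illustrative assumptions, not MIDCA's actual API):

```python
# Illustrative sketch only; class and method names are assumptions,
# not MIDCA's actual API.
class MidcaSketch:
    PHASES = ["Perceive", "Interpret", "Eval", "Intend", "Plan", "Act"]

    def __init__(self):
        self.index = 0

    def next_phase(self):
        # In the baseline predicate-world implementation, a phase may
        # simply no-op when it has nothing to do.
        print("running phase:", self.PHASES[self.index])
        self.index = (self.index + 1) % len(self.PHASES)

    def one_cycle(self):
        for _ in self.PHASES:
            self.next_phase()

agent = MidcaSketch()
for _ in range(2):   # programmatic mode: code, not keystrokes, drives cycling
    agent.one_cycle()
```

In interactive mode the same stepping would instead be triggered by user input at a prompt.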
- Condition -1. No goals, plan, fires, or arsonist. Perceive will do what? How will the program stop (or will it?)
Perceive will copy the world state as usual. The program will not stop until the user tells it to quit, but it will do nothing else; it will simply cycle passively through the phases.
- Condition 0. No goals, fires, or arsonist. A plan exists but is empty. Is the behavior the same as condition -1?
Yes.
- Condition 1. No goals or plan. An arsonist starts a fire. How does MIDCA work with and without TF-Trees?
TF-Trees are not included in the baseline implementation. If we added a module that generates goals using a TF-Tree trained to fight fires, MIDCA would generate a goal each time a fire occurred, generate plans to put the fires out, and act on those plans. Without TF-Trees, no goals would be generated and MIDCA would take no action.
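A minimal sketch of such a goal-generation module might look like the following (the atom and goal representations are simplified assumptions, not MIDCA's actual data structures):

```python
# Sketch of fire-triggered goal generation in the spirit of the TF-Tree
# module described above; predicate names and goal encoding are assumptions.
def generate_fire_goals(world_atoms, goal_graph):
    """Insert an extinguish goal for every block that is on fire."""
    for predicate, args in world_atoms:
        if predicate == "onfire":
            goal = ("not-onfire", args[0])
            if goal not in goal_graph:
                goal_graph.append(goal)

world_atoms = [("on", ("A", "B")), ("onfire", ("A",))]
goals = []
generate_fire_goals(world_atoms, goals)
print(goals)   # [('not-onfire', 'A')]
```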
- Condition 2. No goals, fires or arsonist. A plan exists with one action. Put A on B. Is there any difference when plan has multiple actions? How does Act determine which is the current action?
MIDCA would not use the plan, because it only acts on plans that achieve goals it has selected in Intend. We could, if we wanted, add a module to the Act phase that picks a plan at random or by some criterion and executes it regardless of goals. At that point MIDCA's behavior would be determined by how the new module was designed.
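For concreteness, such a goal-ignoring Act module might reduce to something like this (a hypothetical sketch, with plans represented simply as lists of action tuples):

```python
import random

# Hypothetical sketch of an Act-phase module that executes a stored plan
# even when no goal selected in Intend calls for it.
def act_on_any_plan(stored_plans):
    """Pick a plan at random and return its next action, ignoring goals."""
    if not stored_plans:
        return None
    plan = random.choice(stored_plans)
    return plan.pop(0) if plan else None

plans = [[("stack", "A", "B"), ("stack", "C", "A")]]
print(act_on_any_plan(plans))   # ('stack', 'A', 'B')
```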
- Condition 3. No plan, fires, or arsonist. A goal exists for A to be on B. A is on B in the initial state. How does Eval work?
In the first cycle, on(A,B) would not yet have been selected in Intend, so Eval would do nothing. In the second cycle, on(A,B) would be the current goal; Eval would determine that it had been achieved and remove it from the goal graph. Note that removal only occurs when all currently selected goals have been achieved, which holds trivially here because there is only one.
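In outline, the baseline Eval check amounts to something like this (a sketch with simplified data structures; the all-goals-achieved condition is the behavior described above):

```python
# Sketch of the baseline Eval behavior: remove the currently selected goals
# from the goal graph only once every one of them holds in the world state.
def evaluate(current_goals, world_atoms, goal_graph):
    if current_goals and all(g in world_atoms for g in current_goals):
        for g in current_goals:
            goal_graph.remove(g)
        current_goals.clear()

world = [("on", ("A", "B"))]
graph = [("on", ("A", "B"))]
selected = [("on", ("A", "B"))]
evaluate(selected, world, graph)
print(graph)   # [] - the achieved goal has been removed
```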
- Condition 4. No plan, fires, or arsonist. A goal exists to achieve a tower. How is SHOP called? How would another planner (e.g., Godel) be called? Is Eval any different than in condition 3?
I assume the goal would be on(X, Y), where we know that putting block X on block Y will complete a tower. In that case, a simple function in the planning module converts this goal into a task that pyhop (a Python implementation of SHOP) can read, and planning then proceeds. Another planner would need a similar translation from MIDCA's goal formulation. Eval would report the goal as unachieved until the plan was complete, then note that it had been achieved and remove the goal and its associated plan.
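A sketch of that translation, assuming the blocks-world methods from Nau's pyhop distribution (the task name `move_blocks` and the `Goal.pos` encoding are assumptions about that setup, not MIDCA's actual conversion function):

```python
import pyhop

# Sketch of a goal-to-task translation; 'move_blocks' and the pos-dict
# encoding follow Nau's pyhop blocks-world example and are assumptions here.
def goal_to_pyhop_task(predicate, args):
    """Translate a MIDCA-style goal like on(A, B) into a pyhop task list."""
    if predicate == "on":
        block, dest = args
        goal_state = pyhop.Goal("goal")
        goal_state.pos = {block: dest}   # block should end up on dest
        return [("move_blocks", goal_state)]
    raise ValueError("no translation for goal predicate: " + predicate)

tasks = goal_to_pyhop_task("on", ("A", "B"))
# plan = pyhop.pyhop(state, tasks)  # requires operators/methods declared first
```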
- Condition 5. No plan or arsonist. A fire exists in initial state. Two goals exist. One to build a tower and another to extinguish the fire. How does Intend work?
In the baseline implementation there is no goal ordering, so both goals would be selected and sent to SHOP for planning. If we wanted to order the goals, we would write a function that takes two goals as input and returns an ordering on them (which could be "unordered"). The goal graph takes such a function as an optional input and returns only those goals that are not currently preceded in the ordering. So if we wanted MIDCA to address fires first, we would write a function that always orders fire goals before block-stacking goals.
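Such an ordering function might look like the following (a sketch; the -1/0/1 comparator convention and the priority table are assumptions, not the goal graph's actual interface):

```python
# Sketch of a goal-ordering function: fire goals precede block-stacking
# goals. The comparator convention and priority values are assumptions.
PRIORITY = {"not-onfire": 0, "on": 1}

def order_goals(goal_a, goal_b):
    """Return -1 if goal_a precedes goal_b, 1 if it follows, 0 if unordered."""
    pa = PRIORITY.get(goal_a[0], 99)
    pb = PRIORITY.get(goal_b[0], 99)
    return (pa > pb) - (pa < pb)

print(order_goals(("not-onfire", "A"), ("on", ("A", "B"))))   # -1
```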
- Condition 6. The arsonist demo. How does it differ from the other conditions? How exactly does Interpret work? How does Eval do the scoring?
The baseline implementation does not really address this, since it is intended to be general and does not include arson, TF-Trees, or Meta-AQUA. There is a module that can be added to the simulation phase to simulate arson (as we have seen before). I am still in the process of porting TF-Trees to the new MIDCA version, and Meta-AQUA has not been ported, but its behavior can easily be simulated. For the rest of this answer, I will assume that modules instantiating or simulating these behaviors are included, and that a goal-ordering function has been added that prioritizes arsonist-catching over firefighting over block stacking.
Then MIDCA will generate goals to fight all fires that occur, and to catch the arsonist when the Meta-AQUA simulator is activated, and will act on these goals as appropriate. If there are multiple goals with the same priority, all of them will be sent to SHOP at once and a multi-objective plan will be enacted. Block stacking will be a bit less predictable: because the behavior of TF-Trees is not well-defined in between the end states, calling the stacking TF-Trees every cycle will produce strange behavior. One workaround to recover the demo we have used before would be to have the module implementing TF-Trees check MIDCA's memory for current block-stacking goals and only run TF-Tree goal generation if there are none.
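That workaround is simple to express; here is a hypothetical sketch (goal and memory representations are assumptions):

```python
# Sketch of the workaround described above: only invoke the stacking TF-Tree
# when memory holds no active block-stacking goal. Names are hypothetical.
def maybe_run_stacking_tftree(goal_graph, run_tftree):
    has_stacking_goal = any(goal[0] == "on" for goal in goal_graph)
    if not has_stacking_goal:
        new_goal = run_tftree()   # TF-Tree proposes a new stacking goal
        if new_goal is not None:
            goal_graph.append(new_goal)

goals = []
maybe_run_stacking_tftree(goals, lambda: ("on", ("A", "B")))
print(goals)   # [('on', ('A', 'B'))]
```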
As for scoring, Eval does not do any in the baseline implementation, but we could add a module that does scoring fairly easily.
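For instance, a scoring module might do no more than the following (entirely hypothetical; the point values are placeholders):

```python
# Entirely hypothetical scoring sketch: award points in Eval for each goal
# achieved, weighted by goal type. Point values are placeholders.
SCORES = {"on": 1, "not-onfire": 2, "caught-arsonist": 10}

def score_achievements(achieved_goals, score=0):
    for goal in achieved_goals:
        score += SCORES.get(goal[0], 0)
    return score

print(score_achievements([("not-onfire", "A"), ("on", ("A", "B"))]))   # 3
```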
We also might want to consider whether to continue with the old demo or design a new example. One alternative in the same domain would be to generate a new pseudo-random configuration of blocks (and perhaps some chance of having or not having an arsonist) every time a tower is completed, or every n time steps, and have MIDCA face the challenge of deciding how high a tower it should try to build, and whether to spend time searching for an arsonist, based on the conditions it has observed.

This seems to offer deeper opportunities for metacognition: MIDCA would have to know about its own planning abilities under various conditions; and/or we might design the arsonist to act according to certain patterned behaviors that MIDCA would have to figure out and respond to accordingly; etc. In contrast to the old example, this one has the advantages that it is easier to motivate (a builder moving between sites and getting as much work done as possible under tough and differing conditions), and that the correct solution is not always immediately obvious, necessitating a more thoughtful agent.

This example also seems likely to translate more seamlessly to Baxter: the robot would have to decide how high to build a tower based on the initial configuration of blocks and its own knowledge of its fine motor control and of the increased difficulty of tower construction as height increases. We might also add time constraints, or a mischievous human who would knock down the tower under certain conditions or with some probability. Because each situation would be different, rather than a repetition of the same three states, non-trivial learning would be possible as well.