Bug in HER replay pool (and multi goal setup not finished) #92

Open
jendelel opened this issue Jun 7, 2019 · 2 comments


jendelel commented Jun 7, 2019

Hi,

I noticed a bug in the HER replay pool. I would send a PR, but it seems the whole multi-goal setup is not nearly ready at the moment: there is no support in the policies (no policy has a goal_keys attribute), and I couldn't figure out what env.goal_key_map is supposed to represent. As a result, all the tests for the HER replay pool are failing, and the bug wasn't caught by them.

The bug:

def REPLACE_FULL_OBSERVATION(original_batch,
                             resampled_batch,
                             where_resampled,
                             environment):
    batch_flat = flatten(original_batch)
    resampled_batch_flat = flatten(original_batch)  # wrong
    goal_keys = [
        key for key in batch_flat.keys()
        if key[0] == 'goals'
    ]
    for key in goal_keys:
        assert (batch_flat[key][where_resampled].shape
                == resampled_batch_flat[key].shape)
        batch_flat[key][where_resampled] = (
            resampled_batch_flat[key])

    return unflatten(batch_flat)

should be

def REPLACE_FULL_OBSERVATION(original_batch,
                             resampled_batch,
                             where_resampled,
                             environment):
    batch_flat = flatten(original_batch)
    resampled_batch_flat = flatten(resampled_batch)  # correct
    goal_keys = [
        key for key in batch_flat.keys()
        if key[0] == 'goals'
    ]
    for key in goal_keys:
        assert (batch_flat[key][where_resampled].shape
                == resampled_batch_flat[key].shape)
        batch_flat[key][where_resampled] = (
            resampled_batch_flat[key])

    return unflatten(batch_flat)
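
For what it's worth, here is a minimal, self-contained sketch of a regression test that would have caught this. The flatten/unflatten helpers below are simplified stand-ins for the repo's own (nested dicts flattened to tuple keys), and the field names and shapes are made up:

import numpy as np

def flatten(nested, prefix=()):
    # Simplified stand-in: flatten a nested dict into tuple-keyed entries.
    flat = {}
    for key, value in nested.items():
        if isinstance(value, dict):
            flat.update(flatten(value, prefix + (key,)))
        else:
            flat[prefix + (key,)] = value
    return flat

def unflatten(flat):
    # Inverse of the simplified flatten above.
    nested = {}
    for key, value in flat.items():
        node = nested
        for part in key[:-1]:
            node = node.setdefault(part, {})
        node[key[-1]] = value
    return nested

def test_replace_full_observation_uses_resampled_goals():
    original_batch = {'goals': {'position': np.zeros((4, 2))}}
    resampled_batch = {'goals': {'position': np.ones((2, 2))}}
    where_resampled = np.array([1, 3])

    result = REPLACE_FULL_OBSERVATION(
        original_batch, resampled_batch, where_resampled, environment=None)

    # The buggy version either trips the shape assert or leaves the goals
    # untouched; the fixed version writes the resampled goals into the
    # selected rows.
    np.testing.assert_array_equal(
        result['goals']['position'][where_resampled],
        resampled_batch['goals']['position'])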

Since I am trying to experiment with hierarchical RL with multiple goals, I am more than happy to contribute to the multi-goal setup. From the existing code, though, I couldn't figure out the big picture.

All the best,

Lukas

@hartikainen
Member

Hey @jendelel, thanks a lot for opening this! Definitely a bug on my end.

I'm also aware of the current limitations of the goal-conditioned setting in the repo. I only have the very first iteration of the relabeling/goal-conditioning implementation put together, and I haven't yet converged on a satisfying way of handling goal-conditioned policies here, which is why I decided to push only some of the basic utilities (i.e. the relabeling pool) for it.

I think one question is: do we really even need goal-conditioned policies, or should everything be kept in the observations? I personally would prefer to keep goals and observations separate from each other, but I also don't like that all our algorithms would then have to be duplicated for goal-conditioned policies. One way around that is to refactor everything in the repo to be goal-conditioned but allow goals to be empty, in which case things work just as if they were non-goal-conditioned. I would need to play with these options a bit before I know which way feels best.
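
Roughly the kind of interface I have in mind, as a sketch only (none of these names exist in the repo, and the "network" is just a random linear layer for illustration):

import numpy as np

class GoalConditionedPolicy:
    """Sketch: every policy takes goals, and goal_dim=0 recovers the
    ordinary non-goal-conditioned behavior."""

    def __init__(self, observation_dim, action_dim, goal_dim=0):
        self.goal_dim = goal_dim
        input_dim = observation_dim + goal_dim
        # Stand-in for a real policy network.
        self._weights = np.random.randn(input_dim, action_dim) * 0.01

    def actions(self, observations, goals=None):
        if self.goal_dim == 0:
            inputs = observations
        else:
            inputs = np.concatenate([observations, goals], axis=-1)
        return np.tanh(inputs @ self._weights)

# Non-goal-conditioned usage: goals are simply never passed in.
policy = GoalConditionedPolicy(observation_dim=8, action_dim=2)
actions = policy.actions(np.zeros((16, 8)))

# Goal-conditioned usage: same interface, with goals filled in.
gc_policy = GoalConditionedPolicy(observation_dim=8, action_dim=2, goal_dim=3)
gc_actions = gc_policy.actions(np.zeros((16, 8)), goals=np.zeros((16, 3)))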

One of my own projects depends on this, so I think I'll get a more polished version out in a week or two, after ICML. I'd be really happy to accept contributions for this, as well as any ideas on how the goal-conditioned stuff could be made easier and flexible to use.

@jendelel
Author

Hi Kristian,

Thanks for such a quick reply.

I have been trying to implement the HAC and HIRO papers. From what I have implemented so far, I agree that it would be nice to keep goals and observations separate. Having goals everywhere and filling them only for goal-supporting environments sounds like a good idea. For example, I believe even policies should be aware of both observations and goals: I can imagine a policy or a value function that processes goals (e.g. coordinates) and observations (e.g. images) differently.
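
To make that concrete, something along these lines (a toy tf.keras sketch; every shape and layer size here is made up):

import tensorflow as tf

# Toy sketch: a Q-function with separate branches for image observations
# and coordinate goals, merged before the final layers.
image_observation = tf.keras.layers.Input(shape=(64, 64, 3), name='observation')
coordinate_goal = tf.keras.layers.Input(shape=(2,), name='goal')
action = tf.keras.layers.Input(shape=(4,), name='action')

# Convolutional branch for the image observation.
x = tf.keras.layers.Conv2D(32, 3, strides=2, activation='relu')(image_observation)
x = tf.keras.layers.Conv2D(32, 3, strides=2, activation='relu')(x)
x = tf.keras.layers.Flatten()(x)

# Small dense branch for the low-dimensional goal.
g = tf.keras.layers.Dense(32, activation='relu')(coordinate_goal)

merged = tf.keras.layers.Concatenate()([x, g, action])
hidden = tf.keras.layers.Dense(256, activation='relu')(merged)
q_value = tf.keras.layers.Dense(1)(hidden)

q_function = tf.keras.Model(
    inputs=[image_observation, coordinate_goal, action], outputs=q_value)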

In the hierarchical setup, it is often necessary to replace one goal with another even at the policy level. Currently, I had to hard-wire the assumption that the goal is the first X dimensions of the observation, which isn't nice.
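
What I had to do looks roughly like this (simplified, with made-up dimensions):

import numpy as np

GOAL_DIM = 3  # hard-wired assumption: the goal lives in observation[:GOAL_DIM]

def relabel_observation(observation, new_goal):
    # Fragile: breaks as soon as the observation layout changes.
    relabeled = observation.copy()
    relabeled[:GOAL_DIM] = new_goal
    return relabeled

# With separate observation/goal entries this would just be a dict update,
# e.g. {'observations': observation, 'goals': new_goal}.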

Let me know how I can help.

Lukas
