Bug in HER replay pool (and multi goal setup not finished) #92

Open
jendelel opened this issue Jun 7, 2019 · 2 comments


jendelel commented Jun 7, 2019

Hi,

I noticed a bug in the HER replay pool. I would send a PR, but it seems the whole multi-goal setup is not nearly ready at the moment: there is no support in the policies (no policy has a goal_keys attribute), and I couldn't figure out what env.goal_key_map is supposed to represent. As a result, all the tests for the HER replay pool are failing, and the bug wasn't caught by them.

The bug:

def REPLACE_FULL_OBSERVATION(original_batch,
                             resampled_batch,
                             where_resampled,
                             environment):
    batch_flat = flatten(original_batch)
    resampled_batch_flat = flatten(original_batch)  # wrong
    goal_keys = [
        key for key in batch_flat.keys()
        if key[0] == 'goals'
    ]
    for key in goal_keys:
        assert (batch_flat[key][where_resampled].shape
                == resampled_batch_flat[key].shape)
        batch_flat[key][where_resampled] = (
            resampled_batch_flat[key])

    return unflatten(batch_flat)

should be

def REPLACE_FULL_OBSERVATION(original_batch,
                             resampled_batch,
                             where_resampled,
                             environment):
    batch_flat = flatten(original_batch)
    resampled_batch_flat = flatten(resampled_batch)  # correct
    goal_keys = [
        key for key in batch_flat.keys()
        if key[0] == 'goals'
    ]
    for key in goal_keys:
        assert (batch_flat[key][where_resampled].shape
                == resampled_batch_flat[key].shape)
        batch_flat[key][where_resampled] = (
            resampled_batch_flat[key])

    return unflatten(batch_flat)
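
For what it's worth, here is a minimal, self-contained sketch of a regression test that would have caught this. The flatten/unflatten helpers below are simplified stand-ins for the repo's own (nested dicts flattened to tuple keys), and the field names and shapes are made up:

import numpy as np

def flatten(nested, prefix=()):
    # Simplified stand-in: flatten a nested dict into tuple-keyed entries.
    flat = {}
    for key, value in nested.items():
        if isinstance(value, dict):
            flat.update(flatten(value, prefix + (key,)))
        else:
            flat[prefix + (key,)] = value
    return flat

def unflatten(flat):
    # Inverse of the simplified flatten above.
    nested = {}
    for key, value in flat.items():
        node = nested
        for part in key[:-1]:
            node = node.setdefault(part, {})
        node[key[-1]] = value
    return nested

def test_replace_full_observation_uses_resampled_goals():
    original_batch = {'goals': {'position': np.zeros((4, 2))}}
    resampled_batch = {'goals': {'position': np.ones((2, 2))}}
    where_resampled = np.array([1, 3])

    result = REPLACE_FULL_OBSERVATION(
        original_batch, resampled_batch, where_resampled, environment=None)

    # The buggy version either trips the shape assert or leaves the goals
    # untouched; the fixed version writes the resampled goals into the
    # selected rows.
    np.testing.assert_array_equal(
        result['goals']['position'][where_resampled],
        resampled_batch['goals']['position'])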

Since I am trying to experiment with hierarchical RL with multiple goals, I am more than happy to contribute to the multi-goal setup. From the existing code, though, I couldn't figure out the big picture.

All the best,

Lukas

@hartikainen
Member

Hey @jendelel, thanks a lot for opening this! Definitely a bug on my end.

I'm also aware of the current limitations of the goal-conditioned setting in the repo. I only have the very first iteration of the relabeling/goal-conditioning implementation put together, and I haven't yet converged on a satisfying way of handling goal-conditioned policies here, which is why I decided to push only some of the basic utilities (i.e. the relabeling pool) for it.

I think one question is: do we really even need goal-conditioned policies, or should everything be kept in the observations? I personally would prefer to keep goals and observations separate from each other, but I also don't like that all our algorithms would then have to be duplicated for goal-conditioned policies. One way around that is to refactor everything in the repo to be goal-conditioned but allow goals to be empty, in which case things work just as if they were non-goal-conditioned. I would need to play with these options a bit before I know which way feels best.
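
Roughly the kind of interface I have in mind, as a sketch only (none of these names exist in the repo, and the "network" is just a random linear layer for illustration):

import numpy as np

class GoalConditionedPolicy:
    """Sketch: every policy takes goals, and goal_dim=0 recovers the
    ordinary non-goal-conditioned behavior."""

    def __init__(self, observation_dim, action_dim, goal_dim=0):
        self.goal_dim = goal_dim
        input_dim = observation_dim + goal_dim
        # Stand-in for a real policy network.
        self._weights = np.random.randn(input_dim, action_dim) * 0.01

    def actions(self, observations, goals=None):
        if self.goal_dim == 0:
            inputs = observations
        else:
            inputs = np.concatenate([observations, goals], axis=-1)
        return np.tanh(inputs @ self._weights)

# Non-goal-conditioned usage: goals are simply never passed in.
policy = GoalConditionedPolicy(observation_dim=8, action_dim=2)
actions = policy.actions(np.zeros((16, 8)))

# Goal-conditioned usage: same interface, with goals filled in.
gc_policy = GoalConditionedPolicy(observation_dim=8, action_dim=2, goal_dim=3)
gc_actions = gc_policy.actions(np.zeros((16, 8)), goals=np.zeros((16, 3)))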

One of my own projects depends on this, so I think I'll get a more polished version out in a week or two, after ICML. I'd be really happy to accept contributions for this, as well as any ideas on how the goal-conditioned stuff could be made easier and flexible to use.

@jendelel
Author

Hi Kristian,

Thanks for such a quick reply.

I have been trying to implement the HAC and HIRO papers. From what I have implemented so far, I agree that it would be nice to keep goals and observations separate. Having goals everywhere and filling them only for goal-supporting environments sounds like a good idea. For example, I believe even policies should be aware of both observations and goals: I can imagine a policy or a value function that processes goals (e.g. coordinates) and observations (e.g. images) differently.
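
To make that concrete, something along these lines (a toy tf.keras sketch; every shape and layer size here is made up):

import tensorflow as tf

# Toy sketch: a Q-function with separate branches for image observations
# and coordinate goals, merged before the final layers.
image_observation = tf.keras.layers.Input(shape=(64, 64, 3), name='observation')
coordinate_goal = tf.keras.layers.Input(shape=(2,), name='goal')
action = tf.keras.layers.Input(shape=(4,), name='action')

# Convolutional branch for the image observation.
x = tf.keras.layers.Conv2D(32, 3, strides=2, activation='relu')(image_observation)
x = tf.keras.layers.Conv2D(32, 3, strides=2, activation='relu')(x)
x = tf.keras.layers.Flatten()(x)

# Small dense branch for the low-dimensional goal.
g = tf.keras.layers.Dense(32, activation='relu')(coordinate_goal)

merged = tf.keras.layers.Concatenate()([x, g, action])
hidden = tf.keras.layers.Dense(256, activation='relu')(merged)
q_value = tf.keras.layers.Dense(1)(hidden)

q_function = tf.keras.Model(
    inputs=[image_observation, coordinate_goal, action], outputs=q_value)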

In the hierarchical setup, it is often necessary to replace one goal with another even at the policy level. Currently, I had to hard-wire the assumption that the goal is the first X dimensions of the observation, which isn't nice.
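
What I had to do looks roughly like this (simplified, with made-up dimensions):

import numpy as np

GOAL_DIM = 3  # hard-wired assumption: the goal lives in observation[:GOAL_DIM]

def relabel_observation(observation, new_goal):
    # Fragile: breaks as soon as the observation layout changes.
    relabeled = observation.copy()
    relabeled[:GOAL_DIM] = new_goal
    return relabeled

# With separate observation/goal entries this would just be a dict update,
# e.g. {'observations': observation, 'goals': new_goal}.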

Let me know how I can help.

Lukas
