Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

World model (work in progress PR for early commenting) #3

Open
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

mensch72
Copy link
Contributor

No description provided.

@mensch72
Copy link
Contributor Author

I have decided to base WorldModel on Env after all and have step() sample from transition_distribution() by default.

@mensch72
Copy link
Contributor Author

please ignore what's under rl/mdp/ for now and only look at world_model/

def __init__(self):
super().__init__()

def reset(self, seed=None, state = None):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second argument should be called options, it's probably not a great idea to change that. Not sure what exactly you're going for by changing this, but I don't think defining reset for the abstract MDP is necessary

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, fixed the order. It is necessary though since the point is that an MDPEnv can reset to a particular state, which a generic Env cannot.

All observations are full states, so the environment can be reset to an arbitrary state by providing the respective observation.
"""

def __init__(self):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant, plus might cause side-effects (what if an actual env takes arguments in the constructor?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm never sure about these things. So you say derived classes can just leave out __init__? Assuming this is the message, I have now removed all empty __init__s from all these classes.


history = None
"""The history of the current episode."""

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the comment in MDP

from numpy.random import choice
from gymnasium import Env, ResetNeeded

# TODO: add typing

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do, I'm a type-driven creature and it's pretty hard for me to read code without types

from typing import Any
from gymnasium import Env

class MDP(Env):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a different structure would be better - MDP is just an abc.ABC, doesn't necessarily inherit from Env. It should only contain the state method. Then it effectively functions as a trait/type class, i.e. when something inherits from MDP, it will only check that it contains state and that's it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the point here is that an MDP can reset to a particular state, which a normal Env cannot. I have now split this into two classes, MDP with only state() and MDPEnv, which adds reset(..., state=...).

class WorldModel(Env):
"""An abstract base class for potentially probabilistic world models,
extending gymnasion.Env by providing methods for enquiring transition probabilities.
Most implementations will probably combine this with a gymnasium.Env.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thing to consider regarding the class hierarchy.

If it is the case that "most implementations" will combine this with a gym.Env, then WorldModel shouldn't inherit from Env.

If we want it to be required that every WorldModel "combines" this with gym.Env, then the current inheritance might make sense

@RedTachyon
Copy link

I'll pull out the discussion regarding the state stuff, since it was spread between a few places in the code.

As I understand, the idea is that you can set the environment to a certain state. It might be sorta doable via modifying the reset method, but I'm not sure the current approach is good.

I see two ways we could do this:

  1. Comply with the API, and do something like env.reset(options={"state": state})
  2. Add new methods get_state and set_state that operate independently of resetting

@mensch72
Copy link
Contributor Author

mensch72 commented Feb 7, 2024

@martinkunev This can be merged now I believe. Please test the gridworlds I made, using

python scripts/test_simple_gridworld.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants