-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
World model (work in progress PR for early commenting) #3
base: main
Are you sure you want to change the base?
Conversation
I have decided to base WorldModel on Env after all and have step() sample from transition_distribution() by default. |
please ignore what's under rl/mdp/ for now and only look at world_model/ |
src/world_model/mdp.py
Outdated
def __init__(self): | ||
super().__init__() | ||
|
||
def reset(self, seed=None, state = None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Second argument should be called options
, it's probably not a great idea to change that. Not sure what exactly you're going for by changing this, but I don't think defining reset
for the abstract MDP is necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, fixed the order. It is necessary though since the point is that an MDPEnv can reset to a particular state, which a generic Env cannot.
src/world_model/mdp.py
Outdated
All observations are full states, so the environment can be reset to an arbitrary state by providing the respective observation. | ||
""" | ||
|
||
def __init__(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Redundant, plus might cause side-effects (what if an actual env takes arguments in the constructor?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm never sure about these things. So you say derived classes can just leave out __init__
? Assuming this is the message, I have now removed all empty __init__
s from all these classes.
|
||
history = None | ||
"""The history of the current episode.""" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the comment in MDP
from numpy.random import choice | ||
from gymnasium import Env, ResetNeeded | ||
|
||
# TODO: add typing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please do, I'm a type-driven creature and it's pretty hard for me to read code without types
src/world_model/mdp.py
Outdated
from typing import Any | ||
from gymnasium import Env | ||
|
||
class MDP(Env): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a different structure would be better - MDP
is just an abc.ABC
, doesn't necessarily inherit from Env. It should only contain the state
method. Then it effectively functions as a trait/type class, i.e. when something inherits from MDP
, it will only check that it contains state
and that's it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the point here is that an MDP can reset to a particular state, which a normal Env cannot. I have now split this into two classes, MDP
with only state()
and MDPEnv
, which adds reset(..., state=...)
.
src/world_model/world_model.py
Outdated
class WorldModel(Env): | ||
"""An abstract base class for potentially probabilistic world models, | ||
extending gymnasion.Env by providing methods for enquiring transition probabilities. | ||
Most implementations will probably combine this with a gymnasium.Env. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thing to consider regarding the class hierarchy.
If it is the case that "most implementations" will combine this with a gym.Env, then WorldModel shouldn't inherit from Env.
If we want it to be required that every WorldModel "combines" this with gym.Env, then the current inheritance might make sense
I'll pull out the discussion regarding the As I understand, the idea is that you can set the environment to a certain state. It might be sorta doable via modifying the I see two ways we could do this:
|
@martinkunev This can be merged now I believe. Please test the gridworlds I made, using
|
No description provided.