Dev dagger #88

Open
wants to merge 66 commits into
base: main
d949a1c
Single python entry point & hydra based configuration
cloderic Jun 12, 2022
68953eb
Introducing DQN
cloderic Jun 17, 2022
4212224
Add support for petting zoo classic environment
cloderic Jun 20, 2022
757cf38
Self play and HILL DQN training for connect four
cloderic Jun 22, 2022
9219e00
Update readme with missing dependencies
cloderic Jun 23, 2022
4cb7598
Fix issue in the mountain car bc experiment conf
cloderic Jun 23, 2022
da28604
Fix bug for linux
joshair Jul 15, 2022
079c69f
Using cogment 2.5.0
cloderic Jul 15, 2022
2a6fc1a
add debugger for docs in the next branch (#82)
lhnguyen102 Jul 26, 2022
c9e32de
Pytorch multiproc fix (#81)
joshair Aug 18, 2022
36d0489
fix SimpleQueue issue (#83)
saikrishna-1996 Aug 23, 2022
282578c
fix log_metric bug (#86)
vabdollahi Aug 30, 2022
1556b8d
Upgrading Cogment and Gym (#87)
cloderic Aug 31, 2022
0fe5fe1
bootstrap dagger
vabdollahi Aug 31, 2022
fe88dfa
adding SimpleA2CModel as the teacher model
vabdollahi Aug 31, 2022
b22af88
adding learner model
vabdollahi Aug 31, 2022
5c9da8b
dagger algorithm
vabdollahi Aug 31, 2022
655402e
adding rewards to the metrics
vabdollahi Sep 1, 2022
c049822
some linting
vabdollahi Sep 1, 2022
eee7868
some cleanup
vabdollahi Sep 1, 2022
fbf9f6b
fix the observation bug
vabdollahi Sep 1, 2022
11950c9
renaming to student
vabdollahi Sep 1, 2022
3c90ce7
rename - more
vabdollahi Sep 1, 2022
9f25f18
using SimpleA2CActor as a teacher
vabdollahi Sep 1, 2022
4824163
separating the two phases of dagger
vabdollahi Sep 1, 2022
887e70d
removing
vabdollahi Sep 1, 2022
27ae637
lint
vabdollahi Sep 1, 2022
2eeacd7
Merge branch 'next' into dev_dagger
vabdollahi Sep 1, 2022
d0857b2
iterate for mlp training
vabdollahi Sep 6, 2022
88a8537
Using one episode as a sample
vabdollahi Sep 6, 2022
b979d83
removing done
vabdollahi Sep 6, 2022
3f43800
adding iterations
vabdollahi Sep 6, 2022
5562e4f
some config changes
vabdollahi Sep 7, 2022
ee97ec0
adding some mlflow info
vabdollahi Sep 7, 2022
338e3fb
updating readme
vabdollahi Sep 7, 2022
2e9302a
some code review changes
vabdollahi Sep 12, 2022
35cc9c1
storing student actions
vabdollahi Sep 12, 2022
edd6546
Adding linear schedule for choosing between teacher and student actions
vabdollahi Sep 12, 2022
2f813fc
Dev ppo (#91)
lhnguyen102 Oct 6, 2022
aed3e1b
fixe bugs in gyms (#98)
lhnguyen102 Oct 21, 2022
a8f8220
Hot fix (#100)
lhnguyen102 Oct 24, 2022
0e5a90a
bootstrap dagger
vabdollahi Aug 31, 2022
2e5fcc0
adding SimpleA2CModel as the teacher model
vabdollahi Aug 31, 2022
1c14ba5
adding learner model
vabdollahi Aug 31, 2022
fdfbcd5
dagger algorithm
vabdollahi Aug 31, 2022
8e44d6c
adding rewards to the metrics
vabdollahi Sep 1, 2022
a536f33
some linting
vabdollahi Sep 1, 2022
176c9db
some cleanup
vabdollahi Sep 1, 2022
ca08817
fix the observation bug
vabdollahi Sep 1, 2022
5fc1855
renaming to student
vabdollahi Sep 1, 2022
bb9d79e
rename - more
vabdollahi Sep 1, 2022
4368c9b
using SimpleA2CActor as a teacher
vabdollahi Sep 1, 2022
22b5725
separating the two phases of dagger
vabdollahi Sep 1, 2022
8d80c5c
removing
vabdollahi Sep 1, 2022
42de922
lint
vabdollahi Sep 1, 2022
0994797
iterate for mlp training
vabdollahi Sep 6, 2022
1b45591
Using one episode as a sample
vabdollahi Sep 6, 2022
32680c5
removing done
vabdollahi Sep 6, 2022
2bb3370
adding iterations
vabdollahi Sep 6, 2022
eca286b
some config changes
vabdollahi Sep 7, 2022
c3090d0
adding some mlflow info
vabdollahi Sep 7, 2022
34c0068
updating readme
vabdollahi Sep 7, 2022
d022a14
some code review changes
vabdollahi Sep 12, 2022
bc22394
storing student actions
vabdollahi Sep 12, 2022
83347a4
Adding linear schedule for choosing between teacher and student actions
vabdollahi Sep 12, 2022
770b48c
Merge branch 'dev_dagger' of https://github.com/cogment/cogment-verse…
vabdollahi Oct 24, 2022
14 changes: 4 additions & 10 deletions .apache-license-checker.yaml
@@ -1,8 +1,6 @@
ignore:
- "?eggs"
- "**/__pycache__"
- "**/.venv"
- "**/cogment_*.yaml"
- "**/_old"
- "**/*_pb2*.py"
- "**/*.pb.go"
- "**/*_pb*.js"
@@ -12,12 +10,8 @@ ignore:
- "**/CogSettings.d.ts"
- "**/CogSettings.js"
- "**/CogTypes.d.ts"
- "**/third_party"
- "*/cogment/api"
- "**/htmlcov"
- "web_client/node_modules"
- "web_client/build"
- "**/pybullet_driving_env/*"
- "**/node_modules"
- "**/build"
license:
CopyrightYear: 2021
CopyrightYear: 2022
Author: "AI Redefined Inc. <[email protected]>"
19 changes: 12 additions & 7 deletions .gitignore
@@ -1,10 +1,10 @@
# Generated code
/*/*.proto
/*/cogment.yaml
cog_settings.py
*_pb2.py
*_pb2_grpc.py
base_python/cogment_verse/api
CogSettings.*
CogTypes.d.ts
*_pb.d.ts
*_pb.js
cogment_verse/web/cogment.yaml
cogment_verse/web/*.proto

# Python stuffs
__pycache__/
@@ -24,6 +24,11 @@ node_modules/

# Runtime data
/data
/debug

# Cogment
/.cogment
/.cogment_verse

# Run outputs generated by Hydra
outputs
multirun
50 changes: 28 additions & 22 deletions .gitlab-ci.yml
@@ -1,44 +1,50 @@
stages:
- lint
- test

.base:
image: python:3.9
variables:
PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip"
NPM_CACHE_DIR: "$CI_PROJECT_DIR/.cache/npm"
before_script:
- mkdir -p ${PIP_CACHE_DIR}
- mkdir -p ${NPM_CACHE_DIR}
# Installation instructions from https://github.com/nodesource/distributions/blob/master/README.md#installation-instructions
- curl -fsSL https://deb.nodesource.com/setup_14.x | bash -
- apt-get update && apt-get install -y software-properties-common && apt-add-repository non-free && apt-get update
- apt-get install -y nodejs swig unrar python3-opencv
- pip install virtualenv
- npm config set cache ${NPM_CACHE_DIR} --global
- apt-get update
- apt-get install -y swig python3-opencv
- python -m venv .venv
- source .venv/bin/activate
- pip install -r requirements.txt
cache:
# pip's cache
- paths:
- .cache/pip
- "**/.venv"
# npm's cache
# .venv
- key:
files:
- web_client/package-lock.json
- requirements.txt
paths:
- .cache/npm/
# atari roms cache
- key:
files:
- run.sh
- environment/requirements.txr
paths:
- environment/.atari_roms
- .venv

black:
stage: lint
extends: .base
script:
- black --check --diff .

build_and_test:
pylint:
stage: lint
extends: .base
script:
- ./run.sh build
- ./run.sh lint
- ./run.sh test
- pylint --recursive=y .

apache_licenses_check:
stage: lint
image: registry.gitlab.com/ai-r/apache-license-checker:latest
script:
- apache-license-checker

pytest:
stage: test
extends: .base
script:
- python -m pytest
109 changes: 61 additions & 48 deletions README.md
@@ -41,75 +41,88 @@ Cogment verse includes environments from:

## Getting started

### Setup, Build and Run

1. Clone this repository
2. Install the following dependencies:
- [Python 3.9](https://www.python.org/) or above,
- [Node.JS v14](https://nodejs.org/) or above,
- `parallel`, on ubuntu it is installable using `apt-get install parallel`, on mac it is available through `brew install parallel`,
- `unrar`, on ubuntu it is installable using `apt-get install unrar`, on mac it is available through `brew install unrar`.
3. `./run.sh build`
4. `./run.sh services_start`
5. In a different terminal, start the trials with `./run.sh client start <run-name>`.
Different run names can be found in `run_params.yaml`
6. (Optional) To launch webclient, run `./run.sh web_client_start` in a different
terminal. Open http://localhost:8000 to join or visualize trials
2. Install [Python 3.9](https://www.python.org/)
3. Depending on your specific machine, you might also need the following dependencies:

- `swig`, which is required for the Box2d gym environments; it can be installed using `apt-get install swig` on Ubuntu or `brew install swig` on macOS
- `python3-opencv`, which is required on Ubuntu systems; it can be installed using `apt-get install python3-opencv`

4. Create and activate a virtual environment by running
```console
$ python -m venv .venv
$ source .venv/bin/activate
```
5. Install the python dependencies by running
```console
$ pip install -r requirements.txt
```
6. In another terminal, launch a mlflow server on port 3000 by running
```console
$ source .venv/bin/activate
$ python -m simple_mlflow
```
7. Start the default Cogment Verse run using `python -m main`
8. Open Chrome (other web browsers might work but haven't been tested) and navigate to http://localhost:8080/
9. Play the game!

#### Run monitoring
That's the basic setup for Cogment Verse, you are now ready to train AI agents.

You can monitor ongoing run using [mlflow](https://mlflow.org). By default a local instance of mlflow is started by cogment-verse and is accessible at <http://localhost:3000>.
### Configuration

#### Human player
Cogment Verse relies on [hydra](https://hydra.cc) for configuration. This enables easy configuration and composition of configuration directly from yaml files and the command line.

Some of the availabe run involve a human player,
for example `benchmark_lander_hill` enables a human player
to momentarily take control of the lunar lander to help the
AI agents during the training process.
The configuration files are located in the `config` directory, with defaults defined in `config/config.yaml`.
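
To make the shape of command-line overrides such as `+experiment=simple_bc/mountain_car` or `services/environment=lunar_lander` more concrete, here is a minimal sketch that splits an override string into an operation, a key and a value. It is not Hydra's parser and is not part of Cogment Verse: `Override` and `parse_override` are hypothetical names, and the sketch only assumes the documented prefix conventions (`+` adds a key, `++` adds or replaces it, `~` deletes it, no prefix replaces an existing value).

```python
from typing import NamedTuple, Optional


class Override(NamedTuple):
    """Hypothetical container for one parsed override (not a Hydra type)."""

    op: str  # "add", "force_add", "delete" or "replace"
    key: str
    value: Optional[str]


def parse_override(arg: str) -> Override:
    """Split one Hydra-style override string into (op, key, value)."""
    # Check "++" before "+" so the longer prefix wins.
    if arg.startswith("++"):
        op, body = "force_add", arg[2:]
    elif arg.startswith("+"):
        op, body = "add", arg[1:]
    elif arg.startswith("~"):
        op, body = "delete", arg[1:]
    else:
        op, body = "replace", arg

    key, sep, value = body.partition("=")
    if not key:
        raise ValueError(f"missing key in override {arg!r}")
    # Deletions may omit the value; every other operation needs key=value.
    if op != "delete" and not sep:
        raise ValueError(f"expected key=value in override {arg!r}")
    return Override(op, key, value if sep else None)


if __name__ == "__main__":
    for arg in ["+experiment=simple_bc/mountain_car", "services/environment=lunar_lander"]:
        print(parse_override(arg))
```

In practice Hydra does this parsing itself and then composes the selected YAML files from the `config` directory; the sketch only shows why the `+` prefix matters when a group such as `experiment` is not part of the defaults.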

Then start the run
Here are a few examples:

```console
./run.sh client start benchmark_lander_hill
```
- Launch a Simple Behavior Cloning run with the [Mountain Car Gym environment](https://www.gymlibrary.ml/environments/classic_control/mountain_car/) (which is the default environment)
```console
$ python -m main +experiment=simple_bc/mountain_car
```
- Launch a Simple Behavior Cloning run with the [Lunar Lander Gym environment](https://www.gymlibrary.ml/environments/box2d/lunar_lander/)
```console
$ python -m main +experiment=simple_bc/mountain_car services/environment=lunar_lander
```
- Launch and play a single trial of the Lunar Lander Gym environment with continuous controls
```console
$ python -m main services/environment=lunar_lander_continuous
```
- Launch an A2C training run with the [Cartpole Gym environment](https://www.gymlibrary.ml/environments/classic_control/cartpole/)

Access the playing interface by launching a webclient with
`./run.sh web_client_start` and navigating to <http://localhost:8000>
```console
$ python -m main +experiment=simple_a2c/cartpole
```

#### **Play**
This one is completely _headless_ (training doesn't involve interaction with a human player). It will take a little while to run; you can monitor the progress using mlflow at <http://localhost:3000>.

The `play` run implementation can be used to have any actor play in any environment. 3 example run parameters are provided:
- Launch a DQN self training run with the [Connect Four Petting Zoo environment](https://www.pettingzoo.ml/classic/connect_four)

**`headless_play`** instanciates any agents and start a number of trials.
```console
$ python -m main +experiment=simple_dqn/connect_four
```

```console
./run.sh client start headless_play
```
The same experiment can be launched with a ratio of human-in-the-loop training trials (which are playable in the web client)

**`observe`** instanciates any agents and start a number of trials with a human observer through the webclient.
```console
$ python -m main +experiment=simple_dqn/connect_four +run.hill_training_trials_ratio=0.05
```

```console
./run.sh client start observe
```
- Launch a [DAGGER](https://arxiv.org/abs/1011.0686) imitation learning algorithm by first training the expert using the simple_a2c method
Review comment (Member): Can you create a dedicated "page" for DAGGER (in docs/results) that has this exact content, and maybe some further details and training examples?


**`play`** instanciates let a human player play in a supported environment.
```console
$ python -m main +experiment=simple_a2c/cartpole
```

```console
./run.sh client start play
```
Then set the `teacher_model_id` field of the `config/experiment/cartpole.yaml` file to the `model_id` of the trained `simple_a2c` model. Next, run the DAGGER algorithm using

They can be inspected and adapted to your needs in `run_params.yaml`:
```console
$ python -m main +experiment=dagger/cartpole
```
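
The commit list mentions a linear schedule for choosing between teacher and student actions. The sketch below shows one way such a schedule could look; it is an assumption, not the code from this pull request. `linear_beta`, `choose_action` and their parameters are hypothetical names, and the start and end values of 1.0 and 0.0 are illustrative.

```python
import random


def linear_beta(iteration: int, num_iterations: int, start: float = 1.0, end: float = 0.0) -> float:
    """Probability of executing the teacher's action at a given iteration.

    Interpolates linearly from `start` (first iteration) to `end` (last one).
    """
    if num_iterations <= 1:
        return end
    # Clamp so out-of-range iterations stay within [start, end].
    fraction = min(max(iteration / (num_iterations - 1), 0.0), 1.0)
    return start + fraction * (end - start)


def choose_action(teacher_action, student_action, beta: float, rng: random.Random):
    """Execute the teacher's action with probability beta, else the student's."""
    return teacher_action if rng.random() < beta else student_action


if __name__ == "__main__":
    rng = random.Random(0)
    total = 5
    for iteration in range(total):
        beta = linear_beta(iteration, total)
        executed = choose_action("teacher", "student", beta, rng)
        print(f"iteration={iteration} beta={beta:.2f} executed={executed}")
```

In DAGGER the teacher's action is recorded as the training label for every visited state regardless of which action was actually executed, so a schedule like this only controls which policy drives data collection.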

## List of publications and submissions using Cogment and/or Cogment Verse

- Analyzing and Overcoming Degradation in Warm-Start Off-Policy Reinforcement Learning [code](https://github.com/benwex93/cogment-verse)
- Multi-Teacher Curriculum Design for Sparse Reward Environments [code](https://github.com/kharyal/cogment-verse/)

(please open a pull request to add missing entries)

## Acknowledgements

The subdirectories `/tf_agents/cogment_verse_tf_agents/third_party` and `/torch_agents/cogment_verse_torch_agents/third_party` contain code from third-party sources:

- `hive`: Taken from the [Hive library](https://github.com/chandar-lab/RLHive)
- `td3`: Taken from the [authors' implementation](https://github.com/sfujim/TD3)
1 change: 1 addition & 0 deletions .env → _old/.env
@@ -7,6 +7,7 @@ COGMENT_VERSE_MODEL_REGISTRY_PORT=9002
COGMENT_VERSE_TORCH_AGENTS_PORT=9003
COGMENT_VERSE_TF_AGENTS_PORT=9004
COGMENT_VERSE_ENVIRONMENT_PORT=9005
COGMENT_VERSE_PRETRIAL_HOOK_PORT=9006

## Prometheus metrics server ports
COGMENT_VERSE_TORCH_AGENTS_PROMETHEUS_PORT=9500
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -14,7 +14,7 @@

import pytest
from cogment_verse_environment.utils.serialization_helpers import deserialize_img, deserialize_np_array
from data_pb2 import AgentAction, EnvironmentConfig
from data_pb2 import PlayerAction, EnvironmentConfig
from mock_environment_session import ActorInfo

# pylint doesn't like test fixtures
@@ -15,7 +15,7 @@
import numpy as np
import pytest
from cogment_verse_environment.utils.serialization_helpers import deserialize_img, deserialize_np_array
from data_pb2 import AgentAction, EnvironmentConfig
from data_pb2 import PlayerAction, EnvironmentConfig
from mock_environment_session import ActorInfo

# pylint doesn't like test fixtures
@@ -16,7 +16,7 @@
import pytest
from cogment_verse_environment.procgen_env import ENV_NAMES, ProcGenEnv
from cogment_verse_environment.utils.serialization_helpers import deserialize_img, deserialize_np_array
from data_pb2 import AgentAction, EnvironmentConfig
from data_pb2 import PlayerAction, EnvironmentConfig
from mock_environment_session import ActorInfo

# pylint doesn't like test fixtures
File renamed without changes.
File renamed without changes.
4 changes: 2 additions & 2 deletions run.sh → _old/run.sh
@@ -236,7 +236,7 @@ function mlflow_start() {
function web_client_build() {
_load_dot_env
export PORT="${COGMENT_VERSE_WEBCLIENT_PORT}"
export REACT_APP_ORCHESTRATOR_HTTP_ENDPOINT="${COGMENT_VERSE_ORCHESTRATOR_HTTP_ENDPOINT}"
export REACT_APP_ORCHESTRATOR_WEB_ENDPOINT="${COGMENT_VERSE_ORCHESTRATOR_HTTP_ENDPOINT}"
cp "${ROOT_DIR}/data.proto" "${ROOT_DIR}/cogment.yaml" "${ROOT_DIR}/web_client"
cd "${ROOT_DIR}/web_client"
npm install --no-audit
@@ -253,7 +253,7 @@ function web_client_start() {
function web_client_start_dev() {
_load_dot_env
export PORT="${COGMENT_VERSE_WEBCLIENT_PORT}"
export REACT_APP_ORCHESTRATOR_HTTP_ENDPOINT="${COGMENT_VERSE_ORCHESTRATOR_HTTP_ENDPOINT}"
export REACT_APP_ORCHESTRATOR_WEB_ENDPOINT="${COGMENT_VERSE_ORCHESTRATOR_HTTP_ENDPOINT}"
cd "${ROOT_DIR}/web_client"
npm run dev
}
File renamed without changes.
File renamed without changes.
@@ -14,7 +14,7 @@

import cv2
import numpy as np
from data_pb2 import AgentAction, ContinuousAction
from data_pb2 import PlayerAction, ContinuousAction

# TODO directly use tf tensors

@@ -53,11 +53,11 @@ def cog_action_from_tf_action(action):
if dtype in (int, np.int32, np.int64):
field = "discrete_action"
kwargs = {field: action}
return AgentAction(**kwargs)
return PlayerAction(**kwargs)

# else
# pylint: disable=no-member
agent_action = AgentAction(continuous_action=ContinuousAction())
agent_action = PlayerAction(continuous_action=ContinuousAction())
action = np.squeeze(action)
if action.shape == ():
agent_action.continuous_action.data.append(action)
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -21,7 +21,7 @@
from cogment_verse import AgentAdapter

from cogment_verse_torch_agents.utils.tensors import tensor_from_cog_obs
from data_pb2 import AgentAction
from data_pb2 import PlayerAction

from huggingface_sb3 import load_from_hub
from stable_baselines3 import PPO
@@ -24,7 +24,11 @@
from cogment_verse_torch_agents.third_party.hive.dqn import DQNAgent
from cogment_verse_torch_agents.third_party.hive.rainbow import RainbowDQNAgent
from cogment_verse_torch_agents.third_party.td3.td3 import TD3Agent
from cogment_verse_torch_agents.wrapper import cog_action_from_torch_action, format_legal_moves, torch_obs_from_cog_obs
from cogment_verse_torch_agents.wrapper import (
cog_action_from_torch_action,
format_legal_moves,
torch_obs_from_cog_obs,
)
from data_pb2 import RunConfig
from prometheus_client import Summary

@@ -76,7 +80,15 @@ def _create(self, model_id, impl_name, environment_specs, **kwargs):

return model, model_user_data

def _load(self, model_id, version_number, model_user_data, version_user_data, model_data_f, **kwargs):
def _load(
self,
model_id,
version_number,
model_user_data,
version_user_data,
model_data_f,
**kwargs,
):
model = self.agent_class_from_impl_name(model_user_data["agent_implementation"])(
id=model_id,
obs_dim=int(model_user_data["num_input"]),
@@ -143,4 +155,10 @@ async def impl(actor_session):
return {impl_name: (create_actor_impl(impl_name), ["agent"]) for impl_name in self._agent_classes}

def _create_run_implementations(self):
return {"cogment_verse_run_impl": (sample_producer, create_training_run(self), RunConfig())}
return {
"cogment_verse_run_impl": (
sample_producer,
create_training_run(self),
RunConfig(),
)
}