
addressing issues 5, 8, (part of 13) - new #17

Merged: 19 commits merged into master on Oct 2, 2020

Conversation

erskordi (Collaborator):

No description provided.

@erskordi changed the title from "addressing issues 5, 8, (part of 13)" to "addressing issues 5, 8, (part of 13) - new" on Sep 30, 2020
@@ -0,0 +1,7 @@
import alphazero.config as config
Collaborator:

we can directly modify config variables in the run scripts, so mod.py is not needed

Collaborator Author (erskordi):

ok, mod.py was moved to the stable_radical_optimization folder and renamed to stable_rad_config.py. Loading mod.py was also removed from all scripts in the alphazero folder.

pstjohn (Collaborator) left a comment:

So one other issue: Node.reward is cached, so if we indeed use the game.reset() approach (instead of the current new game per iteration), the cached reward will be a ranked reward, which might no longer be valid. In general, I'm wondering if we want to separate node.reward from node.ranked_reward; but that might also involve changing game.mcts_step, where we propagate up leaf.reward values.

values (%s, %s, %s, %s, %s, %s);""", (
self.smiles, float(reward), atom_type,
float(spin_buried_vol), float(max_spin), atom_index))
INSERT INTO {table}
Collaborator:

Does this work? It might be best practice to use a psycopg2.sql object:
https://www.psycopg.org/docs/sql.html
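
For reference, a minimal sketch of that pattern with psycopg2.sql -- the table name below is just a placeholder, and dbparams plus the reward variables are assumed from the surrounding script:

from psycopg2 import sql

reward_table = "stable_radical_reward"  # placeholder; would come from config.sql_basename

with psycopg2.connect(**dbparams) as conn:
    with conn.cursor() as cur:
        # sql.Identifier quotes the table name safely; the values still go through %s placeholders
        cur.execute(
            sql.SQL("INSERT INTO {table} "
                    "(smiles, real_reward, atom_type, buried_vol, max_spin, atom_index) "
                    "values (%s, %s, %s, %s, %s, %s);").format(table=sql.Identifier(reward_table)),
            (self.smiles, float(reward), atom_type,
             float(spin_buried_vol), float(max_spin), atom_index))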


DROP TABLE IF EXISTS {table2};

CREATE TABLE {table2} (
Collaborator:

so the game table should be a record of recently played games, with their associated (real) rewards. gameid should also be a column
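
For instance, a possible shape for that table; the column names and types here are illustrative rather than a final schema:

cur.execute("""
    CREATE TABLE IF NOT EXISTS {table}_game (
        id serial PRIMARY KEY,
        time timestamp DEFAULT CURRENT_TIMESTAMP,
        experiment_id varchar(50),
        gameid varchar(8),
        real_reward real,
        final_smiles varchar(250));""".format(table=config.sql_basename))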

with psycopg2.connect(**dbparams) as conn:
with conn.cursor() as cur:
cur.execute("""
INSERT INTO {table}
Collaborator:

So this implies that anytime a terminal state is reached (which can happen often throughout a single game), it gets added to the 'game' table. I think ranked reward is something we only want to compute from recently played, final game states. So I imagine this code would go after adding to the replay buffer table.

Collaborator Author (erskordi):

OK, the

cur.execute("""
                INSERT INTO {table}_game
                (experiment_id, gameid, real_reward)
                values (%s, %s, %s);""".format(table=config.sql_basename), (
                    config.experiment_id, gameid, float(reward)))

is moved after adding to the replay buffer table. To distinguish between real and ranked rewards, reward = game[-1].reward returns the real reward, and later the replay buffer stores the ranked reward from get_ranked_rewards(reward).

Collaborator:

It doesn't look like you've pushed any changes yet, but my only concern with this approach is that, as written, game.py expects the node.reward value to be the reward that we propagate up the MCTS tree; i.e., the ranked reward. So until we refactor that (#18), we should have node.get_reward() return the ranked reward and cache the intermediate true_reward somewhere like node._true_reward.
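
Roughly, a sketch of that shape; the reward expression is the placeholder one from this file, and everything other than get_reward and _true_reward is illustrative:

def get_reward(self):
    # ... compute spin_buried_vol and max_spin for this molecule (elided) ...
    true_reward = (1 - max_spin) * 50 + spin_buried_vol
    # cache the real reward so it can later be written to the game table
    self._true_reward = true_reward
    # MCTS propagates node values in [-1, 1], so return the ranked reward
    return get_ranked_rewards(true_reward)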

Collaborator Author (erskordi):

Yes, I haven't pushed anything yet; I'll push ASAP. The responses to comments are mostly for me to check what I'm addressing.

Collaborator Author (erskordi):

Alright, I see what you mean here. I'm proceeding with it now

((gameid, node.smiles, game[-1].smiles, reward, i,
"""INSERT INTO {table}
(experiment_id, gameid, smiles, final_smiles, ranked_reward, position, data) values %s;""".format(table=replay_table),
((config.experiment_id, gameid, node.smiles, game[-1].smiles, reward, i,
node.get_action_inputs_as_binary()),))
Collaborator:

here is where I'd add the game (and final, real reward) to the game table. That does involve making sure that nodes cache their true reward value, rather than just their ranked reward.

Collaborator Author (erskordi):

See comment above, let me know if this is a good strategy

Collaborator:

See reply above. Just to be a bit more explicit: the node.value that gets estimated during MCTS should, for ranked rewards, lie between -1 and 1, and represent the probability that the node is a win or loss under ranked rewards. These are also the values predicted by the policy network. But if we return the true reward in get_reward(), that calculation gets blasted with weird values (i.e., 50) that are way outside that range.
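
To make the range concrete, a minimal sketch of that mapping, where r_alpha is the percentile threshold computed from recently played games (function and argument names are illustrative, not the code in this PR):

import random

def to_ranked_reward(true_reward, r_alpha):
    # win (+1) if the game beats the recent-games percentile threshold, loss (-1) otherwise;
    # ties are broken randomly so games exactly at the threshold aren't always counted one way
    if true_reward > r_alpha:
        return 1.
    elif true_reward < r_alpha:
        return -1.
    else:
        return random.choice([-1., 1.])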

@@ -8,13 +8,11 @@
import rdkit.Chem
from tensorflow.keras.preprocessing.sequence import pad_sequences

from alphazero.config import AlphaZeroConfig
import alphazero.config as config
import alphazero.mod as mod
Collaborator:

unused, remove

Collaborator Author (erskordi):

done

@@ -2,10 +2,9 @@
from tensorflow.keras import layers
import nfp

import alphazero.config as config
import alphazero.mod as mod
Collaborator:

unused, remove

Collaborator Author (erskordi):

done

@@ -10,12 +10,16 @@
from rdkit import DataStructs
import networkx as nx

import alphazero.config as config
import alphazero.mod as mod
Collaborator:

unused, remove

Collaborator Author (erskordi):

done

@@ -28,45 +32,59 @@
'options': f'-c search_path=rl',
}

reward_table = config.sql_basename + "_reward"
Collaborator:

here is where we can just modify config properties directly

Collaborator Author (erskordi):

done

# with conn.cursor() as cur:
# cur.execute("""
# DROP TABLE IF EXISTS StableRewardPSJ;
with psycopg2.connect(**dbparams) as conn:
Collaborator:

from psj: Maybe we want to move these things over to a separate python script? Maybe initialize.py or similar.

then the submit_mcts.sh could do something like

python initialize.py  # Run this one time
srun python run_mcts.py  # Run on all workers

Collaborator Author (erskordi):

Per our discussion, I created stable_rad_config.py in the stable_radical_optimization directory, which overrides values in config.py, and an initialize.py that creates the PostgreSQL tables. I also updated submit_mcts.sh to execute:

python initialize.py
srun python run_mcts.py

with psycopg2.connect(**dbparams) as conn:
with conn.cursor() as cur:
cur.execute("""
DROP TABLE IF EXISTS {table0};
Collaborator:

from psj: Otherwise, we'd want this not to drop tables, but CREATE TABLE IF NOT EXISTS

Collaborator Author (erskordi):

Check comment above, use a separate script for initializing tables

Collaborator:

can you push the new changes?

Collaborator Author (erskordi):

OK, I'll push the updated scripts I have so far and continue working on the rest of the comments/changes

@@ -99,15 +117,44 @@ def get_reward(self):
# This is a bit of a placeholder; but the range for spin is about 1/50th that
# of buried volume.
reward = (1 - max_spin) * 50 + spin_buried_vol

with psycopg2.connect(**dbparams) as conn:
Collaborator:

psj: we should pull out the RR calculation into a separate function. For now, that could just live as a floating function in this file.

def get_ranked_rewards(reward, conn=None):

    if conn is None:
        conn = psycopg2.connect(**dbparams)

    with conn:
        ...  # do the calculation

Collaborator Author (erskordi):

Created the function get_ranked_rewards; do you think it's better to put it in a separate script as well, or keep it in run_mcts.py?

Collaborator:

I'd keep it in run_mcts. We'll likely want it to eventually be a class method of Node

Collaborator Author (erskordi):

OK, sounds good!


class StabilityNode(Node):

def get_reward(self):

with psycopg2.connect(**dbparams) as conn:
with conn.cursor() as cur:
cur.execute("select reward from StableRewardPSJ where smiles = %s", (self.smiles,))
cur.execute("select reward from {table} where smiles = %s".format(table=reward_table), (self.smiles,))
result = cur.fetchone()

if result:
Collaborator:

this returns the actual reward, right? don't we want to return the ranked reward for MCTS?

Collaborator:

Just to make sure: we'll need to re-calculate the ranked reward on the fly, it doesn't make sense to store ranked rewards in the reward buffer

Collaborator Author (erskordi):

The values returned from the entire if-elif-else statement are all "filtered" through the get_ranked_rewards function to return ranked rewards instead of real rewards.

CREATE TABLE {table0} (
id serial PRIMARY KEY,
time timestamp DEFAULT CURRENT_TIMESTAMP,
reward real,
Collaborator:

we want ranked_reward here, too, right?

Collaborator:

I'd disagree -- ranked rewards don't make much sense outside the context of the game they're evaluated in. If a later game discovers the same molecule, we'll want to make sure the ranked reward is current w.r.t the game buffer

Collaborator:

no, you're right; I hadn't thought this out yet, but per our teams chat I agree

id serial PRIMARY KEY,
time timestamp DEFAULT CURRENT_TIMESTAMP,
experiment_id varchar(50),
reward real);
davebiagioni (Collaborator), Sep 30, 2020:

what about ranked_reward and final smiles?

@@ -28,45 +32,59 @@
'options': f'-c search_path=rl',
}

reward_table = config.sql_basename + "_reward"
replay_table = config.sql_basename + "_replay"
game_table = config.sql_basename + "_game"
Collaborator:

More generally, any reason to assign these to variables rather than just doing something like

"CREATE TABLE {basename}_replay".format(basename=config.sql_basename)

Collaborator Author (erskordi):

done

values (%s, %s);""".format(table=game_table), (
config.experiment_id, float(reward)))

df = pd.read_sql_query("""
Collaborator:

can we change this to a count?

Collaborator Author (erskordi):

Yes, changed to

n_rewards = pd.read_sql_query("""
        select count(*) from {table}_reward
        """.format(table=config.sql_basename), conn)

Collaborator:

sounds great! another option might be to try to put this logic into a single SQL query https://www.postgresql.org/docs/9.1/functions-conditional.html. That's a minor change though, let's leave that for a later optimization

Collaborator Author (erskordi):

ok, let's keep it in mind for later : )

@@ -10,12 +10,15 @@
from rdkit import DataStructs
import networkx as nx

import stable_rad_config as config
Collaborator:

this probably won't work; i think you want it to read

import alphazero.config as config
import stable_rad_config

Collaborator Author (erskordi):

Damn, you're right. And come to think of it, I had it like that in the first place.

## This creates the table used to store the rewards
## But, we don't want this to run every time we run the script,
## just keeping it here as a reference
def get_ranked_rewards(reward, conn=None):
Collaborator:

the idea with the conn kwarg is that you likely already have a SQL connection open, so we might as well use it. no idea if the syntax i wrote will work though :)

pstjohn (Collaborator) left a comment:

Just adding some additional code comments via github. I think this is ready to merge now though


# Run the policy network to get value and prior_logit predictions
values, prior_logits = model(parent.policy_inputs_with_children())
prior_logits = prior_logits[1:].numpy().flatten()
values, prior_logits = model.predict(parent.policy_inputs_with_children())
Collaborator:

this seems to fix those retracing errors we were seeing

import alphazero.config as config
import stable_rad_config
# Initialize PostgreSQL tables

parser = argparse.ArgumentParser(description='Initialize the postgres tables.')
parser.add_argument("--drop", action='store_true', help="whether to drop existing tables, if found")
Collaborator:

sometimes we won't want to erase our previous calculations -- I'm not sure this is the right solution long-term though
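
As a sketch of how the flag might be used in initialize.py, assuming the dbparams and config objects from the surrounding scripts (the column list is abbreviated and the types are guesses):

args = parser.parse_args()

with psycopg2.connect(**dbparams) as conn:
    with conn.cursor() as cur:
        if args.drop:
            # wipe previous results and rebuild from scratch
            cur.execute("DROP TABLE IF EXISTS {table}_reward;".format(table=config.sql_basename))
        # CREATE TABLE IF NOT EXISTS leaves existing data alone when --drop isn't passed
        cur.execute("""
            CREATE TABLE IF NOT EXISTS {table}_reward (
                id serial PRIMARY KEY,
                smiles varchar(250) UNIQUE,  -- UNIQUE assumed here; it gives ON CONFLICT something to act on
                real_reward real);""".format(table=config.sql_basename))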

model = tf.keras.models.load_model(
'/projects/rlmolecule/pstjohn/models/20200923_radical_stability_model')
'/projects/rlmolecule/pstjohn/models/20200923_radical_stability_model',
compile=False)
Collaborator:

this removes those 'manual compile' warnings that weren't relevant

""".format(table=config.sql_basename), conn)
with psycopg2.connect(**dbparams) as conn:
with conn.cursor() as cur:
cur.execute("select count(*) from {table}_game;".format(
Collaborator:

so this was a big issue -- the ranked reward needs to be calculated based off recently played games, not recent rewards.

Collaborator Author (erskordi):

I thought about that too; the game table holds the actual real rewards that we want to use for calculating r_alpha. Thanks!

with conn.cursor() as cur:
cur.execute("""
select percentile_cont(%s) within group (order by real_reward)
from (select real_reward from {table}_game
Collaborator:

same as above. Also note that you can use psycopg2 directly (rather than pandas.read_sql_query) for these single-value queries.
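
For example, the percentile query through a bare cursor rather than pandas; the alpha and buffer-size values here are just example numbers, and the subquery ordering is an assumption:

with psycopg2.connect(**dbparams) as conn:
    with conn.cursor() as cur:
        cur.execute("""
            select percentile_cont(%s) within group (order by real_reward)
            from (select real_reward from {table}_game
                  order by id desc limit %s) recent_games;
            """.format(table=config.sql_basename),
            (0.75, 250))  # example alpha percentile and game-buffer size
        r_alpha = cur.fetchone()[0]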

Collaborator Author (erskordi):

Great, good to know!

self._true_reward = 0.
return rr
return config.min_reward
Collaborator:

I think this is what we want -- otherwise, in the initial game playing (before we've built up a minimum buffer of games) we'll choose invalid molecules rather than those that are just not optimal

@@ -109,7 +119,8 @@ def get_reward(self):
cur.execute("""
INSERT INTO {table}_reward
(smiles, real_reward, atom_type, buried_vol, max_spin, atom_index)
values (%s, %s, %s, %s, %s, %s);""".format(table=config.sql_basename), (
values (%s, %s, %s, %s, %s, %s)
ON CONFLICT DO NOTHING;""".format(table=config.sql_basename), (
Collaborator:

so, this is an important point for future reward caching -- if two nodes calculate the same reward simultaneously, we can end up calculating a reward for a molecule already in the reward buffer. So we have to catch that conflict and 'do nothing' (i.e., don't add the duplicate)

#SBATCH -n 4
#SBATCH -c 18
#SBATCH --output=/scratch/eskordil/git-repos/rlmolecule_new/rlmolecule/mcts.%j.out
#SBATCH -n 72
Collaborator:

I did an htop after sshing into one of the worker nodes: I don't think these games benefit from much threading. So we might as well just blast the nodes with a bunch of copies of these; this lets us run 36 concurrent games on each node.

@pstjohn merged commit 6ecab09 into master on Oct 2, 2020