
Add post query cycle script execution hook #32

Open · wants to merge 4 commits into base: main

Conversation

@agrawalreetika (Member) commented Nov 14, 2024

This PR includes 3 changes -

  • Add post-query cycle script execution hook
  • Add pre-stage script execution hook
  • Add cache cleanup scripts for Presto coordinators and workers (general util functions to invalidate different types of worker and coordinator caches)

@@ -44,6 +44,8 @@ type Stage struct {
	PostStageShellScripts []string `json:"post_stage_scripts,omitempty"`
	// Run shell scripts after executing each query.
	PostQueryShellScripts []string `json:"post_query_scripts,omitempty"`
	// Run shell scripts after finishing the full query cycle (all runs) for each query.
	PostQueryCycleShellScripts []string `json:"post_query_cycle_scripts,omitempty"`
Collaborator:

@steveburnett for doc

Collaborator:

A full cycle here means that when we set cold_runs and warm_runs, each query in the benchmark will be run cold_runs + warm_runs times in total.

post_query_scripts will be called after each query execution; post_query_cycle_scripts will be called after all the cold_runs and warm_runs are done for a unique query.
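A minimal sketch of that firing pattern (hook and query names here are illustrative, not pbench's actual code): with cold_runs=1, warm_runs=2, and two queries, post_query_scripts fires 6 times while post_query_cycle_scripts fires twice.

```python
# Sketch of the hook-firing pattern described above; names are
# illustrative, not taken from pbench's implementation.
def run_stage(queries, cold_runs, warm_runs):
    events = []
    for q in queries:
        for _ in range(cold_runs + warm_runs):
            events.append(("run", q))
            events.append(("post_query_scripts", q))    # after every run
        events.append(("post_query_cycle_scripts", q))  # once per query
    return events

events = run_stage(["q1", "q2"], cold_runs=1, warm_runs=2)
per_run = sum(1 for kind, _ in events if kind == "post_query_scripts")
per_cycle = sum(1 for kind, _ in events if kind == "post_query_cycle_scripts")
# per_run == 6, per_cycle == 2
```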

sys.exit(-1)

file_path = sys.argv[1]
increment_file_value(file_path)
Collaborator:

If we touch this file in this script, then I guess the unit test's asserted result should be updated?

conn = create_connection(hostname, username, password, catalogName)
cur = conn.cursor()
cur.execute(query)
rows = cur.fetchall()
Collaborator:

What is the error behavior?

Member Author:

I think it could be because of either a connection issue or a query execution issue.
Let me wrap it in a try-except.
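A hedged sketch of that wrapping (the `connect` callable and function name are assumptions for illustration): any connection or execution failure becomes a non-zero exit code instead of an unhandled traceback.

```python
import sys

# Hypothetical wrapper: `connect` is any zero-argument callable that
# returns a DB-API connection; failures exit non-zero so the caller
# (pbench's shell-script hook) can detect them.
def run_query_safely(connect, query):
    try:
        conn = connect()
        cur = conn.cursor()
        cur.execute(query)
        return cur.fetchall()
    except Exception as exc:
        print("query failed:", exc, file=sys.stderr)
        sys.exit(1)
```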

@agrawalreetika force-pushed the cache-invalidation branch 2 times, most recently from 4ea4c28 to d90f64e on November 18, 2024 18:42
@agrawalreetika (Member Author):

@ethanyzhang Thanks for your review. I have addressed your earlier comments.

Also, I have added one more commit in this PR to enable a pre-stage script execution hook.
Please let me know if that needs to be in a separate PR or if it's okay to have it in this same PR for review.

import paramiko
import argparse

def create_connection(host_name, user_name, user_password, db_name):


@ethanyzhang how does pbench manage the utility functions like create_connection? It also appears in benchmarks/scripts/cache_cleaning_coordinator_post_query.py

Collaborator:

@agrawalreetika I think it may be better to move create_connection to a different file and then import it in both the worker script and the coordinator script.

Say this file is called presto_utils.py, then here you can do:

from presto_utils import create_connection
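A sketch of what that shared module might look like (the port and schema values below are assumptions, not from this PR; `prestodb` is the presto-python-client package):

```python
# presto_utils.py -- hypothetical shared module for both the worker
# and coordinator scripts. Port/schema values are assumed defaults.
def create_connection(host_name, user_name, user_password, db_name):
    import prestodb  # deferred so importing this module never requires prestodb
    # user_password would be wired into an auth object on secured clusters
    return prestodb.dbapi.connect(
        host=host_name,
        port=8080,            # assumed coordinator HTTP port
        user=user_name,
        catalog=db_name,
        schema="default",
    )
```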

@@ -0,0 +1,11 @@
import sys


This comment is for the two .py file names. Why did you add my_ in front? I think it's better to remove it

Collaborator:

this folder is for testing and demoing.


What does "my" stand for?

@@ -0,0 +1,80 @@
import prestodb


The cache cleaning should be done before (not after) each query cycle (cold+warm runs). And it should only be applied to TPC Power Tests.

Member Author:

So in the current implementation, I added 2 hooks -

  • pre_stage_scripts: run these scripts before this stage is started
  • post_query_scripts: run these scripts after each query in this stage is complete

This eventually triggers the clean-up script once when the overall query execution starts (via pre_stage_scripts), and then again after each query cycle (cold+warm runs) via post_query_scripts. So by using both hooks, I was running the clean-up script.

But now, with these script names, I think it looks a little confusing. I will add one more hook named pre_query_scripts, call it before each query cycle (cold+warm runs) for each query, and rename these clean-up scripts from post* to pre*.


Hi @agrawalreetika, I think the clean-cache scripts should be called via pre_query_cycle_scripts, not post_query_scripts and pre_stage_scripts. So, supposing cold_runs=1 and warm_runs=1, we want this order:
clean cache, q1, q1, clean cache, q2, q2, clean cache, q3, q3, ...
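That ordering can be sketched as follows (function name illustrative):

```python
# Sketch of the desired schedule: cache cleanup fires once before each
# query's full cycle (cold + warm runs). Names are illustrative.
def schedule(queries, cold_runs, warm_runs):
    plan = []
    for q in queries:
        plan.append("clean cache")  # pre_query_cycle_scripts
        plan.extend([q] * (cold_runs + warm_runs))
    return plan

plan = schedule(["q1", "q2", "q3"], cold_runs=1, warm_runs=1)
# plan == ['clean cache', 'q1', 'q1', 'clean cache', 'q2', 'q2',
#          'clean cache', 'q3', 'q3']
```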

@ethanyzhang (Collaborator) left a comment:

Hi @agrawalreetika, I have some questions about error propagation. See my comments, and I am happy to discuss further.


def clean_directory_list_cache(hostname, username, password, catalogName):
    query = "CALL " + catalogName + ".system.invalidate_directory_list_cache()"
    conn = create_connection(hostname, username, password, catalogName)
Collaborator:

Looking at your implementation here it seems like this can just use execute_presto_select_query.
Maybe rename execute_presto_select_query to execute_presto_query.
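A sketch of the suggested generic helper (the names and the injected connection factory are illustrative, not the PR's actual code):

```python
# Generic query runner: works for both SELECTs and CALL statements,
# so the per-cache functions reduce to one-liners. `connect` is a
# factory such as a shared create_connection helper.
def execute_presto_query(connect, hostname, username, password, catalog, query):
    conn = connect(hostname, username, password, catalog)
    cur = conn.cursor()
    cur.execute(query)
    return cur.fetchall()

def clean_directory_list_cache(connect, hostname, username, password, catalog):
    query = "CALL " + catalog + ".system.invalidate_directory_list_cache()"
    return execute_presto_query(connect, hostname, username, password, catalog, query)
```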


def clean_metastore_cache(hostname, username, password, catalogName):
    query = "CALL " + catalogName + ".system.invalidate_metastore_cache()"
    conn = create_connection(hostname, username, password, catalogName)
Collaborator:

same here

stage/stage.go Outdated
// run post query cycle shell scripts
postQueryCycleErr := s.runShellScripts(ctx, s.PostQueryCycleShellScripts)
if retErr == nil {
	retErr = postQueryCycleErr
Collaborator:

So, if the query was executed successfully, but something went wrong when you run the script to clean up the cache, what should we do? I guess we should not ignore the cache cleanup error because it will give us false benchmark signals?

If so, the way for the PostQueryCycleShellScripts errors to propagate is to return a non-zero return value from the script using sys.exit(1). Then this will cause postQueryCycleErr here to be set as an execution error.
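A sketch of that propagation on the Python side (helper name illustrative): the script maps the CALL result to its exit status, which stage.go then surfaces as a script execution error.

```python
import sys

# Map the invalidate-cache CALL result to a process exit code so a
# failed cleanup fails the benchmark rather than being ignored.
def exit_code_for(rows):
    return 0 if rows and rows[0][0] is True else 1

# the last line of the cleanup script would then be:
# sys.exit(exit_code_for(rows))
```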


@ethanyzhang @agrawalreetika As we haven't tried it in real runs yet, this may or may not happen. But assuming it happens because of some non-recoverable error, the users won't be able to run any benchmark at all. Maybe we can consider issuing a warning but still continue the queries?

if rows[0][0]:
    print("Directory list cache clean up is successful for", catalogName)
else:
    print("Directory list cache clean up failed for", catalogName)
Collaborator:

See my comments below at stage.go for adding error propagation here.

if rows[0][0]:
    print("Metastore cache clean up is successful for", catalogName)
else:
    print("Metastore cache clean up failed for", catalogName)
Collaborator:

Same for error propagation

import paramiko
import argparse

def create_connection(host_name, user_name, user_password, db_name):
Collaborator:

Same as above: move create_connection into a shared presto_utils.py and import it in both scripts.

cleanup_worker_disk_cache(worker_public_ips, native_cache_directory_worker, "centos", args.sshkey)

if is_worker_os_cache_cleanup_enabled:
    cleanup_worker_os_cache(worker_public_ips, "centos", args.sshkey)
Collaborator:

Similar question about propagating errors that could happen here back to pbench. The benchmark should fail if we actually couldn't effectively clean up the cache.


@agrawalreetika (Member Author):

@ethanyzhang @yingsu00 Thanks for your review. I have made the changes based on the review comments. Please check.

def cleanup_worker_disk_cache(worker_public_ips, directory_to_cleanup, login_user, ssh_key_path):
    for worker_ip in worker_public_ips:
        try:
            ssh = paramiko.SSHClient()

It might be better to extract the SSH connection to a utility function
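A sketch of such a utility (module and function names are assumptions; the paramiko calls are the library's standard client API):

```python
# ssh_utils.py -- hypothetical helper extracted from the per-worker loop.
def open_ssh(host, login_user, ssh_key_path):
    import paramiko  # deferred so this module imports without paramiko installed
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect(host, username=login_user, key_filename=ssh_key_path)
    return ssh
```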

# Directory list cache clean up
if is_list_cache_cleanup_enabled:
    for catalogName in catalog_list:
        print("Cleaning up directory list cache for", catalogName)

Should there be a space after "for"? Same for other occurrences.

for catalogName in catalog_list:
    print("Cleaning up directory list cache for", catalogName)
    rows = clean_directory_list_cache(args.host, args.username, args.password, catalogName)
    print("directory_list_cache_cleanup_query Query Result:", rows)

space after "Result:"?

is_worker_os_cache_cleanup_enabled = True

if is_worker_disk_cache_cleanup_enabled:
    native_cache_directory_worker = "/home/centos/presto/async_data_cache"

Would this work on Ubuntu?

Member Author:

Currently pbench is run on Presto clusters that run on CentOS, where this is tested.

@@ -0,0 +1,114 @@
from mysql_utils import create_connection

Where is this being called?


@agrawalreetika force-pushed the cache-invalidation branch 4 times, most recently from 339f67e to 4c9dc6f on December 30, 2024 07:39
is_metadata_cache_cleanup_enabled = False

# Directory list cache clean up
if is_list_cache_cleanup_enabled:


@agrawalreetika Is it possible to attach the output of these? What would the output look like if output rows are multiple lines?


def clean_directory_list_cache(hostname, username, password, catalogName):
    query = "CALL " + catalogName + ".system.invalidate_directory_list_cache()"
    conn = create_connection(hostname, username, password, catalogName)


Shall we extract lines 7-11 into a util function?


def clean_metastore_cache(hostname, username, password, catalogName):
    query = "CALL " + catalogName + ".system.invalidate_metastore_cache()"
    conn = create_connection(hostname, username, password, catalogName)


This is the same as the previous one.

db_name=database
)

clusterName = args.clustername


Do we use camelCase or snake_case notation? We need to unify them.

args = parser.parse_args()

with open(args.mysql, 'r') as file:
    mysqlDetails = json.load(file)


camelCase here too?
