Adds Ray Workflow: Multiple Run Support, Distributed Hyperparameter Tuning, and Consistent Setup Across Local/Cloud #1301

Open
wants to merge 132 commits into base: main
Changes from 82 commits (132 commits total)
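The headline feature is distributed hyperparameter tuning driven by Ray Tune with an Optuna search algorithm and MLflow logging (see the isaac_ray_tune.py, optuna, and MLflow commits below). As a minimal, hypothetical sketch of the Ray Tune pattern the workflow builds on; the trainable, metric name, and search space here are illustrative placeholders, not the shipped configuration:

from ray import tune
from ray.tune.search.optuna import OptunaSearch

def train_policy(config: dict) -> dict:
    # In the real workflow each trial launches an Isaac Lab training run;
    # here a score is simply faked from the sampled hyperparameters.
    score = -((config["lr"] - 3e-4) ** 2) + config["num_envs"] * 1e-6
    return {"reward": score}

tuner = tune.Tuner(
    train_policy,
    param_space={
        "lr": tune.loguniform(1e-5, 1e-2),
        "num_envs": tune.choice([512, 1024, 2048]),
    },
    tune_config=tune.TuneConfig(
        search_alg=OptunaSearch(),
        metric="reward",
        mode="max",
        num_samples=8,
    ),
)
results = tuner.fit()
print(results.get_best_result().config)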
67122dc
start
garylvov Sep 27, 2024
2d207b5
add feature extraction
glvov-bdai Sep 24, 2024
1aa2832
blank
garylvov Sep 27, 2024
e4c395f
further
glvov-bdai Sep 27, 2024
c53d987
add args
glvov-bdai Sep 30, 2024
50862cc
Merge branch 'isaac-sim:main' into feature/hyperparam_tune
glvov-bdai Oct 1, 2024
905bed1
formatting
glvov-bdai Oct 1, 2024
6909501
tweaks
glvov-bdai Oct 2, 2024
2577827
fix
glvov-bdai Oct 3, 2024
c21b2f5
allow jobs to actually get scheduled
glvov-bdai Oct 4, 2024
ceba315
add dockerfile
glvov-bdai Oct 4, 2024
7771439
formatting
glvov-bdai Oct 4, 2024
6563e1f
tweaks
glvov-bdai Oct 4, 2024
b27092f
get gcp cluster working with ray, and isaac
glvov-bdai Oct 7, 2024
b94fe87
make bash command consistent
glvov-bdai Oct 7, 2024
9a525ec
tweaks
glvov-bdai Oct 7, 2024
e6e9f85
formatting
glvov-bdai Oct 8, 2024
1187ca0
formatting
glvov-bdai Oct 8, 2024
83cb89a
fix argparser
glvov-bdai Oct 8, 2024
1885d0b
formatting
glvov-bdai Oct 8, 2024
653b8ae
cleanup command
glvov-bdai Oct 8, 2024
dc9fb3f
start argparser
glvov-bdai Oct 8, 2024
5f9f0dd
sync
glvov-bdai Oct 8, 2024
3cde9e4
Merge branch 'isaac-sim:main' into feature/hyperparam_tune
glvov-bdai Oct 8, 2024
db975bc
formatting
glvov-bdai Oct 8, 2024
c80d278
cherrypick ResNet Cart from PR
glvov-bdai Oct 8, 2024
7fd0169
add extra point in readme
glvov-bdai Oct 8, 2024
4dd48b1
add note about saving
glvov-bdai Oct 8, 2024
873ea54
fixes
glvov-bdai Oct 8, 2024
93fbff3
Merge branch 'isaac-sim:main' into feature/hyperparam_tune
glvov-bdai Oct 9, 2024
db88054
improve grokking ;)
glvov-bdai Oct 9, 2024
79b7c83
fix
glvov-bdai Oct 9, 2024
c81c2a3
formatting
glvov-bdai Oct 10, 2024
05644a4
Merge branch 'isaac-sim:main' into feature/hyperparam_tune
garylvov Oct 11, 2024
805c80b
Update README.md
garylvov Oct 11, 2024
aaf4d85
Revert "add feature extraction" | Don't need this for core stuff
glvov-bdai Oct 11, 2024
a01c1f4
Merge branch 'feature/hyperparam_tune' of https://github.com/glvov-bd…
glvov-bdai Oct 11, 2024
dad9a12
a life of merge conflicts for me
glvov-bdai Oct 11, 2024
d9a95d8
more merge conflict fixes lol
glvov-bdai Oct 11, 2024
cb01f19
update
glvov-bdai Oct 12, 2024
9970433
improve grokking
glvov-bdai Oct 13, 2024
56a57b4
split out cfg
glvov-bdai Oct 15, 2024
506c4b1
tweaks
glvov-bdai Oct 15, 2024
01a00ca
Merge branch 'isaac-sim:main' into feature/hyperparam_tune
glvov-bdai Oct 15, 2024
bc1a96a
bare bones
glvov-bdai Oct 15, 2024
d4f36b6
Merge branch 'feature/hyperparam_tune' of https://github.com/glvov-bd…
glvov-bdai Oct 15, 2024
9f029dd
tune bare bones, not sure if actually works yet
glvov-bdai Oct 17, 2024
a7d5f77
add repeater
glvov-bdai Oct 17, 2024
44777ab
Merge branch 'isaac-sim:main' into feature/hyperparam_tune
glvov-bdai Oct 21, 2024
d63fc37
further
glvov-bdai Oct 21, 2024
bd578e6
parse unknown
glvov-bdai Oct 22, 2024
e2e9f8d
add variable argparser
glvov-bdai Oct 22, 2024
71be507
shape up resource alloc
glvov-bdai Oct 22, 2024
d11232b
formatting
glvov-bdai Oct 22, 2024
f1460d0
QOL tweaks
glvov-bdai Oct 22, 2024
b0e63fd
basic local works
glvov-bdai Oct 22, 2024
85cd04b
Add live logging
glvov-bdai Oct 22, 2024
33b6794
Merge branch 'isaac-sim:main' into feature/hyperparam_tune
glvov-bdai Oct 22, 2024
6566218
little cleanup
glvov-bdai Oct 22, 2024
b4203b0
Merge branch 'feature/hyperparam_tune' of https://github.com/glvov-bd…
glvov-bdai Oct 22, 2024
25610b6
formatting
glvov-bdai Oct 22, 2024
f18c492
readme cleanup
glvov-bdai Oct 23, 2024
3edeab2
clarification
glvov-bdai Oct 23, 2024
93d92ef
add multi-cluster submission example
glvov-bdai Oct 23, 2024
b25f374
tweaks
glvov-bdai Oct 23, 2024
c4fa948
add per trial resources
glvov-bdai Oct 23, 2024
e6fdc38
slight cleanup
glvov-bdai Oct 24, 2024
065c130
readme update
glvov-bdai Oct 24, 2024
e0da556
Merge branch 'isaac-sim:main' into feature/hyperparam_tune
glvov-bdai Oct 24, 2024
9799015
optuna'
glvov-bdai Oct 24, 2024
7177de1
basic functionality works
glvov-bdai Oct 25, 2024
15a98ed
IsaacRay-v0 to the moon ;-)
glvov-bdai Oct 25, 2024
46d38aa
Merge branch 'isaac-sim:main' into feature/hyperparam_tune
glvov-bdai Oct 25, 2024
ddb1ea3
start conversion from readme to rst
glvov-bdai Oct 25, 2024
6f4627b
improve docs
glvov-bdai Oct 26, 2024
50a7e00
add heterogeneous cluster support
glvov-bdai Oct 28, 2024
62f8d15
add heterogeneous cluster support
glvov-bdai Oct 28, 2024
abb7002
convert more from readme to rst
glvov-bdai Oct 28, 2024
4196c8e
formatting
glvov-bdai Oct 28, 2024
551003b
Merge branch 'isaac-sim:main' into feature/hyperparam_tune
glvov-bdai Oct 28, 2024
8159106
move testing to individual steps
glvov-bdai Oct 28, 2024
e01310f
finish converting readme to rst
glvov-bdai Oct 28, 2024
0c20d44
Update source/standalone/workflows/ray/cluster_configs/Dockerfile
glvov-bdai Oct 29, 2024
cac728b
formatting and placement group
glvov-bdai Oct 29, 2024
884e04c
updates
glvov-bdai Oct 30, 2024
f5e94a4
clean up
glvov-bdai Oct 30, 2024
9dbf3f7
tweaks don't remember what lol
glvov-bdai Oct 30, 2024
8f63eee
fix cnn config and start of experiments
glvov-bdai Oct 31, 2024
352bde8
substitute bad google bucket hack with awesome mlflow logging ;)
glvov-bdai Oct 31, 2024
770e834
update docstrings and condense documentation
glvov-bdai Oct 31, 2024
03f193a
formatting and cleanup
glvov-bdai Oct 31, 2024
8a9d434
formatting and script to download results
glvov-bdai Nov 1, 2024
0b7ccb7
Tuning works on remote, with MLFLow, can vary CNN, MLP, and env count…
glvov-bdai Nov 1, 2024
adc78e1
add comment about ranges
glvov-bdai Nov 1, 2024
ac760f6
Merge branch 'main' into feature/hyperparam_tune
glvov-bdai Nov 1, 2024
1cc9d1c
remove spaces from f strings (my local pre-commit refused to catch th…
glvov-bdai Nov 1, 2024
5b99aec
correct port
glvov-bdai Nov 1, 2024
5bf15b4
decrease MLP/CNN sizes in tune to prevent out of memory issues
glvov-bdai Nov 1, 2024
a42f708
formatting
glvov-bdai Nov 1, 2024
d08733e
Merge branch 'main' into feature/hyperparam_tune
garylvov Nov 1, 2024
369fbdc
Fix install cmd
garylvov Nov 2, 2024
7dd0916
fix cmd line arg example for tune
garylvov Nov 2, 2024
81aed36
fix local tune, generalize job cfg to several workflows
garylvov Nov 2, 2024
fef135f
disable explicit checkpointing for local
garylvov Nov 2, 2024
4612baa
allow for local parallel tuning jobs
garylvov Nov 2, 2024
40d7441
fix indent level on doc
glvov-bdai Nov 3, 2024
a754a23
Merge branch 'feature/hyperparam_tune' of https://github.com/glvov-bd…
glvov-bdai Nov 3, 2024
371940e
fix tensorboard cmd in docs
glvov-bdai Nov 3, 2024
bb17a37
fix code block for extra deps
glvov-bdai Nov 3, 2024
5452e9e
Fix dockerfile
garylvov Nov 4, 2024
f9b1f40
fix error in documentation discovered during tutorial vid
glvov-bdai Nov 4, 2024
9800132
Merge branch 'feature/hyperparam_tune' of https://github.com/glvov-bd…
glvov-bdai Nov 4, 2024
0f6c484
Update docs/source/features/ray.rst
garylvov Nov 5, 2024
2d5e14d
Update source/standalone/workflows/ray/isaac_ray_tune.py
garylvov Nov 5, 2024
7c56a6b
Update source/standalone/workflows/ray/isaac_ray_tune.py
garylvov Nov 5, 2024
acd03c1
Update source/standalone/workflows/ray/isaac_ray_tune.py
garylvov Nov 5, 2024
75c4f49
Update docs/source/features/ray.rst
garylvov Nov 5, 2024
8311bc6
Update source/standalone/workflows/ray/isaac_ray_tune.py
garylvov Nov 5, 2024
9302017
Update source/standalone/workflows/ray/isaac_ray_tune.py
garylvov Nov 5, 2024
34afd2a
address james' comments
garylvov Nov 5, 2024
70aa958
delete old file and fix imports
garylvov Nov 5, 2024
5ba620d
format
garylvov Nov 5, 2024
80b5df5
change top level to be caps'
garylvov Nov 5, 2024
5027656
fix docstrings and typos
garylvov Nov 5, 2024
b293b48
Merge branch 'main' into feature/hyperparam_tune
glvov-bdai Nov 5, 2024
aa92a9d
fix weird bolding thing
glvov-bdai Nov 5, 2024
34f0908
fix emphasize lines and included files in rst
glvov-bdai Nov 5, 2024
0856cd6
Merge branch 'main' into feature/hyperparam_tune
glvov-bdai Nov 7, 2024
7cc587a
Merge branch 'main' into feature/hyperparam_tune
glvov-bdai Nov 8, 2024
a38de8e
Merge branch 'main' into feature/hyperparam_tune
garylvov Nov 18, 2024
30a63ff
Merge branch 'main' into feature/hyperparam_tune
glvov-bdai Nov 22, 2024
ad8161d
Merge branch 'main' into feature/hyperparam_tune
glvov-bdai Nov 25, 2024
1 change: 1 addition & 0 deletions docs/index.rst
@@ -97,6 +97,7 @@ Table of Contents
source/features/hydra
source/features/multi_gpu
source/features/tiled_rendering
source/features/ray
source/features/reproducibility

.. toctree::
395 changes: 395 additions & 0 deletions docs/source/features/ray.rst

Large diffs are not rendered by default.

10 changes: 10 additions & 0 deletions source/standalone/workflows/ray/cluster_configs/Dockerfile
@@ -0,0 +1,10 @@
# This dockerfile only works because of Felix Yu's help #TODO: Add Felix to contributors list
FROM isaac-lab-base:latest
ENV PATH="/usr/local/nvidia/bin:$PATH"
ENV LD_LIBRARY_PATH="/usr/local/nvidia/lib64"
RUN ln -sf /usr/local/nvidia/bin/nvidia* /usr/bin
RUN /workspace/isaaclab/_isaac_sim/python.sh -m pip install "ray[default, tune]"==2.31.0 && \
sed -i "1i $(echo "#!/workspace/isaaclab/_isaac_sim/python.sh")" \
/isaac-sim/kit/python/bin/ray && ln -s /isaac-sim/kit/python/bin/ray /usr/local/bin/ray
# The following is only needed for tuning
RUN /workspace/isaaclab/_isaac_sim/python.sh -m pip install optuna bayesian-optimization
@@ -0,0 +1,4 @@
# Ray on Google Cloud with Isaac Lab

For more info, see
https://github.com/GoogleCloudPlatform/ai-on-gke/blob/main/ray-on-gke/README.md
@@ -0,0 +1,177 @@
# Jinja is used for templating here because a full Helm setup would be excessive for this application
apiVersion: ray.io/v1alpha1
kind: RayCluster
metadata:
name: {{ name }}
namespace: {{ namespace }}
spec:
rayVersion: "2.8.0"
enableInTreeAutoscaling: true
autoscalerOptions:
upscalingMode: Default
idleTimeoutSeconds: 120
imagePullPolicy: Always
securityContext: {}
envFrom: []

headGroupSpec:
rayStartParams:
block: "true"
dashboard-host: 0.0.0.0
dashboard-port: "8265"
node-ip-address: "0.0.0.0"
port: "6379"
include-dashboard: "true"
ray-debugger-external: "true"
object-manager-port: "8076"
num-gpus: "0"
num-cpus: "0" # prevent scheduling jobs to the head node - workers only
headService:
apiVersion: v1
kind: Service
metadata:
name: head
spec:
type: LoadBalancer
template:
metadata:
labels:
app.kubernetes.io/instance: tuner
app.kubernetes.io/name: kuberay
cloud.google.com/gke-ray-node-type: head
spec:
serviceAccountName: {{ service_account_name }}
affinity: {}
securityContext:
fsGroup: 100
containers:
- env:
- name: GOOGLE_APPLICATION_CREDENTIALS
value: "/var/secrets/{{secret_name}}/key.json"
image: {{ image }}
imagePullPolicy: Always
name: head
resources:
limits:
cpu: "{{ num_head_cpu }}"
memory: {{ head_ram_gb }}G
nvidia.com/gpu: "0"
requests:
cpu: "{{ num_head_cpu }}"
memory: {{ head_ram_gb }}G
nvidia.com/gpu: "0"
securityContext: {}
volumeMounts:
- mountPath: /tmp/ray
name: ray-logs
- mountPath: /var/secrets/{{secret_name}}
name: {{secret_name}}
readOnly: true
command: ["/bin/bash", "-c", "ray start --head --port=6379 --object-manager-port=8076 --dashboard-host=0.0.0.0 --dashboard-port=8265 --include-dashboard=true && tail -f /dev/null"]
- image: fluent/fluent-bit:1.9.6
name: fluentbit
resources:
limits:
cpu: 100m
memory: 128Mi
requests:
cpu: 100m
memory: 128Mi
volumeMounts:
- mountPath: /tmp/ray
name: ray-logs
imagePullSecrets: []
nodeSelector:
iam.gke.io/gke-metadata-server-enabled: "true"
volumes:
- configMap:
name: fluentbit-config
name: fluentbit-config
- name: ray-logs
emptyDir: {}
- name: {{secret_name}}
secret:
secretName: {{secret_name}}

workerGroupSpecs:
{% for it in range(gpu_per_worker|length) %}
- groupName: "{{ worker_accelerator[it] }}x{{ gpu_per_worker[it] }}-cpu-{{ cpu_per_worker[it] }}-ram-gb-{{ ram_gb_per_worker[it] }}"
replicas: {{ num_workers[it] }}
maxReplicas: {{ num_workers[it] }}
minReplicas: {{ num_workers[it] }}
rayStartParams:
block: "true"
ray-debugger-external: "true"
replicas: "{{num_workers[it]}}"
template:
metadata:
annotations: {}
labels:
app.kubernetes.io/instance: tuner
app.kubernetes.io/name: kuberay
cloud.google.com/gke-ray-node-type: worker
spec:
serviceAccountName: {{ service_account_name }}
affinity: {}
securityContext:
fsGroup: 100
containers:
- env:
- name: GOOGLE_APPLICATION_CREDENTIALS
value: "/var/secrets/{{secret_name}}/key.json"
- name: NVIDIA_VISIBLE_DEVICES
value: "all"
- name: NVIDIA_DRIVER_CAPABILITIES
value: "compute,utility"

image: {{ image }}
imagePullPolicy: Always
name: ray-worker
resources:
limits:
cpu: "{{ cpu_per_worker[it] }}"
memory: {{ ram_gb_per_worker[it] }}G
nvidia.com/gpu: "{{ gpu_per_worker[it] }}"
requests:
cpu: "{{ cpu_per_worker[it] }}"
memory: {{ ram_gb_per_worker[it] }}G
nvidia.com/gpu: "{{ gpu_per_worker[it] }}"
securityContext: {}
volumeMounts:
- mountPath: /tmp/ray
name: ray-logs
- mountPath: /var/secrets/{{secret_name}}
name: {{secret_name}}
readOnly: true
command: ["/bin/bash", "-c", "ray start --address=head.{{ namespace }}.svc.cluster.local:6379 && tail -f /dev/null"]
- image: fluent/fluent-bit:1.9.6
name: fluentbit
resources:
limits:
cpu: 100m
memory: 128Mi
requests:
cpu: 100m
memory: 128Mi
volumeMounts:
- mountPath: /tmp/ray
name: ray-logs

imagePullSecrets: []
nodeSelector:
cloud.google.com/gke-accelerator: {{ worker_accelerator[it] }}
iam.gke.io/gke-metadata-server-enabled: "true"
tolerations:
- key: "nvidia.com/gpu"
operator: "Exists"
effect: "NoSchedule"
volumes:
- configMap:
name: fluentbit-config
name: fluentbit-config
- name: ray-logs
emptyDir: {}
- name: {{secret_name}}
secret:
secretName: {{secret_name}}
{% endfor %}
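The template above is parameterized with plain Jinja variables: cluster name, namespace, container image, service account, secret, head-node sizing, and per-worker-group lists for accelerator type, GPU/CPU/RAM counts, and replica counts. A minimal sketch of rendering it with jinja2 follows; the file name and parameter values are illustrative assumptions, and the PR provides its own launcher script for this step.

# Hypothetical rendering sketch; the values below are placeholders, not the PR's defaults.
import jinja2

env = jinja2.Environment(loader=jinja2.FileSystemLoader("."), undefined=jinja2.StrictUndefined)
template = env.get_template("kuberay.yaml.jinja")  # assumed local copy of the template above

manifest = template.render(
    name="isaacray",
    namespace="default",
    image="us-docker.pkg.dev/my-project/isaac/isaac-lab-ray:latest",
    service_account_name="isaac-ray-sa",
    secret_name="bucket-access",
    num_head_cpu=8,
    head_ram_gb=16,
    # one entry per heterogeneous worker group
    worker_accelerator=["nvidia-l4"],
    gpu_per_worker=[1],
    cpu_per_worker=[8],
    ram_gb_per_worker=[32],
    num_workers=[2],
)
with open("isaacray-cluster.yaml", "w") as f:
    f.write(manifest)
# then: kubectl apply -f isaacray-cluster.yaml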
180 changes: 180 additions & 0 deletions source/standalone/workflows/ray/grok_cluster_with_kubectl.py
@@ -0,0 +1,180 @@
# Copyright (c) 2022-2024, The Isaac Lab Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause

import argparse
import os
import re
import subprocess
import threading
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

"""
This script requires that kubectl is installed and KubeRay was used to create the cluster.

Creates a config file containing ``name: <NAME> address: http://<IP>:<PORT>`` on
a new line for each cluster.

Usage:

.. code-block:: bash


./isaaclab.sh -p source/standalone/workflows/ray/grok_cluster_with_kubectl.py

# For options, supply -h arg
"""


def get_pods(namespace: str = "default") -> list[tuple]:
cmd = ["kubectl", "get", "pods", "-n", namespace, "--no-headers"]
output = subprocess.check_output(cmd).decode()
pods = []
for line in output.strip().split("\n"):
fields = line.split()
pod_name = fields[0]
status = fields[2]
pods.append((pod_name, status))
return pods


def get_clusters(pods: list, cluster_name_prefix: str) -> list[str]:
clusters = set()
# Modify regex pattern to match the entire structure including `-head` or `-worker`
for pod_name, _ in pods:
match = re.match(r"(" + re.escape(cluster_name_prefix) + r"[-\w]+)", pod_name)
if match:
clusters.add(match.group(1).split("-head")[0].split("-worker")[0])
return sorted(clusters)


def check_clusters_running(pods: list, clusters: set) -> bool:
clusters_running = True
for cluster in clusters:
cluster_pods = [p for p in pods if p[0].startswith(cluster)]
total_pods = len(cluster_pods)
running_pods = len([p for p in cluster_pods if p[1] == "Running"])
if running_pods != total_pods:
clusters_running = False
break
return clusters_running


def get_ray_address(head_pod: str, namespace: str = "default", ray_head_name: str = "head") -> str | None:
cmd = ["kubectl", "logs", head_pod, "-c", ray_head_name, "-n", namespace]
try:
output = subprocess.check_output(cmd).decode()
except subprocess.CalledProcessError as e:
raise ValueError(
f"Could not enter head container with cmd {cmd}: {e}Perhaps try a different namespace or ray head name."
)
match = re.search(r"RAY_ADDRESS='([^']+)'", output)
if match:
return match.group(1)
else:
return None


def process_cluster(cluster_info: dict, ray_head_name: str = "head") -> str:
cluster, pods, namespace = cluster_info
head_pod = None
for pod_name, status in pods:
if pod_name.startswith(cluster + "-head"):
head_pod = pod_name
break
if not head_pod:
return f"Error: Could not find head pod for cluster {cluster}\n"

# Get RAY_ADDRESS and status
ray_address = get_ray_address(head_pod, namespace=namespace, ray_head_name=ray_head_name)
if not ray_address:
return f"Error: Could not find RAY_ADDRESS for cluster {cluster}\n"
output_line = ( # num_cpu: {num_cpu} num_gpu: {num_gpu} ram_gb: {ram_gb} total_workers: {total_workers}\n"
f"name: {cluster} address: {ray_address} \n"
)
return output_line


def main():
# Parse command-line arguments
parser = argparse.ArgumentParser(description="Process Ray clusters and save their specifications.")
parser.add_argument("--prefix", default="isaacray", help="The prefix for the cluster names.")
parser.add_argument("--output", default="~/.cluster_config", help="The file to save cluster specifications.")
parser.add_argument("--ray_head_name", default="head", help="The metadata name for the ray head container")
args = parser.parse_args()

CLUSTER_NAME_PREFIX = args.prefix
# Expand user directory for output file
CLUSTER_SPEC_FILE = os.path.expanduser(args.output)

# Get current namespace
try:
CURRENT_NAMESPACE = (
subprocess.check_output(["kubectl", "config", "view", "--minify", "--output", "jsonpath={..namespace}"])
.decode()
.strip()
)
if not CURRENT_NAMESPACE:
CURRENT_NAMESPACE = "default"
except subprocess.CalledProcessError:
CURRENT_NAMESPACE = "default"
print(f"Using namespace: {CURRENT_NAMESPACE}")

# Get all pods
pods = get_pods(namespace=CURRENT_NAMESPACE)

# Get clusters
clusters = get_clusters(pods, CLUSTER_NAME_PREFIX)
if not clusters:
print(f"No clusters found with prefix {CLUSTER_NAME_PREFIX}")
return

# Wait for clusters to be running
while True:
pods = get_pods(namespace=CURRENT_NAMESPACE) # Refresh pods list inside loop
if check_clusters_running(pods, clusters):
break
print("Waiting for all clusters to spin up...")
time.sleep(5)

# Prepare cluster info for parallel processing
cluster_infos = []
for cluster in clusters:
cluster_pods = [p for p in pods if p[0].startswith(cluster)]
cluster_infos.append((cluster, cluster_pods, CURRENT_NAMESPACE))

# Use ThreadPoolExecutor to process clusters in parallel
results = []
results_lock = threading.Lock() # Create a lock for thread-safe results collection

with ThreadPoolExecutor() as executor:
future_to_cluster = {
executor.submit(process_cluster, info, args.ray_head_name): info[0] for info in cluster_infos
}
for future in as_completed(future_to_cluster):
cluster_name = future_to_cluster[future]
try:
result = future.result()
with results_lock:
results.append(result)
except Exception as exc:
print(f"{cluster_name} generated an exception: {exc}")

# Sort results alphabetically by cluster name
results.sort()

# Write sorted results to the output file
with open(CLUSTER_SPEC_FILE, "w") as f:
for result in results:
f.write(result)

print(f"Cluster spec information saved to {CLUSTER_SPEC_FILE}")
# Display the contents of the config file
with open(CLUSTER_SPEC_FILE) as f:
print(f.read())


if __name__ == "__main__":
main()
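Each line of the generated config pairs a cluster name with its Ray dashboard address, so downstream submission steps can target every cluster. A minimal sketch (an assumption, not part of the PR) of consuming that file:

import os
import re

with open(os.path.expanduser("~/.cluster_config")) as f:
    for line in f:
        match = re.match(r"name: (\S+) address: (\S+)", line)
        if match:
            name, address = match.groups()
            print(f"cluster {name} -> RAY_ADDRESS={address}")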