
FIX: Show non-common GPUs by default when querying specific clouds #3925

Open
wizenheimer wants to merge 6 commits into master
Conversation

@wizenheimer (Contributor) commented on Sep 7, 2024

Issues Addressed

Changes Made

  • In the "Other GPUs" section of the CLI, modified the condition so that all GPUs (including non-common ones) are shown when a specific cloud is specified (a sketch of the gating logic follows below).
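
A minimal sketch of the kind of gating this change implies; the function and variable names below are assumed for illustration and are not the actual identifiers in sky/cli.py:

from typing import Iterator, Optional

def _emit_gpu_tables(show_all: bool,
                     cloud: Optional[str]) -> Iterator[str]:
    # Hypothetical names and structure, for illustration only.
    yield 'COMMON_GPU table\n'         # always shown
    # Before this PR: non-common GPUs appeared only with -a/--all.
    # After: they also appear when a specific cloud is queried.
    if show_all or cloud is not None:
        yield 'OTHER_GPU table\n'
    if show_all:
        yield 'full pricing tables\n'  # detailed pricing stays behind -a/--all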

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: pytest tests/test_smoke.py
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • Backward compatibility tests: conda deactivate; bash -i tests/backward_compatibility_tests.sh

Before

sky show-gpus --cloud cudo
COMMON_GPU  AVAILABLE_QUANTITIES
V100        1, 2, 4, 8

Hint: use -a/--all to see all accelerators (including non-common ones) and pricing.

After

sky show-gpus --cloud cudo
COMMON_GPU  AVAILABLE_QUANTITIES
V100        1, 2, 4, 8

OTHER_GPU  AVAILABLE_QUANTITIES
A40        1, 2, 4, 8
RTX3080    1
RTXA4000   1, 2
RTXA4500   1, 2
RTXA5000   1, 2
RTXA6000   1, 2

GPU  QTY  CLOUD  INSTANCE_TYPE            DEVICE_MEM  vCPUs  HOST_MEM  HOURLY_PRICE  HOURLY_SPOT_PRICE  REGION
A40  1    Cudo   ice-lake-a40_4x1v2gb     48GB        -      4GB       $ 0.808       -                  se-stockholm-1
A40  1    Cudo   ice-lake-a40_8x1v4gb     48GB        -      8GB       $ 0.827       -                  se-stockholm-1
A40  1    Cudo   ice-lake-a40_24x1v12gb   48GB        -      24GB      $ 0.900       -                  se-stockholm-1
A40  2    Cudo   ice-lake-a40_16x2v8gb    48GB        -      16GB      $ 1.654       -                  se-stockholm-1
A40  2    Cudo   ice-lake-a40_32x2v16gb   48GB        -      32GB      $ 1.727       -                  se-stockholm-1
A40  2    Cudo   ice-lake-a40_48x2v24gb   48GB        -      48GB      $ 1.801       -                  se-stockholm-1
A40  4    Cudo   ice-lake-a40_64x4v32gb   48GB        -      64GB      $ 3.454       -                  se-stockholm-1
A40  4    Cudo   ice-lake-a40_96x4v48gb   48GB        -      96GB      $ 3.602       -                  se-stockholm-1
A40  8    Cudo   ice-lake-a40_128x8v64gb  48GB        -      128GB     $ 6.909       -                  se-stockholm-1
A40  8    Cudo   ice-lake-a40_192x8v96gb  48GB        -      192GB     $ 7.203       -                  se-stockholm-1

GPU      QTY  CLOUD  INSTANCE_TYPE                       DEVICE_MEM  vCPUs  HOST_MEM  HOURLY_PRICE  HOURLY_SPOT_PRICE  REGION
RTX3080  1    Cudo   intel-broadwell-rtx-3080_4x1v2gb    12GB        -      4GB       $ 0.082       -                  ca-montreal-1
RTX3080  1    Cudo   intel-broadwell-rtx-3080_8x1v4gb    12GB        -      8GB       $ 0.094       -                  ca-montreal-1
RTX3080  1    Cudo   intel-broadwell-rtx-3080_24x1v12gb  12GB        -      24GB      $ 0.143       -                  ca-montreal-1
RTX3080  1    Cudo   intel-broadwell-rtx-3080_48x1v24gb  12GB        -      48GB      $ 0.216       -                  ca-montreal-1

GPU       QTY  CLOUD  INSTANCE_TYPE                   DEVICE_MEM  vCPUs  HOST_MEM  HOURLY_PRICE  HOURLY_SPOT_PRICE  REGION
RTXA4000  1    Cudo   epyc-rome-rtx-a4000_4x1v2gb     16GB        -      4GB       $ 0.308       -                  no-luster-1
RTXA4000  1    Cudo   epyc-milan-rtx-a4000_4x1v2gb    16GB        -      4GB       $ 0.318       -                  se-smedjebacken-1
RTXA4000  1    Cudo   epyc-rome-rtx-a4000_8x1v4gb     16GB        -      8GB       $ 0.326       -                  no-luster-1
RTXA4000  1    Cudo   epyc-milan-rtx-a4000_8x1v4gb    16GB        -      8GB       $ 0.337       -                  se-smedjebacken-1
RTXA4000  1    Cudo   epyc-rome-rtx-a4000_24x1v12gb   16GB        -      24GB      $ 0.397       -                  no-luster-1
RTXA4000  1    Cudo   epyc-milan-rtx-a4000_24x1v12gb  16GB        -      24GB      $ 0.410       -                  se-smedjebacken-1
RTXA4000  1    Cudo   epyc-milan-rtx-a4000_48x1v24gb  16GB        -      48GB      $ 0.521       -                  se-smedjebacken-1
RTXA4000  2    Cudo   epyc-milan-rtx-a4000_16x2v8gb   16GB        -      16GB      $ 0.674       -                  se-smedjebacken-1
RTXA4000  2    Cudo   epyc-milan-rtx-a4000_32x2v16gb  16GB        -      32GB      $ 0.747       -                  se-smedjebacken-1
RTXA4000  2    Cudo   epyc-milan-rtx-a4000_48x2v24gb  16GB        -      48GB      $ 0.821       -                  se-smedjebacken-1
RTXA4000  2    Cudo   epyc-milan-rtx-a4000_96x2v48gb  16GB        -      96GB      $ 1.042       -                  se-smedjebacken-1

GPU       QTY  CLOUD  INSTANCE_TYPE             DEVICE_MEM  vCPUs  HOST_MEM  HOURLY_PRICE  HOURLY_SPOT_PRICE  REGION
RTXA4500  1    Cudo   sky-lake-a4500_4x1v2gb    20GB        -      4GB       $ 0.478       -                  gb-london-1
RTXA4500  1    Cudo   sky-lake-a4500_8x1v4gb    20GB        -      8GB       $ 0.495       -                  gb-london-1
RTXA4500  1    Cudo   sky-lake-a4500_24x1v12gb  20GB        -      24GB      $ 0.566       -                  gb-london-1
RTXA4500  1    Cudo   sky-lake-a4500_48x1v24gb  20GB        -      48GB      $ 0.671       -                  gb-london-1
RTXA4500  2    Cudo   sky-lake-a4500_16x2v8gb   20GB        -      16GB      $ 0.990       -                  gb-london-1
RTXA4500  2    Cudo   sky-lake-a4500_32x2v16gb  20GB        -      32GB      $ 1.061       -                  gb-london-1
RTXA4500  2    Cudo   sky-lake-a4500_48x2v24gb  20GB        -      48GB      $ 1.131       -                  gb-london-1
RTXA4500  2    Cudo   sky-lake-a4500_96x2v48gb  20GB        -      96GB      $ 1.342       -                  gb-london-1

GPU       QTY  CLOUD  INSTANCE_TYPE                   DEVICE_MEM  vCPUs  HOST_MEM  HOURLY_PRICE  HOURLY_SPOT_PRICE  REGION
RTXA5000  1    Cudo   epyc-milan-rtx-a5000_4x1v2gb    24GB        -      4GB       $ 0.568       -                  se-smedjebacken-1
RTXA5000  1    Cudo   epyc-milan-rtx-a5000_8x1v4gb    24GB        -      8GB       $ 0.587       -                  se-smedjebacken-1
RTXA5000  1    Cudo   epyc-milan-rtx-a5000_24x1v12gb  24GB        -      24GB      $ 0.660       -                  se-smedjebacken-1
RTXA5000  1    Cudo   epyc-milan-rtx-a5000_48x1v24gb  24GB        -      48GB      $ 0.771       -                  se-smedjebacken-1
RTXA5000  2    Cudo   epyc-milan-rtx-a5000_16x2v8gb   24GB        -      16GB      $ 1.174       -                  se-smedjebacken-1
RTXA5000  2    Cudo   epyc-milan-rtx-a5000_32x2v16gb  24GB        -      32GB      $ 1.247       -                  se-smedjebacken-1
RTXA5000  2    Cudo   epyc-milan-rtx-a5000_48x2v24gb  24GB        -      48GB      $ 1.321       -                  se-smedjebacken-1
RTXA5000  2    Cudo   epyc-milan-rtx-a5000_96x2v48gb  24GB        -      96GB      $ 1.542       -                  se-smedjebacken-1

GPU       QTY  CLOUD  INSTANCE_TYPE                  DEVICE_MEM  vCPUs  HOST_MEM  HOURLY_PRICE  HOURLY_SPOT_PRICE  REGION
RTXA6000  1    Cudo   epyc-rome-rtx-a6000_4x1v2gb    48GB        -      4GB       $ 0.798       -                  no-luster-1
RTXA6000  1    Cudo   epyc-rome-rtx-a6000_8x1v4gb    48GB        -      8GB       $ 0.816       -                  no-luster-1
RTXA6000  1    Cudo   epyc-rome-rtx-a6000_24x1v12gb  48GB        -      24GB      $ 0.887       -                  no-luster-1
RTXA6000  1    Cudo   epyc-rome-rtx-a6000_48x1v24gb  48GB        -      48GB      $ 0.994       -                  no-luster-1
RTXA6000  2    Cudo   epyc-rome-rtx-a6000_16x2v8gb   48GB        -      16GB      $ 1.631       -                  no-luster-1
RTXA6000  2    Cudo   epyc-rome-rtx-a6000_32x2v16gb  48GB        -      32GB      $ 1.702       -                  no-luster-1
RTXA6000  2    Cudo   epyc-rome-rtx-a6000_48x2v24gb  48GB        -      48GB      $ 1.774       -                  no-luster-1
RTXA6000  2    Cudo   epyc-rome-rtx-a6000_96x2v48gb  48GB        -      96GB      $ 1.987       -                  no-luster-1

GPU   QTY  CLOUD  INSTANCE_TYPE                    DEVICE_MEM  vCPUs  HOST_MEM  HOURLY_PRICE  HOURLY_SPOT_PRICE  REGION
V100  1    Cudo   intel-broadwell-v100_4x1v2gb     16GB        -      4GB       $ 1.008       -                  us-santaclara-1
V100  1    Cudo   intel-broadwell-v100_8x1v4gb     16GB        -      8GB       $ 1.027       -                  us-santaclara-1
V100  1    Cudo   intel-broadwell-v100_24x1v12gb   16GB        -      24GB      $ 1.100       -                  us-santaclara-1
V100  2    Cudo   intel-broadwell-v100_16x2v8gb    16GB        -      16GB      $ 2.054       -                  us-santaclara-1
V100  2    Cudo   intel-broadwell-v100_32x2v16gb   16GB        -      32GB      $ 2.127       -                  us-santaclara-1
V100  2    Cudo   intel-broadwell-v100_48x2v24gb   16GB        -      48GB      $ 2.201       -                  us-santaclara-1
V100  4    Cudo   intel-broadwell-v100_64x4v32gb   16GB        -      64GB      $ 4.254       -                  us-santaclara-1
V100  4    Cudo   intel-broadwell-v100_96x4v48gb   16GB        -      96GB      $ 4.402       -                  us-santaclara-1
V100  8    Cudo   intel-broadwell-v100_128x8v64gb  16GB        -      128GB     $ 8.509       -                  us-santaclara-1
V100  8    Cudo   intel-broadwell-v100_192x8v96gb  16GB        -      192GB     $ 8.803       -                  us-santaclara-1

@romilbhardwaj (Collaborator) commented:

Thanks @wizenheimer. I think the desired behavior is to print only COMMON_GPU and OTHER_GPU, not the entire pricing table. I.e.:

sky show-gpus --cloud cudo
COMMON_GPU  AVAILABLE_QUANTITIES
V100        1, 2, 4, 8

OTHER_GPU  AVAILABLE_QUANTITIES
A40        1, 2, 4, 8
RTX3080    1
RTXA4000   1, 2
RTXA4500   1, 2
RTXA5000   1, 2
RTXA6000   1, 2

@wizenheimer (Contributor, Author) commented:

Hey @romilbhardwaj,
A quick follow-up. Please take a look.

Current Change

$ sky show-gpus --cloud cudo
COMMON_GPU  AVAILABLE_QUANTITIES
V100        1, 2, 4, 8

OTHER_GPU  AVAILABLE_QUANTITIES
A40        1, 2, 4, 8
RTX3080    1
RTXA4000   1, 2
RTXA4500   1, 2
RTXA5000   1, 2
RTXA6000   1, 2
$ sky show-gpus --cloud runpod
COMMON_GPU  AVAILABLE_QUANTITIES
A100-80GB   1, 2, 4, 8
H100        1, 2, 4, 8
L4          1, 2, 4, 8

OTHER_GPU      AVAILABLE_QUANTITIES
A100-80GB-SXM  1, 2, 4, 8
A40            1, 2, 4, 8
H100-SXM       1, 2, 4, 8
L40            1, 2, 4, 8
RTX3090        1, 2, 4, 8
RTX4000-Ada    1, 2, 4, 8
RTX4090        1, 2, 4, 8
RTX6000-Ada    1, 2, 4, 8
RTXA4000       1, 2, 4, 8
RTXA4500       1, 2, 4, 8
RTXA5000       1, 2, 4, 8
RTXA6000       1, 2, 4, 8

Others (consistent with master)

$ sky show-gpus -a
COMMON_GPU  AVAILABLE_QUANTITIES
A10         1, 2, 4
A10G        1, 4, 8
A100        1, 2, 4, 8, 16
A100-80GB   1, 2, 4, 8
H100        1, 2, 4, 8, 12
K80         1, 2, 4, 8, 16
L4          1, 2, 4, 8
M60         1, 2, 4
P100        1, 2, 4
T4          1, 2, 4, 8
V100        1, 2, 4, 8
V100-32GB   1, 2, 4, 8

GOOGLE_TPU   AVAILABLE_QUANTITIES
tpu-v2-8     1
tpu-v2-32    1
tpu-v2-128   1
tpu-v2-256   1
tpu-v2-512   1
tpu-v3-8     1
tpu-v3-32    1
tpu-v3-64    1
tpu-v3-128   1
tpu-v3-256   1
tpu-v3-512   1
tpu-v3-1024  1
tpu-v3-2048  1
...
GPU   QTY  CLOUD   INSTANCE_TYPE        DEVICE_MEM  vCPUs  HOST_MEM  HOURLY_PRICE  HOURLY_SPOT_PRICE
A100  1    Lambda  gpu_1x_a100          40GB        30     200GB     $ 1.290       -
A100  1    Lambda  gpu_1x_a100_sxm4     40GB        30     200GB     $ 1.290       -
A100  2    Lambda  gpu_2x_a100          40GB        60     400GB     $ 2.580       -
A100  4    Lambda  gpu_4x_a100          40GB        120    800GB     $ 5.160       -
A100  8    Lambda  gpu_8x_a100          40GB        124    1800GB    $ 10.320      -
A100  1    GCP     a2-highgpu-1g        -           12     85GB      $ 3.673       $ 1.469
A100  2    GCP     a2-highgpu-2g        -           24     170GB     $ 7.347       $ 2.939
A100  4    GCP     a2-highgpu-4g        -           48     340GB     $ 14.694      $ 5.877
A100  8    GCP     a2-highgpu-8g        -           96     680GB     $ 29.387      $ 11.755
A100  16   GCP     a2-megagpu-16g       -           96     1360GB    $ 55.740      $ 22.296
A100  8    OCI     BM.GPU4.8            40GB        128    2048GB    $ 24.400      -
A100  8    Azure   Standard_ND96asr_v4  -           96     900GB     $ 27.197      $ 2.992
A100  8    AWS     p4d.24xlarge         40GB        96     1152GB    $ 32.773      $ 11.158
...
GPU        QTY  CLOUD       INSTANCE_TYPE              DEVICE_MEM  vCPUs  HOST_MEM  HOURLY_PRICE  HOURLY_SPOT_PRICE
A100-80GB  1    RunPod      1x_A100-80GB_SECURE        -           8      80GB      $ 1.990       -
A100-80GB  2    RunPod      2x_A100-80GB_SECURE        -           16     160GB     $ 3.980       -
A100-80GB  4    RunPod      4x_A100-80GB_SECURE        -           32     320GB     $ 7.960       -
A100-80GB  8    RunPod      8x_A100-80GB_SECURE        -           64     640GB     $ 15.920      -
A100-80GB  1    Paperspace  A100-80G                   -           12     80GB      $ 3.180       -
A100-80GB  8    Paperspace  A100-80Gx8                 -           96     640GB     $ 25.440      -
A100-80GB  2    Fluidstack  A100_PCIE_80GB::2          80GB        60     240GB     $ 3.500       -
A100-80GB  4    Fluidstack  A100_PCIE_80GB::4          80GB        124    480GB     $ 7.000       -
A100-80GB  8    Fluidstack  A100_PCIE_80GB::8          80GB        252    1440GB    $ 14.000      -
A100-80GB  1    Azure       Standard_NC24ads_A100_v4   -           24     220GB     $ 3.673       $ 0.404
A100-80GB  2    Azure       Standard_NC48ads_A100_v4   -           48     440GB     $ 7.346       $ 0.808
A100-80GB  4    Azure       Standard_NC96ads_A100_v4   -           96     880GB     $ 14.692      $ 1.616
A100-80GB  8    Azure       Standard_ND96amsr_A100_v4  -           96     1800GB    $ 32.770      $ 3.605

@romilbhardwaj (Collaborator) left a review:

Thanks for the great work @wizenheimer! Tried it out and left some comments.

sky/cli.py (outdated), comment on lines 3196 to 3199:

    # Handle k8 messages if present
    if k8s_messages:
        yield '\n'
        yield k8s_messages

We want to keep this under the if block above; otherwise k8s_messages are printed twice when sky show-gpus -a is run.
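
A minimal sketch of the requested fix; the surrounding generator structure and names are assumed rather than copied from sky/cli.py, and it assumes the -a/--all path already emits the Kubernetes messages elsewhere:

from typing import Iterator

def _emit_output(show_all: bool, k8s_messages: str) -> Iterator[str]:
    # Illustrative structure only; names are assumed.
    if show_all:
        yield 'full accelerator and pricing tables\n'
        # (assumed: k8s_messages are already included on this path)
    else:
        yield 'common/other GPU tables\n'
        # Keeping this under the else branch avoids emitting the
        # messages a second time when `sky show-gpus -a` is run.
        if k8s_messages:
            yield '\n'
            yield k8s_messages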

@@ -186,12 +186,12 @@ def get_cluster_info(
 def get_command_runners(
     provider_name: str,
     cluster_info: common.ClusterInfo,
-    **crednetials: Dict[str, Any],
+    **credentials: Dict[str, Any],

Thanks for fixing this in #3924 :) Perhaps this branch needs to be rebased/merged with the latest master?

sky/cli.py (outdated):

    else:

        # Handle hints and messages
        if not show_all and cloud is None:

We may need to do some special handling here. Refer to the master branch output and output in this branch below.

  1. We want to retain the hint at the end, but exclude the "(including non-common ones)" bit. I.e., we should print a hint like Hint: use -a/--all to see all accelerators and pricing. when sky show-gpus --cloud xyz is run, and keep the current hint Hint: use -a/--all to see all accelerators (including non-common ones) and pricing. when sky show-gpus is run (see the sketch after this list).
  2. There are some extra empty lines at the end of this branch's output.
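
A minimal sketch of the hint branching described in point 1; the flag and option names are assumed, not copied from sky/cli.py:

from typing import Optional

def _hint(show_all: bool, cloud: Optional[str]) -> str:
    # Assumed names; returns the hint text to append, if any.
    if show_all:
        return ''
    if cloud is None:
        return ('Hint: use -a/--all to see all accelerators '
                '(including non-common ones) and pricing.')
    return 'Hint: use -a/--all to see all accelerators and pricing.'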

Master:

(base) ➜  ~ sky show-gpus --cloud aws
COMMON_GPU  AVAILABLE_QUANTITIES
A10G        1, 4, 8
A100        8
A100-80GB   8
H100        8
K80         1, 8, 16
L4          1, 4, 8
M60         1, 2, 4
T4          1, 4, 8
V100        1, 4, 8
V100-32GB   8

Hint: use -a/--all to see all accelerators (including non-common ones) and pricing.

This branch:

(base) ➜  ~ sky show-gpus --cloud aws
COMMON_GPU  AVAILABLE_QUANTITIES
A10G        1, 4, 8
A100        8
A100-80GB   8
H100        8
K80         1, 8, 16
L4          1, 4, 8
M60         1, 2, 4
T4          1, 4, 8
V100        1, 4, 8
V100-32GB   8

OTHER_GPU        AVAILABLE_QUANTITIES
Gaudi HL-205     8
L40S             1, 4, 8
Radeon Pro V520  1, 2, 4
T4g              1, 2


@wizenheimer (Contributor, Author) commented:

Thanks @romilbhardwaj,
Really appreciate the feedback. I've updated the CLI's footnotes and handled the duplicated k8s messages.
Here's a diff for reference:

  1. sky show-gpus -a - CLI Diff
  2. sky show-gpus --cloud aws - CLI Diff
  3. sky show-gpus - CLI Diff
