
FIX: Show non-common GPUs by default when querying specific clouds #3925

Open
wizenheimer wants to merge 6 commits into master
Conversation

@wizenheimer (Contributor) commented on Sep 7, 2024

Issues Addressed

Changes Made

  • In the "Other GPUs" section of the CLI, modified the condition so that all GPUs (including non-common ones) are shown when a specific cloud is specified (a sketch of the gating logic follows below).
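
A minimal sketch of the kind of gating this change implies; the function and variable names below are assumed for illustration and are not the actual identifiers in sky/cli.py:

from typing import Iterator, Optional

def _emit_gpu_tables(show_all: bool,
                     cloud: Optional[str]) -> Iterator[str]:
    # Hypothetical names and structure, for illustration only.
    yield 'COMMON_GPU table\n'         # always shown
    # Before this PR: non-common GPUs appeared only with -a/--all.
    # After: they also appear when a specific cloud is queried.
    if show_all or cloud is not None:
        yield 'OTHER_GPU table\n'
    if show_all:
        yield 'full pricing tables\n'  # detailed pricing stays behind -a/--all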

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: pytest tests/test_smoke.py
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • Backward compatibility tests: conda deactivate; bash -i tests/backward_compatibility_tests.sh

Before

sky show-gpus --cloud cudo
COMMON_GPU  AVAILABLE_QUANTITIES
V100        1, 2, 4, 8

Hint: use -a/--all to see all accelerators (including non-common ones) and pricing.

After

sky show-gpus --cloud cudo
COMMON_GPU  AVAILABLE_QUANTITIES
V100        1, 2, 4, 8

OTHER_GPU  AVAILABLE_QUANTITIES
A40        1, 2, 4, 8
RTX3080    1
RTXA4000   1, 2
RTXA4500   1, 2
RTXA5000   1, 2
RTXA6000   1, 2

GPU  QTY  CLOUD  INSTANCE_TYPE            DEVICE_MEM  vCPUs  HOST_MEM  HOURLY_PRICE  HOURLY_SPOT_PRICE  REGION
A40  1    Cudo   ice-lake-a40_4x1v2gb     48GB        -      4GB       $ 0.808       -                  se-stockholm-1
A40  1    Cudo   ice-lake-a40_8x1v4gb     48GB        -      8GB       $ 0.827       -                  se-stockholm-1
A40  1    Cudo   ice-lake-a40_24x1v12gb   48GB        -      24GB      $ 0.900       -                  se-stockholm-1
A40  2    Cudo   ice-lake-a40_16x2v8gb    48GB        -      16GB      $ 1.654       -                  se-stockholm-1
A40  2    Cudo   ice-lake-a40_32x2v16gb   48GB        -      32GB      $ 1.727       -                  se-stockholm-1
A40  2    Cudo   ice-lake-a40_48x2v24gb   48GB        -      48GB      $ 1.801       -                  se-stockholm-1
A40  4    Cudo   ice-lake-a40_64x4v32gb   48GB        -      64GB      $ 3.454       -                  se-stockholm-1
A40  4    Cudo   ice-lake-a40_96x4v48gb   48GB        -      96GB      $ 3.602       -                  se-stockholm-1
A40  8    Cudo   ice-lake-a40_128x8v64gb  48GB        -      128GB     $ 6.909       -                  se-stockholm-1
A40  8    Cudo   ice-lake-a40_192x8v96gb  48GB        -      192GB     $ 7.203       -                  se-stockholm-1

GPU      QTY  CLOUD  INSTANCE_TYPE                       DEVICE_MEM  vCPUs  HOST_MEM  HOURLY_PRICE  HOURLY_SPOT_PRICE  REGION
RTX3080  1    Cudo   intel-broadwell-rtx-3080_4x1v2gb    12GB        -      4GB       $ 0.082       -                  ca-montreal-1
RTX3080  1    Cudo   intel-broadwell-rtx-3080_8x1v4gb    12GB        -      8GB       $ 0.094       -                  ca-montreal-1
RTX3080  1    Cudo   intel-broadwell-rtx-3080_24x1v12gb  12GB        -      24GB      $ 0.143       -                  ca-montreal-1
RTX3080  1    Cudo   intel-broadwell-rtx-3080_48x1v24gb  12GB        -      48GB      $ 0.216       -                  ca-montreal-1

GPU       QTY  CLOUD  INSTANCE_TYPE                   DEVICE_MEM  vCPUs  HOST_MEM  HOURLY_PRICE  HOURLY_SPOT_PRICE  REGION
RTXA4000  1    Cudo   epyc-rome-rtx-a4000_4x1v2gb     16GB        -      4GB       $ 0.308       -                  no-luster-1
RTXA4000  1    Cudo   epyc-milan-rtx-a4000_4x1v2gb    16GB        -      4GB       $ 0.318       -                  se-smedjebacken-1
RTXA4000  1    Cudo   epyc-rome-rtx-a4000_8x1v4gb     16GB        -      8GB       $ 0.326       -                  no-luster-1
RTXA4000  1    Cudo   epyc-milan-rtx-a4000_8x1v4gb    16GB        -      8GB       $ 0.337       -                  se-smedjebacken-1
RTXA4000  1    Cudo   epyc-rome-rtx-a4000_24x1v12gb   16GB        -      24GB      $ 0.397       -                  no-luster-1
RTXA4000  1    Cudo   epyc-milan-rtx-a4000_24x1v12gb  16GB        -      24GB      $ 0.410       -                  se-smedjebacken-1
RTXA4000  1    Cudo   epyc-milan-rtx-a4000_48x1v24gb  16GB        -      48GB      $ 0.521       -                  se-smedjebacken-1
RTXA4000  2    Cudo   epyc-milan-rtx-a4000_16x2v8gb   16GB        -      16GB      $ 0.674       -                  se-smedjebacken-1
RTXA4000  2    Cudo   epyc-milan-rtx-a4000_32x2v16gb  16GB        -      32GB      $ 0.747       -                  se-smedjebacken-1
RTXA4000  2    Cudo   epyc-milan-rtx-a4000_48x2v24gb  16GB        -      48GB      $ 0.821       -                  se-smedjebacken-1
RTXA4000  2    Cudo   epyc-milan-rtx-a4000_96x2v48gb  16GB        -      96GB      $ 1.042       -                  se-smedjebacken-1

GPU       QTY  CLOUD  INSTANCE_TYPE             DEVICE_MEM  vCPUs  HOST_MEM  HOURLY_PRICE  HOURLY_SPOT_PRICE  REGION
RTXA4500  1    Cudo   sky-lake-a4500_4x1v2gb    20GB        -      4GB       $ 0.478       -                  gb-london-1
RTXA4500  1    Cudo   sky-lake-a4500_8x1v4gb    20GB        -      8GB       $ 0.495       -                  gb-london-1
RTXA4500  1    Cudo   sky-lake-a4500_24x1v12gb  20GB        -      24GB      $ 0.566       -                  gb-london-1
RTXA4500  1    Cudo   sky-lake-a4500_48x1v24gb  20GB        -      48GB      $ 0.671       -                  gb-london-1
RTXA4500  2    Cudo   sky-lake-a4500_16x2v8gb   20GB        -      16GB      $ 0.990       -                  gb-london-1
RTXA4500  2    Cudo   sky-lake-a4500_32x2v16gb  20GB        -      32GB      $ 1.061       -                  gb-london-1
RTXA4500  2    Cudo   sky-lake-a4500_48x2v24gb  20GB        -      48GB      $ 1.131       -                  gb-london-1
RTXA4500  2    Cudo   sky-lake-a4500_96x2v48gb  20GB        -      96GB      $ 1.342       -                  gb-london-1

GPU       QTY  CLOUD  INSTANCE_TYPE                   DEVICE_MEM  vCPUs  HOST_MEM  HOURLY_PRICE  HOURLY_SPOT_PRICE  REGION
RTXA5000  1    Cudo   epyc-milan-rtx-a5000_4x1v2gb    24GB        -      4GB       $ 0.568       -                  se-smedjebacken-1
RTXA5000  1    Cudo   epyc-milan-rtx-a5000_8x1v4gb    24GB        -      8GB       $ 0.587       -                  se-smedjebacken-1
RTXA5000  1    Cudo   epyc-milan-rtx-a5000_24x1v12gb  24GB        -      24GB      $ 0.660       -                  se-smedjebacken-1
RTXA5000  1    Cudo   epyc-milan-rtx-a5000_48x1v24gb  24GB        -      48GB      $ 0.771       -                  se-smedjebacken-1
RTXA5000  2    Cudo   epyc-milan-rtx-a5000_16x2v8gb   24GB        -      16GB      $ 1.174       -                  se-smedjebacken-1
RTXA5000  2    Cudo   epyc-milan-rtx-a5000_32x2v16gb  24GB        -      32GB      $ 1.247       -                  se-smedjebacken-1
RTXA5000  2    Cudo   epyc-milan-rtx-a5000_48x2v24gb  24GB        -      48GB      $ 1.321       -                  se-smedjebacken-1
RTXA5000  2    Cudo   epyc-milan-rtx-a5000_96x2v48gb  24GB        -      96GB      $ 1.542       -                  se-smedjebacken-1

GPU       QTY  CLOUD  INSTANCE_TYPE                  DEVICE_MEM  vCPUs  HOST_MEM  HOURLY_PRICE  HOURLY_SPOT_PRICE  REGION
RTXA6000  1    Cudo   epyc-rome-rtx-a6000_4x1v2gb    48GB        -      4GB       $ 0.798       -                  no-luster-1
RTXA6000  1    Cudo   epyc-rome-rtx-a6000_8x1v4gb    48GB        -      8GB       $ 0.816       -                  no-luster-1
RTXA6000  1    Cudo   epyc-rome-rtx-a6000_24x1v12gb  48GB        -      24GB      $ 0.887       -                  no-luster-1
RTXA6000  1    Cudo   epyc-rome-rtx-a6000_48x1v24gb  48GB        -      48GB      $ 0.994       -                  no-luster-1
RTXA6000  2    Cudo   epyc-rome-rtx-a6000_16x2v8gb   48GB        -      16GB      $ 1.631       -                  no-luster-1
RTXA6000  2    Cudo   epyc-rome-rtx-a6000_32x2v16gb  48GB        -      32GB      $ 1.702       -                  no-luster-1
RTXA6000  2    Cudo   epyc-rome-rtx-a6000_48x2v24gb  48GB        -      48GB      $ 1.774       -                  no-luster-1
RTXA6000  2    Cudo   epyc-rome-rtx-a6000_96x2v48gb  48GB        -      96GB      $ 1.987       -                  no-luster-1

GPU   QTY  CLOUD  INSTANCE_TYPE                    DEVICE_MEM  vCPUs  HOST_MEM  HOURLY_PRICE  HOURLY_SPOT_PRICE  REGION
V100  1    Cudo   intel-broadwell-v100_4x1v2gb     16GB        -      4GB       $ 1.008       -                  us-santaclara-1
V100  1    Cudo   intel-broadwell-v100_8x1v4gb     16GB        -      8GB       $ 1.027       -                  us-santaclara-1
V100  1    Cudo   intel-broadwell-v100_24x1v12gb   16GB        -      24GB      $ 1.100       -                  us-santaclara-1
V100  2    Cudo   intel-broadwell-v100_16x2v8gb    16GB        -      16GB      $ 2.054       -                  us-santaclara-1
V100  2    Cudo   intel-broadwell-v100_32x2v16gb   16GB        -      32GB      $ 2.127       -                  us-santaclara-1
V100  2    Cudo   intel-broadwell-v100_48x2v24gb   16GB        -      48GB      $ 2.201       -                  us-santaclara-1
V100  4    Cudo   intel-broadwell-v100_64x4v32gb   16GB        -      64GB      $ 4.254       -                  us-santaclara-1
V100  4    Cudo   intel-broadwell-v100_96x4v48gb   16GB        -      96GB      $ 4.402       -                  us-santaclara-1
V100  8    Cudo   intel-broadwell-v100_128x8v64gb  16GB        -      128GB     $ 8.509       -                  us-santaclara-1
V100  8    Cudo   intel-broadwell-v100_192x8v96gb  16GB        -      192GB     $ 8.803       -                  us-santaclara-1

@romilbhardwaj (Collaborator) commented:

Thanks @wizenheimer. I think the desired behavior is to print only COMMON_GPU and OTHER_GPU, not the entire pricing table. I.e.:

sky show-gpus --cloud cudo
COMMON_GPU  AVAILABLE_QUANTITIES
V100        1, 2, 4, 8

OTHER_GPU  AVAILABLE_QUANTITIES
A40        1, 2, 4, 8
RTX3080    1
RTXA4000   1, 2
RTXA4500   1, 2
RTXA5000   1, 2
RTXA6000   1, 2

@wizenheimer (Contributor, Author) commented:

Hey @romilbhardwaj,
A quick follow-up. Please take a look.

Current Change

$ sky show-gpus --cloud cudo
COMMON_GPU  AVAILABLE_QUANTITIES
V100        1, 2, 4, 8

OTHER_GPU  AVAILABLE_QUANTITIES
A40        1, 2, 4, 8
RTX3080    1
RTXA4000   1, 2
RTXA4500   1, 2
RTXA5000   1, 2
RTXA6000   1, 2
$ sky show-gpus --cloud runpod
COMMON_GPU  AVAILABLE_QUANTITIES
A100-80GB   1, 2, 4, 8
H100        1, 2, 4, 8
L4          1, 2, 4, 8

OTHER_GPU      AVAILABLE_QUANTITIES
A100-80GB-SXM  1, 2, 4, 8
A40            1, 2, 4, 8
H100-SXM       1, 2, 4, 8
L40            1, 2, 4, 8
RTX3090        1, 2, 4, 8
RTX4000-Ada    1, 2, 4, 8
RTX4090        1, 2, 4, 8
RTX6000-Ada    1, 2, 4, 8
RTXA4000       1, 2, 4, 8
RTXA4500       1, 2, 4, 8
RTXA5000       1, 2, 4, 8
RTXA6000       1, 2, 4, 8

Others (consistent with master)

$ sky show-gpus -a
COMMON_GPU  AVAILABLE_QUANTITIES
A10         1, 2, 4
A10G        1, 4, 8
A100        1, 2, 4, 8, 16
A100-80GB   1, 2, 4, 8
H100        1, 2, 4, 8, 12
K80         1, 2, 4, 8, 16
L4          1, 2, 4, 8
M60         1, 2, 4
P100        1, 2, 4
T4          1, 2, 4, 8
V100        1, 2, 4, 8
V100-32GB   1, 2, 4, 8

GOOGLE_TPU   AVAILABLE_QUANTITIES
tpu-v2-8     1
tpu-v2-32    1
tpu-v2-128   1
tpu-v2-256   1
tpu-v2-512   1
tpu-v3-8     1
tpu-v3-32    1
tpu-v3-64    1
tpu-v3-128   1
tpu-v3-256   1
tpu-v3-512   1
tpu-v3-1024  1
tpu-v3-2048  1
...
GPU   QTY  CLOUD   INSTANCE_TYPE        DEVICE_MEM  vCPUs  HOST_MEM  HOURLY_PRICE  HOURLY_SPOT_PRICE
A100  1    Lambda  gpu_1x_a100          40GB        30     200GB     $ 1.290       -
A100  1    Lambda  gpu_1x_a100_sxm4     40GB        30     200GB     $ 1.290       -
A100  2    Lambda  gpu_2x_a100          40GB        60     400GB     $ 2.580       -
A100  4    Lambda  gpu_4x_a100          40GB        120    800GB     $ 5.160       -
A100  8    Lambda  gpu_8x_a100          40GB        124    1800GB    $ 10.320      -
A100  1    GCP     a2-highgpu-1g        -           12     85GB      $ 3.673       $ 1.469
A100  2    GCP     a2-highgpu-2g        -           24     170GB     $ 7.347       $ 2.939
A100  4    GCP     a2-highgpu-4g        -           48     340GB     $ 14.694      $ 5.877
A100  8    GCP     a2-highgpu-8g        -           96     680GB     $ 29.387      $ 11.755
A100  16   GCP     a2-megagpu-16g       -           96     1360GB    $ 55.740      $ 22.296
A100  8    OCI     BM.GPU4.8            40GB        128    2048GB    $ 24.400      -
A100  8    Azure   Standard_ND96asr_v4  -           96     900GB     $ 27.197      $ 2.992
A100  8    AWS     p4d.24xlarge         40GB        96     1152GB    $ 32.773      $ 11.158
...
GPU        QTY  CLOUD       INSTANCE_TYPE              DEVICE_MEM  vCPUs  HOST_MEM  HOURLY_PRICE  HOURLY_SPOT_PRICE
A100-80GB  1    RunPod      1x_A100-80GB_SECURE        -           8      80GB      $ 1.990       -
A100-80GB  2    RunPod      2x_A100-80GB_SECURE        -           16     160GB     $ 3.980       -
A100-80GB  4    RunPod      4x_A100-80GB_SECURE        -           32     320GB     $ 7.960       -
A100-80GB  8    RunPod      8x_A100-80GB_SECURE        -           64     640GB     $ 15.920      -
A100-80GB  1    Paperspace  A100-80G                   -           12     80GB      $ 3.180       -
A100-80GB  8    Paperspace  A100-80Gx8                 -           96     640GB     $ 25.440      -
A100-80GB  2    Fluidstack  A100_PCIE_80GB::2          80GB        60     240GB     $ 3.500       -
A100-80GB  4    Fluidstack  A100_PCIE_80GB::4          80GB        124    480GB     $ 7.000       -
A100-80GB  8    Fluidstack  A100_PCIE_80GB::8          80GB        252    1440GB    $ 14.000      -
A100-80GB  1    Azure       Standard_NC24ads_A100_v4   -           24     220GB     $ 3.673       $ 0.404
A100-80GB  2    Azure       Standard_NC48ads_A100_v4   -           48     440GB     $ 7.346       $ 0.808
A100-80GB  4    Azure       Standard_NC96ads_A100_v4   -           96     880GB     $ 14.692      $ 1.616
A100-80GB  8    Azure       Standard_ND96amsr_A100_v4  -           96     1800GB    $ 32.770      $ 3.605

@romilbhardwaj (Collaborator) left a review:

Thanks for the great work @wizenheimer! Tried it out and left some comments.

sky/cli.py (outdated), comment on lines 3196 to 3199:

    # Handle k8 messages if present
    if k8s_messages:
        yield '\n'
        yield k8s_messages

We want to keep this under the if block above; otherwise k8s_messages are printed twice when sky show-gpus -a is run.
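
A minimal sketch of the requested fix; the surrounding generator structure and names are assumed rather than copied from sky/cli.py, and it assumes the -a/--all path already emits the Kubernetes messages elsewhere:

from typing import Iterator

def _emit_output(show_all: bool, k8s_messages: str) -> Iterator[str]:
    # Illustrative structure only; names are assumed.
    if show_all:
        yield 'full accelerator and pricing tables\n'
        # (assumed: k8s_messages are already included on this path)
    else:
        yield 'common/other GPU tables\n'
        # Keeping this under the else branch avoids emitting the
        # messages a second time when `sky show-gpus -a` is run.
        if k8s_messages:
            yield '\n'
            yield k8s_messages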

@@ -186,12 +186,12 @@ def get_cluster_info(
 def get_command_runners(
     provider_name: str,
     cluster_info: common.ClusterInfo,
-    **crednetials: Dict[str, Any],
+    **credentials: Dict[str, Any],

Thanks for fixing this in #3924 :) Perhaps this branch needs to be rebased/merged with the latest master?

sky/cli.py (outdated):

    else:

        # Handle hints and messages
        if not show_all and cloud is None:

We may need to do some special handling here. Refer to the master branch output and output in this branch below.

  1. We want to retain the hint at the end, but exclude the "(including non-common ones)" bit. I.e., we should print a hint like Hint: use -a/--all to see all accelerators and pricing. when sky show-gpus --cloud xyz is run, and keep the current hint Hint: use -a/--all to see all accelerators (including non-common ones) and pricing. when sky show-gpus is run (see the sketch after this list).
  2. There are some extra empty lines at the end of this branch's output.
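
A minimal sketch of the hint branching described in point 1; the flag and option names are assumed, not copied from sky/cli.py:

from typing import Optional

def _hint(show_all: bool, cloud: Optional[str]) -> str:
    # Assumed names; returns the hint text to append, if any.
    if show_all:
        return ''
    if cloud is None:
        return ('Hint: use -a/--all to see all accelerators '
                '(including non-common ones) and pricing.')
    return 'Hint: use -a/--all to see all accelerators and pricing.'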

Master:

(base) ➜  ~ sky show-gpus --cloud aws
COMMON_GPU  AVAILABLE_QUANTITIES
A10G        1, 4, 8
A100        8
A100-80GB   8
H100        8
K80         1, 8, 16
L4          1, 4, 8
M60         1, 2, 4
T4          1, 4, 8
V100        1, 4, 8
V100-32GB   8

Hint: use -a/--all to see all accelerators (including non-common ones) and pricing.

This branch:

(base) ➜  ~ sky show-gpus --cloud aws
COMMON_GPU  AVAILABLE_QUANTITIES
A10G        1, 4, 8
A100        8
A100-80GB   8
H100        8
K80         1, 8, 16
L4          1, 4, 8
M60         1, 2, 4
T4          1, 4, 8
V100        1, 4, 8
V100-32GB   8

OTHER_GPU        AVAILABLE_QUANTITIES
Gaudi HL-205     8
L40S             1, 4, 8
Radeon Pro V520  1, 2, 4
T4g              1, 2


@wizenheimer (Contributor, Author) commented:

Thanks @romilbhardwaj,
Really appreciate the feedback. I've updated the CLI's footnotes and handled the duplicated k8s messages.
Here's a diff for reference:

  1. sky show-gpus -a - CLI Diff
  2. sky show-gpus --cloud aws - CLI Diff
  3. sky show-gpus - CLI Diff
