Merge branch 'skypilot-org:master' into master

asaiacai · Oct 21, 2024 · 1dc531b
2 parents 1df9749 + 3c3bcee
commit 1dc531b
Showing 67 changed files with 1,919 additions and 791 deletions.
diff --git a/README.md b/README.md
@@ -38,30 +38,32 @@ This repository is a fork of the [original Skypilot](https://github.com/skypilot
 
 ----
 :fire: *News* :fire:
-- [Sep, 2024] Point, Launch and Serve **Llama 3.2** on Kubernetes or Any Cloud: [**example**](./llm/llama-3_2/)
-- [Sep, 2024] Run and deploy [**Pixtral**](./llm/pixtral), the first open-source multimodal model from Mistral AI.
-- [Jul, 2024] [**Finetune**](./llm/llama-3_1-finetuning/) and [**serve**](./llm/llama-3_1/) **Llama 3.1** on your infra
-- [Jun, 2024] Reproduce **GPT** with [llm.c](https://github.com/karpathy/llm.c/discussions/481) on any cloud: [**guide**](./llm/gpt-2/)
-- [Apr, 2024] Serve **Qwen-110B** on your infra: [**example**](./llm/qwen/)
-- [Apr, 2024] Using **Ollama** to deploy quantized LLMs on CPUs and GPUs: [**example**](./llm/ollama/)
-- [Feb, 2024] Deploying and scaling **Gemma** with SkyServe: [**example**](./llm/gemma/)
-- [Feb, 2024] Serving **Code Llama 70B** with vLLM and SkyServe: [**example**](./llm/codellama/)
-- [Dec, 2023] **Mixtral 8x7B**, a high quality sparse mixture-of-experts model, was released by Mistral AI! Deploy via SkyPilot on any cloud: [**example**](./llm/mixtral/)
-- [Nov, 2023] Using **Axolotl** to finetune Mistral 7B on the cloud (on-demand and spot): [**example**](./llm/axolotl/)
+- [Oct 2024] :tada: **SkyPilot crossed 1M+ downloads** :tada:: Thank you to our community! [**Twitter/X**](https://x.com/skypilot_org/status/1844770841718067638)
+- [Sep 2024] Point, Launch and Serve **Llama 3.2** on Kubernetes or Any Cloud: [**example**](./llm/llama-3_2/)
+- [Sep 2024] Run and deploy [**Pixtral**](./llm/pixtral), the first open-source multimodal model from Mistral AI.
+- [Jun 2024] Reproduce **GPT** with [llm.c](https://github.com/karpathy/llm.c/discussions/481) on any cloud: [**guide**](./llm/gpt-2/)
+- [Apr 2024] Serve [**Qwen-110B**](https://qwenlm.github.io/blog/qwen1.5-110b/) on your infra: [**example**](./llm/qwen/)
+- [Apr 2024] Using [**Ollama**](https://github.com/ollama/ollama) to deploy quantized LLMs on CPUs and GPUs: [**example**](./llm/ollama/)
+- [Feb 2024] Deploying and scaling [**Gemma**](https://blog.google/technology/developers/gemma-open-models/) with SkyServe: [**example**](./llm/gemma/)
+- [Feb 2024] Serving [**Code Llama 70B**](https://ai.meta.com/blog/code-llama-large-language-model-coding/) with vLLM and SkyServe: [**example**](./llm/codellama/)
+- [Dec 2023] [**Mixtral 8x7B**](https://mistral.ai/news/mixtral-of-experts/), a high quality sparse mixture-of-experts model, was released by Mistral AI! Deploy via SkyPilot on any cloud: [**example**](./llm/mixtral/)
+- [Nov 2023] Using [**Axolotl**](https://github.com/OpenAccess-AI-Collective/axolotl) to finetune Mistral 7B on the cloud (on-demand and spot): [**example**](./llm/axolotl/)
+
+**LLM Finetuning Cookbooks**: Finetuning Llama 2 / Llama 3.1 in your own cloud environment, privately: Llama 2 [**example**](./llm/vicuna-llama-2/) and [**blog**](https://blog.skypilot.co/finetuning-llama2-operational-guide/); Llama 3.1 [**example**](./llm/llama-3_1-finetuning/) and [**blog**](https://blog.skypilot.co/finetune-llama-3_1-on-your-infra/)
 
 <details>
   <summary>Archived</summary>
 
-- [Apr, 2024] Serve and finetune [**Llama 3**](https://skypilot.readthedocs.io/en/latest/gallery/llms/llama-3.html) on any cloud or Kubernetes: [**example**](./llm/llama-3/)
-- [Mar, 2024] Serve and deploy [**Databricks DBRX**](https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm) on your infra: [**example**](./llm/dbrx/)
-- [Feb, 2024] Speed up your LLM deployments with [**SGLang**](https://github.com/sgl-project/sglang) for 5x throughput on SkyServe: [**example**](./llm/sglang/)
-- [Dec, 2023] Using [**LoRAX**](https://github.com/predibase/lorax) to serve 1000s of finetuned LLMs on a single instance in the cloud: [**example**](./llm/lorax/)
-- [Sep, 2023] [**Mistral 7B**](https://mistral.ai/news/announcing-mistral-7b/), a high-quality open LLM, was released! Deploy via SkyPilot on any cloud: [**Mistral docs**](https://docs.mistral.ai/self-deployment/skypilot)
-- [Sep, 2023] Case study: [**Covariant**](https://covariant.ai/) transformed AI development on the cloud using SkyPilot, delivering models 4x faster cost-effectively: [**read the case study**](https://blog.skypilot.co/covariant/)
-- [Aug, 2023] **Finetuning Cookbook**: Finetuning Llama 2 in your own cloud environment, privately: [**example**](./llm/vicuna-llama-2/), [**blog post**](https://blog.skypilot.co/finetuning-llama2-operational-guide/)
-- [July, 2023] Self-Hosted **Llama-2 Chatbot** on Any Cloud: [**example**](./llm/llama-2/)
-- [June, 2023] Serving LLM 24x Faster On the Cloud [**with vLLM**](https://vllm.ai/) and SkyPilot: [**example**](./llm/vllm/), [**blog post**](https://blog.skypilot.co/serving-llm-24x-faster-on-the-cloud-with-vllm-and-skypilot/)
-- [April, 2023] [SkyPilot YAMLs](./llm/vicuna/) for finetuning & serving the [Vicuna LLM](https://lmsys.org/blog/2023-03-30-vicuna/) with a single command!
+- [Jul 2024] [**Finetune**](./llm/llama-3_1-finetuning/) and [**serve**](./llm/llama-3_1/) **Llama 3.1** on your infra
+- [Apr 2024] Serve and finetune [**Llama 3**](https://skypilot.readthedocs.io/en/latest/gallery/llms/llama-3.html) on any cloud or Kubernetes: [**example**](./llm/llama-3/)
+- [Mar 2024] Serve and deploy [**Databricks DBRX**](https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm) on your infra: [**example**](./llm/dbrx/)
+- [Feb 2024] Speed up your LLM deployments with [**SGLang**](https://github.com/sgl-project/sglang) for 5x throughput on SkyServe: [**example**](./llm/sglang/)
+- [Dec 2023] Using [**LoRAX**](https://github.com/predibase/lorax) to serve 1000s of finetuned LLMs on a single instance in the cloud: [**example**](./llm/lorax/)
+- [Sep 2023] [**Mistral 7B**](https://mistral.ai/news/announcing-mistral-7b/), a high-quality open LLM, was released! Deploy via SkyPilot on any cloud: [**Mistral docs**](https://docs.mistral.ai/self-deployment/skypilot)
+- [Sep 2023] Case study: [**Covariant**](https://covariant.ai/) transformed AI development on the cloud using SkyPilot, delivering models 4x faster cost-effectively: [**read the case study**](https://blog.skypilot.co/covariant/)
+- [Jul 2023] Self-Hosted **Llama-2 Chatbot** on Any Cloud: [**example**](./llm/llama-2/)
+- [Jun 2023] Serving LLM 24x Faster On the Cloud [**with vLLM**](https://vllm.ai/) and SkyPilot: [**example**](./llm/vllm/), [**blog post**](https://blog.skypilot.co/serving-llm-24x-faster-on-the-cloud-with-vllm-and-skypilot/)
+- [Apr 2023] [SkyPilot YAMLs](./llm/vicuna/) for finetuning & serving the [Vicuna LLM](https://lmsys.org/blog/2023-03-30-vicuna/) with a single command!
 
 </details>
 

diff --git a/docs/source/examples/syncing-code-artifacts.rst b/docs/source/examples/syncing-code-artifacts.rst
@@ -46,31 +46,7 @@ VMs.  The task is invoked under that working directory (so that it can call
 scripts, access checkpoints, etc.).
 
 .. note::
-
-    **Exclude files from syncing**
-
-    For large, multi-gigabyte workdirs, uploading may be slow because they
-    are synced to the remote VM(s). To exclude large files in
-    your workdir from being uploaded, add them to a :code:`.skyignore` file 
-    under your workdir. :code:`.skyignore` follows RSYNC filter rules. 
-
-    Example :code:`.skyignore` file:
-
-    .. code-block::
-        
-      # Files that match pattern under ONLY CURRENT directory
-      /hello.py
-      /*.txt
-      /dir
-
-      # Files that match pattern under ALL directories
-      *.txt
-      hello.py
-
-      # Files that match pattern under a directory ./dir/
-      /dir/*.txt
-    
-    Do NOT use ``.`` to indicate local directory (e.g. ``./hello.py``).
+  To exclude large files from being uploaded, see :ref:`exclude-uploading-files`.
 
 .. note::
 
@@ -140,6 +116,33 @@ file_mount may be slow because they are processed by ``rsync``.  Use
 :ref:`SkyPilot bucket mounting <sky-storage>` to efficiently handle
 large files.
 
+.. _exclude-uploading-files:
+
+Exclude uploading files
+--------------------------------------
+By default, SkyPilot uses your existing :code:`.gitignore` and :code:`.git/info/exclude` to exclude files from syncing.
+
+Alternatively, you can use :code:`.skyignore` if you want to separate SkyPilot's syncing behavior from Git's.
+If you use a :code:`.skyignore` file, SkyPilot will only exclude files based on that file without using the default Git files.
+
+Any :code:`.skyignore` file under either your workdir or source paths of file_mounts is respected.
+
+:code:`.skyignore` follows rsync filter rules, e.g.:
+
+.. code-block::
+
+  # Files that match pattern under CURRENT directory
+  /file.txt
+  /dir
+  /*.jar
+  /dir/*.jar
+
+  # Files that match pattern under ALL directories
+  *.jar
+  file.txt
+
+Do *not* use ``.`` to indicate the local directory (e.g., instead of ``./file``, write ``/file``).
+
 .. _downloading-files-and-artifacts:
 
 Downloading files and artifacts
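
The precedence described in the new "Exclude uploading files" section (a :code:`.skyignore` file, when present, replaces the Git exclude files rather than adding to them) can be sketched as follows. This is a minimal illustration under stated assumptions, not SkyPilot's actual implementation: `exclude_sources` is a hypothetical helper, and it only decides which ignore files apply; it does not evaluate the rsync filter rules themselves.

```python
from pathlib import Path


def exclude_sources(src_dir):
    """Return the ignore files that would govern syncing of src_dir.

    Hypothetical helper: mirrors the documented rule that .skyignore,
    if it exists, is used alone; otherwise the Git files are used.
    """
    src = Path(src_dir)
    skyignore = src / ".skyignore"
    if skyignore.is_file():
        # .skyignore takes over completely; Git files are not consulted.
        return [skyignore]
    git_files = [src / ".gitignore", src / ".git" / "info" / "exclude"]
    return [path for path in git_files if path.is_file()]
```

Calling `exclude_sources("~/my-task-code")` on a directory that has both `.gitignore` and `.skyignore` would return only the `.skyignore` path, which matches the behavior the added documentation describes.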

diff --git a/docs/source/reference/config.rst b/docs/source/reference/config.rst
@@ -419,6 +419,15 @@ Available fields and semantics:
     # Default: 'LOCAL_CREDENTIALS'.
     remote_identity: LOCAL_CREDENTIALS
 
+    # Enable gVNIC (optional).
+    #
+    # Set to true to use gVNIC on GCP instances. gVNIC offers higher performance
+    # for multi-node clusters, but costs more.
+    # Reference: https://cloud.google.com/compute/docs/networking/using-gvnic
+    #
+    # Default: false.
+    enable_gvnic: false
+
   # Advanced Azure configurations (optional).
   # Apply to all new instances but not existing ones.
   azure:
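
The "Default: false" semantics of the new `enable_gvnic` field can be illustrated with a small lookup sketch. The helper `get_nested` below is hypothetical and written only for this example, and placing the field under a top-level `gcp` key is an assumption inferred from the surrounding hunk (the field precedes the `azure` section and refers to GCP instances).

```python
def get_nested(config, keys, default=None):
    """Walk nested dicts along `keys`; return `default` if any key is missing."""
    current = config
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Assumption: the field lives under the `gcp` section of ~/.sky/config.yaml,
# represented here as an already-parsed dict.
user_config = {"gcp": {"enable_gvnic": True}}
use_gvnic = get_nested(user_config, ("gcp", "enable_gvnic"), default=False)
```

With an empty config, the same lookup falls back to `False`, so gVNIC stays off unless the user opts in.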

diff --git a/docs/source/reference/kubernetes/kubernetes-deployment.rst b/docs/source/reference/kubernetes/kubernetes-deployment.rst
@@ -114,9 +114,9 @@ Deploying on Google Cloud GKE
      # Example:
      # gcloud container clusters get-credentials testcluster --region us-central1-c
 
-3. [If using GPUs] If your GKE nodes have GPUs, you may need to to
-   `manually install <https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/>`_
-   nvidia drivers. You can do so by deploying the daemonset
+3. [If using GPUs] For GKE versions newer than 1.30.1-gke.115600, NVIDIA drivers are pre-installed and no additional setup is required. If you are using an older GKE version, you may need to
+   `manually install <https://cloud.google.com/kubernetes-engine/docs/how-to/gpus#installing_drivers>`_
+   NVIDIA drivers for GPU support. You can do so by deploying the daemonset
    depending on the GPU and OS on your nodes:
 
    .. code-block:: console
@@ -133,7 +133,8 @@ Deploying on Google Cloud GKE
      # For Ubuntu based nodes with L4 GPUs:
      $ kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/ubuntu/daemonset-preloaded-R525.yaml
 
-   To verify if GPU drivers are set up, run ``kubectl describe nodes`` and verify that ``nvidia.com/gpu`` is listed under the ``Capacity`` section.
+   .. tip::
+      To verify that GPU drivers are set up, run ``kubectl describe nodes`` and check that the ``nvidia.com/gpu`` resource is listed under the ``Capacity`` section.
 
 4. Verify your kubernetes cluster is correctly set up for SkyPilot by running :code:`sky check`:
 

diff --git a/docs/source/reference/kubernetes/kubernetes-getting-started.rst b/docs/source/reference/kubernetes/kubernetes-getting-started.rst
@@ -119,6 +119,57 @@ Once your cluster administrator has :ref:`setup a Kubernetes cluster <kubernetes
     $ kubectl config set-context --current --namespace=mynamespace
 
 
+
+Viewing cluster status
+----------------------
+
+To view the status of all SkyPilot resources in the Kubernetes cluster, run :code:`sky status --k8s`.
+
+Unlike :code:`sky status`, which lists only the SkyPilot resources launched by the current user,
+:code:`sky status --k8s` lists all SkyPilot resources in the Kubernetes cluster across all users.
+
+.. code-block:: console
+
+    $ sky status --k8s
+    Kubernetes cluster state (context: mycluster)
+    SkyPilot clusters
+    USER     NAME                           LAUNCHED    RESOURCES                                  STATUS
+    alice    infer-svc-1                    23 hrs ago  1x Kubernetes(cpus=1, mem=1, {'L4': 1})    UP
+    alice    sky-jobs-controller-80b50983   2 days ago  1x Kubernetes(cpus=4, mem=4)               UP
+    alice    sky-serve-controller-80b50983  23 hrs ago  1x Kubernetes(cpus=4, mem=4)               UP
+    bob      dev                            1 day ago   1x Kubernetes(cpus=2, mem=8, {'H100': 1})  UP
+    bob      multinode-dev                  1 day ago   2x Kubernetes(cpus=2, mem=2)               UP
+    bob      sky-jobs-controller-2ea485ea   2 days ago  1x Kubernetes(cpus=4, mem=4)               UP
+
+    Managed jobs
+    In progress tasks: 1 STARTING
+    USER     ID  TASK  NAME      RESOURCES   SUBMITTED   TOT. DURATION  JOB DURATION  #RECOVERIES  STATUS
+    alice    1   -     eval      1x[CPU:1+]  2 days ago  49s            8s            0            SUCCEEDED
+    bob      4   -     pretrain  1x[H100:4]  1 day ago   1h 1m 11s      1h 14s        0            SUCCEEDED
+    bob      3   -     bigjob    1x[CPU:16]  1 day ago   1d 21h 11m 4s  -             0            STARTING
+    bob      2   -     failjob   1x[CPU:1+]  1 day ago   54s            9s            0            FAILED
+    bob      1   -     shortjob  1x[CPU:1+]  2 days ago  1h 1m 19s      1h 16s        0            SUCCEEDED
+
+You can also inspect the real-time GPU usage on the cluster with :code:`sky show-gpus --cloud kubernetes`.
+
+.. code-block:: console
+
+    $ sky show-gpus --cloud kubernetes
+    Kubernetes GPUs
+    GPU   QTY_PER_NODE  TOTAL_GPUS  TOTAL_FREE_GPUS
+    L4    1, 2, 4       12          12
+    H100  1, 2, 4, 8    16          16
+
+    Kubernetes per node GPU availability
+    NODE_NAME                  GPU_NAME  TOTAL_GPUS  FREE_GPUS
+    my-cluster-0               L4        4           4
+    my-cluster-1               L4        4           4
+    my-cluster-2               L4        2           2
+    my-cluster-3               L4        2           2
+    my-cluster-4               H100      8           8
+    my-cluster-5               H100      8           8
+
+
 .. _kubernetes-custom-images:
 
 Using Custom Images
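
The two tables in the `sky show-gpus --cloud kubernetes` sample output above should agree with each other: the per-node rows sum to the per-GPU totals. A short sketch of that aggregation follows, using the sample rows from the added documentation. The function name `summarize_gpus` is hypothetical, and the parsing assumes whitespace-separated columns exactly as shown in the sample, which may not hold for other versions of the CLI output.

```python
from collections import defaultdict

# Per-node table copied from the sample output in the docs.
PER_NODE_TABLE = """\
NODE_NAME                  GPU_NAME  TOTAL_GPUS  FREE_GPUS
my-cluster-0               L4        4           4
my-cluster-1               L4        4           4
my-cluster-2               L4        2           2
my-cluster-3               L4        2           2
my-cluster-4               H100      8           8
my-cluster-5               H100      8           8
"""


def summarize_gpus(per_node_table):
    """Sum TOTAL_GPUS and FREE_GPUS per GPU type from the per-node table."""
    totals = defaultdict(lambda: [0, 0])
    rows = [line for line in per_node_table.splitlines() if line.strip()]
    for row in rows[1:]:  # skip the header row
        _node, gpu, total, free = row.split()
        totals[gpu][0] += int(total)
        totals[gpu][1] += int(free)
    return {gpu: (total, free) for gpu, (total, free) in totals.items()}
```

For the sample rows this yields 12 total and 12 free L4 GPUs and 16 total and 16 free H100 GPUs, matching the `TOTAL_GPUS` and `TOTAL_FREE_GPUS` columns of the summary table.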

diff --git a/docs/source/reference/kubernetes/kubernetes-ports.rst b/docs/source/reference/kubernetes/kubernetes-ports.rst
@@ -59,40 +59,18 @@ To restrict your services to be accessible only within the cluster, you can set
 
 Depending on your cloud, set the appropriate annotation in the SkyPilot config file (``~/.sky/config.yaml``):
 
-.. tab-set::
-
-    .. tab-item:: GCP
-        :sync: internal-lb-gke
-
-        .. code-block:: yaml
-
-          # ~/.sky/config.yaml
-          kubernetes:
-            custom_metadata:
-                annotations:
-                   networking.gke.io/load-balancer-type: "Internal"
-
-    .. tab-item:: AWS
-        :sync: internal-lb-aws
-
-        .. code-block:: yaml
-
-          # ~/.sky/config.yaml
-          kubernetes:
-            custom_metadata:
-                annotations:
-                  service.beta.kubernetes.io/aws-load-balancer-internal: "true"
-
-    .. tab-item:: Azure
-        :sync: internal-lb-azure
-
-        .. code-block:: yaml
+.. code-block:: yaml
 
-          # ~/.sky/config.yaml
-          kubernetes:
-            custom_metadata:
-                annotations:
-                  service.beta.kubernetes.io/azure-load-balancer-internal: "true"
+    # ~/.sky/config.yaml
+    kubernetes:
+      custom_metadata:
+        annotations:
+          # For GCP/GKE
+          networking.gke.io/load-balancer-type: "Internal"
+          # For AWS/EKS
+          service.beta.kubernetes.io/aws-load-balancer-internal: "true"
+          # For Azure/AKS
+          service.beta.kubernetes.io/azure-load-balancer-internal: "true"
 
 
 .. _kubernetes-ingress:
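
The consolidated YAML block above lists the internal load balancer annotations for all three providers, while the surrounding text asks the reader to set the one appropriate for their cloud. A minimal sketch of that selection is shown below. The annotation keys and values are taken from the diff; the function `internal_lb_config` and the short cloud names used as dictionary keys are hypothetical and exist only for this illustration.

```python
# Annotation key/value pairs copied from the documentation hunk above.
INTERNAL_LB_ANNOTATIONS = {
    "gcp": ("networking.gke.io/load-balancer-type", "Internal"),
    "aws": ("service.beta.kubernetes.io/aws-load-balancer-internal", "true"),
    "azure": ("service.beta.kubernetes.io/azure-load-balancer-internal", "true"),
}


def internal_lb_config(cloud):
    """Build the ~/.sky/config.yaml structure (as a dict) for one cloud."""
    key, value = INTERNAL_LB_ANNOTATIONS[cloud.lower()]
    return {"kubernetes": {"custom_metadata": {"annotations": {key: value}}}}
```

For example, `internal_lb_config("aws")` produces a structure containing only the AWS/EKS annotation, which corresponds to keeping a single entry under `annotations` in the config file.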

diff --git a/docs/source/reference/kubernetes/kubernetes-setup.rst b/docs/source/reference/kubernetes/kubernetes-setup.rst
@@ -261,9 +261,19 @@ You can also check the GPUs available on your nodes by running:
 .. code-block:: console
 
     $ sky show-gpus --cloud kubernetes
+    Kubernetes GPUs
     GPU   QTY_PER_NODE  TOTAL_GPUS  TOTAL_FREE_GPUS
-    L4    1, 2, 3, 4    8           6
-    H100  1, 2          4           2
+    L4    1, 2, 4       12          12
+    H100  1, 2, 4, 8    16          16
+
+    Kubernetes per node GPU availability
+    NODE_NAME                  GPU_NAME  TOTAL_GPUS  FREE_GPUS
+    my-cluster-0               L4        4           4
+    my-cluster-1               L4        4           4
+    my-cluster-2               L4        2           2
+    my-cluster-3               L4        2           2
+    my-cluster-4               H100      8           8
+    my-cluster-5               H100      8           8
 
 
 .. _kubernetes-observability:
@@ -274,8 +284,47 @@ All SkyPilot tasks are run in pods inside a Kubernetes cluster. As a cluster adm
 you can inspect running pods (e.g., with :code:`kubectl get pods -n namespace`) to check which
 tasks are running and how many resources they are consuming on the cluster.
 
-Additionally, you can also deploy tools such as the `Kubernetes dashboard <https://kubernetes.io/docs/tasks/access-application-cluster/web-ui-dashboard/>`_ for easily viewing and managing
-SkyPilot tasks running on your cluster.
+Below, we provide tips on how to monitor SkyPilot resources on your Kubernetes cluster.
+
+.. _kubernetes-observability-skystatus:
+
+List SkyPilot resources across all users
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+We provide a convenience command, :code:`sky status --k8s`, to view the status of all SkyPilot resources in the cluster.
+
+Unlike :code:`sky status`, which lists only the SkyPilot resources launched by the current user,
+:code:`sky status --k8s` lists all SkyPilot resources in the cluster across all users.
+
+.. code-block:: console
+
+    $ sky status --k8s
+    Kubernetes cluster state (context: mycluster)
+    SkyPilot clusters
+    USER     NAME                           LAUNCHED    RESOURCES                                  STATUS
+    alice    infer-svc-1                    23 hrs ago  1x Kubernetes(cpus=1, mem=1, {'L4': 1})    UP
+    alice    sky-jobs-controller-80b50983   2 days ago  1x Kubernetes(cpus=4, mem=4)               UP
+    alice    sky-serve-controller-80b50983  23 hrs ago  1x Kubernetes(cpus=4, mem=4)               UP
+    bob      dev                            1 day ago   1x Kubernetes(cpus=2, mem=8, {'H100': 1})  UP
+    bob      multinode-dev                  1 day ago   2x Kubernetes(cpus=2, mem=2)               UP
+    bob      sky-jobs-controller-2ea485ea   2 days ago  1x Kubernetes(cpus=4, mem=4)               UP
+
+    Managed jobs
+    In progress tasks: 1 STARTING
+    USER     ID  TASK  NAME      RESOURCES   SUBMITTED   TOT. DURATION  JOB DURATION  #RECOVERIES  STATUS
+    alice    1   -     eval      1x[CPU:1+]  2 days ago  49s            8s            0            SUCCEEDED
+    bob      4   -     pretrain  1x[H100:4]  1 day ago   1h 1m 11s      1h 14s        0            SUCCEEDED
+    bob      3   -     bigjob    1x[CPU:16]  1 day ago   1d 21h 11m 4s  -             0            STARTING
+    bob      2   -     failjob   1x[CPU:1+]  1 day ago   54s            9s            0            FAILED
+    bob      1   -     shortjob  1x[CPU:1+]  2 days ago  1h 1m 19s      1h 16s        0            SUCCEEDED
+
+
+.. _kubernetes-observability-dashboard:
+
+Kubernetes Dashboard
+^^^^^^^^^^^^^^^^^^^^
+You can deploy tools such as the `Kubernetes dashboard <https://kubernetes.io/docs/tasks/access-application-cluster/web-ui-dashboard/>`_ to easily view and manage
+SkyPilot resources on your cluster.
 
 .. image:: ../../images/screenshots/kubernetes/kubernetes-dashboard.png
     :width: 80%

diff --git a/docs/source/reference/yaml-spec.rst b/docs/source/reference/yaml-spec.rst
@@ -22,8 +22,8 @@ Available fields:
     # If a relative path is used, it's evaluated relative to the location from 
     # which `sky` is called.
     #
-    # To exclude files from syncing, add them to a .skyignore file under your working directory.
-    # Details: https://skypilot.readthedocs.io/en/latest/examples/syncing-code-artifacts.html#uploading-code-and-project-files
+    # To exclude files from syncing, see 
+    # https://skypilot.readthedocs.io/en/latest/examples/syncing-code-artifacts.html#exclude-uploading-files
     workdir: ~/my-task-code
 
     # Number of nodes (optional; defaults to 1) to launch including the head node.