diff --git a/docs/en/1-Experiments/Jupyter.md b/docs/en/1-Experiments/Jupyter.md index 3c698d181..3d49db6cf 100644 --- a/docs/en/1-Experiments/Jupyter.md +++ b/docs/en/1-Experiments/Jupyter.md @@ -20,7 +20,7 @@ Jupyter comes with a number of features (and we can add more) [![Explore your data](../images/ExploreData.PNG)](../../2-Publishing/Datasette/) -Use **[Datasette](~../2-Publishing/Datasette.md/)** , an instant JSON API for your SQLite databases. Run SQL queries in a more interactive way! +Use **[Datasette](../../2-Publishing/Datasette.md/)** , an instant JSON API for your SQLite databases. Run SQL queries in a more interactive way! ### IDE in the browser @@ -78,9 +78,9 @@ You can upload and download data to/from JupyterHub directly in the menu. There ### Shareable "Bucket" storage -There is also a mounted `buckets` folder in your home directory, which holds files in [MinIO](../Storage.md/#buckets-via-minio). +There is also a mounted `buckets` folder in your home directory, which holds files in [Azure Blob Storage](../../5-Storage/AzureBlobStorage). -**Refer to the [Storage](../index.md#storage) section for details.** +**Refer to the [Storage](../../5-Storage/Overview) section for details.** ## Data Analysis diff --git a/docs/en/1-Experiments/Remote-Desktop.md b/docs/en/1-Experiments/Remote-Desktop.md index 1ba8e48be..52ff57808 100644 --- a/docs/en/1-Experiments/Remote-Desktop.md +++ b/docs/en/1-Experiments/Remote-Desktop.md @@ -14,7 +14,7 @@ The Ubuntu Virtual Desktop is a powerful tool for data scientists and machine le Remote Desktop provides an in-browser GUI Ubuntu desktop experience as well as quick access to supporting tools. The operating system is -[**Ubuntu**](https://ubuntu.com/about) **18.04** with the +[**Ubuntu**](https://ubuntu.com/about) **22.04** with the [**XFCE**](https://www.xfce.org/about) desktop environment. ![Remote Desktop](../images/rd_desktop.png) @@ -32,7 +32,7 @@ _pip_, _conda_, _npm_ and _yarn_ are available to install various packages. ## Accessing the Remote Desktop To launch the Remote Desktop or any of its supporting tools, create a Notebook -Server in [Kubeflow](./Kubeflow.md) and select the remote desktop option. +Server in [Kubeflow](./Kubeflow.md) and select the remote desktop option, which is the Ubuntu image. ![Remote Desktop](../images/RemoteDesktop.PNG) diff --git a/docs/en/1-Experiments/Selecting-an-Image.md b/docs/en/1-Experiments/Selecting-an-Image.md index adfb22863..8157806ab 100644 --- a/docs/en/1-Experiments/Selecting-an-Image.md +++ b/docs/en/1-Experiments/Selecting-an-Image.md @@ -8,7 +8,7 @@ When selecting an image, you have 3 main options: - Jupyter Notebook (CPU, TensorFlow, PyTorch) - RStudio -- Remote Desktop (r, geomatics) +- Remote Desktop ## Jupyter Notebooks @@ -48,7 +48,7 @@ experience. ## RStudio -**[RStudio](RStudio/)** gives you an integrated development environment +**[RStudio](../RStudio/)** gives you an integrated development environment specifically for `R`. If you're coding in `R`, this is typically the Notebook Server to use. Use the `rstudio` image to get an RStudio environment. @@ -59,7 +59,7 @@ Server to use. Use the `rstudio` image to get an RStudio environment. For a full Ubuntu desktop experience, use the remote desktop image. It comes pre-loaded with Python, R and Geomatics tooling, but are delivered in a typical desktop experience that also comes with Firefox, VS Code, and open office tools. 
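The Jupyter.md hunk above points readers to Datasette as "an instant JSON API for your SQLite databases." Below is a minimal sketch of what that means in practice, assuming a hypothetical Datasette instance at `https://datasette.example.ca` serving a database named `mydatabase` with a table `mytable`; the exact JSON shape may vary between Datasette versions.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Hypothetical Datasette deployment and database name; substitute your own.
base_url = "https://datasette.example.ca/mydatabase"

# Datasette serves query results as JSON when ".json" is appended to a page,
# and SQL can be passed through the "sql" query-string parameter.
query = {"sql": "SELECT name, COUNT(*) AS n FROM mytable GROUP BY name LIMIT 5"}
url = f"{base_url}.json?{urlencode(query)}"

with urllib.request.urlopen(url) as response:
    payload = json.load(response)

# Responses typically include "columns" and "rows" keys.
print(payload.get("columns"))
for row in payload.get("rows", []):
    print(row)
```

Anything that can be run in the Datasette SQL editor can also be fetched programmatically this way, since the JSON endpoint is just the regular page with `.json` appended.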
-The operating system is **[Ubuntu](https://ubuntu.com/about)** 18.04 with the +The operating system is **[Ubuntu](https://ubuntu.com/about)** 22.04 with the **[XFCE](https://www.xfce.org/about)** desktop environment. ![Screenshot of the Virtual Desktop](../images/rd_desktop.png) diff --git a/docs/en/2-Publishing/Accessing-Published-Content.md b/docs/en/2-Publishing/Accessing-Published-Content.md deleted file mode 100644 index e69de29bb..000000000 diff --git a/docs/en/2-Publishing/Custom.md b/docs/en/2-Publishing/Custom.md index 326bee7d4..660f932ad 100644 --- a/docs/en/2-Publishing/Custom.md +++ b/docs/en/2-Publishing/Custom.md @@ -9,9 +9,7 @@ container. For instance, Node.js apps, Flask or Dash apps. Etc. !!! info "See the source code for this app" - We just push these kinds of applications through GitHub into the server. The - source for the above app is - [`StatCan/covid19`](https://github.com/StatCan/covid19) + We just push these kinds of applications through GitHub into the server. # Setup diff --git a/docs/en/2-Publishing/Dash.md b/docs/en/2-Publishing/Dash.md index 882672741..d7bf57c46 100644 --- a/docs/en/2-Publishing/Dash.md +++ b/docs/en/2-Publishing/Dash.md @@ -31,7 +31,7 @@ This is an example of a Layout With Figure and Slider from _Publish with Canadian-made software._ -**[Plotly Dash](/2-Publishing/Dash/)** is a popular Python library that allows you to create interactive web-based visualizations and dashboards with ease. Developed by the Montreal-based company Plotly, Dash has gained a reputation for being a powerful and flexible tool for building custom data science graphics. With Dash, you can create everything from simple line charts to complex, multi-page dashboards with interactive widgets and controls. Because it's built on open source technologies like Flask, React, and Plotly.js, Dash is highly customizable and can be easily integrated with other data science tools and workflows. Whether you're a data scientist, analyst, or developer, Dash can help you create engaging and informative visualizations that bring your data to life. +**[Plotly Dash](../Dash/)** is a popular Python library that allows you to create interactive web-based visualizations and dashboards with ease. Developed by the Montreal-based company Plotly, Dash has gained a reputation for being a powerful and flexible tool for building custom data science graphics. With Dash, you can create everything from simple line charts to complex, multi-page dashboards with interactive widgets and controls. Because it's built on open source technologies like Flask, React, and Plotly.js, Dash is highly customizable and can be easily integrated with other data science tools and workflows. Whether you're a data scientist, analyst, or developer, Dash can help you create engaging and informative visualizations that bring your data to life. # Getting Started diff --git a/docs/en/2-Publishing/Datasette.md b/docs/en/2-Publishing/Datasette.md index 9fcdd54e9..dd62e4d7b 100644 --- a/docs/en/2-Publishing/Datasette.md +++ b/docs/en/2-Publishing/Datasette.md @@ -26,9 +26,10 @@ You can even explore maps within the tool! ![Run SQL Queries](../images/datasette-sql.png) + # Getting Started diff --git a/docs/en/2-Publishing/PowerBI.md b/docs/en/2-Publishing/PowerBI.md index 443d131f2..f805b89ff 100644 --- a/docs/en/2-Publishing/PowerBI.md +++ b/docs/en/2-Publishing/PowerBI.md @@ -13,7 +13,7 @@ our Storage system, and use the data as a `pandas` data frame. 1. A computer with Power BI, and Python 3.6 2. 
Your MinIO `ACCESS_KEY` and `SECRET_KEY` on hand. (See
-   [Storage](../index.md#storage))
+   [Storage](../../5-Storage/Overview))

## Set up Power BI

diff --git a/docs/en/2-Publishing/R-Shiny.md b/docs/en/2-Publishing/R-Shiny.md
index f03960d42..126e41f18 100644
--- a/docs/en/2-Publishing/R-Shiny.md
+++ b/docs/en/2-Publishing/R-Shiny.md
@@ -12,7 +12,7 @@ R-Shiny is an R package that makes it easy to build interactive web apps in R.

_Publish Professional Quality Graphics_

-[![InteractiveDashboard](../images/InteractiveDashboard.PNG)](/2-Publishing/R-Shiny/)
+[![InteractiveDashboard](../images/InteractiveDashboard.PNG)](../R-Shiny/)

R Shiny is an open source web application framework that allows data scientists and analysts to create interactive, web-based dashboards and data visualizations using the R programming language. One of the main advantages of R Shiny is that it offers a straightforward way to create high-quality, interactive dashboards without the need for extensive web development expertise. With R Shiny, data scientists can leverage their R coding skills to create dynamic, data-driven web applications that can be shared easily with stakeholders.

@@ -109,7 +109,7 @@ If you need extra R libraries to be installed, send your list to [the R-Shiny re

!!! example "See the above dashboard here"

-    The above dashboard is in GitHub. Take a look at [the source](https://github.com/StatCan/R-dashboards/tree/master/bus-dashboard), and [see the dashboard live](https://shiny.covid.cloud.statcan.ca/bus-dashboard).
+    The above dashboard is in GitHub. Take a look at [the source](https://github.com/StatCan/R-dashboards/tree/master/bus-dashboard).

## Once you've got the basics ...

diff --git a/docs/en/3-Pipelines/Argo.md b/docs/en/3-Pipelines/Argo.md
index 0f67e3d41..06d92e60a 100644
--- a/docs/en/3-Pipelines/Argo.md
+++ b/docs/en/3-Pipelines/Argo.md
@@ -374,14 +374,6 @@ Couler provides a simple, unified application programming interface for defining

    # Run the workflow
    w.create()
    ```
-=== "YAML"
-    ``` py title="workflow.yaml" linenums="1"
-
-    ```
-=== "Seldon?"
-    ``` py
-
-    ```

### Additional Resources for Argo Workflows

diff --git a/docs/en/4-Collaboration/Overview.md b/docs/en/4-Collaboration/Overview.md
index 329cc0d5b..bf3f628e5 100644
--- a/docs/en/4-Collaboration/Overview.md
+++ b/docs/en/4-Collaboration/Overview.md
@@ -10,7 +10,7 @@ There are many ways collaborate on the AAW. Which is best for your situation dep
| **Data** | Personal folder or bucket | Team folder or bucket, or shared namespace | Shared Bucket |
| **Compute** | Personal namespace | Shared namespace | N/A |

-Sharing code, disks, and workspaces (e.g.: two people sharing the same virtual machine) is described in more detail below. Sharing data through buckets is described in more detail in the **[MinIO](../5-Storage/AzureBlobStorage.md)** section.
+Sharing code, disks, and workspaces (e.g.: two people sharing the same virtual machine) is described in more detail below. Sharing data through buckets is described in more detail in the **[Azure Blob Storage](../5-Storage/AzureBlobStorage.md)** section.

??? question "What is the difference between a bucket and a folder?"

@@ -37,7 +37,7 @@ If you need to share code without publishing it on a repository,

!!! danger "Sharing a namespace means you share **everything** in the namespace"

-    Kubeflow does not support granular sharing of one resource (one notebook, one MinIO bucket, etc.), but instead sharing of **all** resources. If you want to share a Jupyter Notebook server with someone, you must share your entire namespace and **they will have access to all other resources (MinIO buckets, etc.)**.
+    Kubeflow does not support granular sharing of one resource (one notebook, one volume, etc.), but instead sharing of **all** resources. If you want to share a Jupyter Notebook server with someone, you must share your entire namespace and **they will have access to all other resources (Azure Blob Storage, etc.)**.

In Kubeflow every user has a **namespace** that contains their work (their
notebook servers, pipelines, disks, etc.). Your namespace belongs to you, but
@@ -47,7 +47,7 @@ share with a team).

One option for collaboration is to share namespaces with others.

The advantage of sharing a Kubeflow namespace is that it lets you and your
-colleagues share the compute environment and MinIO buckets associated with the
+colleagues share the compute environment and volumes associated with the
namespace. This makes it a very easy and free-form way to share.

To share your namespace, see [managing contributors](#managing-contributors)
@@ -68,17 +68,6 @@ Once you have a shared namespace, you have two shared storage approaches

To learn more about the technology behind these, check out the
[Storage overview](../5-Storage/Overview.md).

-### Sharing with StatCan
-
-In addition to private buckets, or team-shared private buckets, you can also
-place your files in _shared storage_. Within all bucket storage options
-(`minimal`, `premium`, `pachyderm`), you have a private bucket, **and** a folder
-inside of the `shared` bucket. Take a look, for instance, at the link below:
-
-- [`shared/blair-drummond/`](https://minimal-tenant1-minio.covid.cloud.statcan.ca/minio/shared/blair-drummond/)
-
-Any **logged in** user can see these files and read them freely.
-
### Sharing with the world

Ask about that one in our [Slack channel](https://statcan-aaw.slack.com). There
are many ways to do this from the IT side, but it's important for it to go
through proper processes, so this is not done in a "self-serve" way that the
others are. That said, it is totally possible.

-## Recommendation: Combine them all
-
-It's a great idea to always use git, and using git along with shared workspaces
-is a great way to combine ad hoc sharing (through files) while also keeping your
-code organized and tracked.
-
## Managing contributors

You can add or remove people from a namespace you already own through the

diff --git a/docs/en/5-Storage/Disks.md b/docs/en/5-Storage/Disks.md
index ef0e76bb3..e8a061031 100644
--- a/docs/en/5-Storage/Disks.md
+++ b/docs/en/5-Storage/Disks.md
@@ -6,7 +6,7 @@ you from fast solid state drives (SSDs)!

# Setup

When creating your notebook server, you request disks by adding Data Volumes to
-your notebook server (pictured below, with `Type = New`). They are automatically
+your notebook server (pictured below, under `Advanced Options`). They are automatically
mounted at the directory (`Mount Point`) you choose, and serve as a simple and
reliable way to preserve data attached to a Notebook Server.

@@ -27,7 +27,7 @@ to reuse). If you're done with the disk and it's contents,

## Deleting Disk Storage

To see your disks, check the Notebook Volumes section of the Notebook Server
-page (shown below). You can delete any unattached disk (orange icon on the left)
+page (shown below). You can delete any unattached disk (icon on the left)
by clicking the trash can icon.
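Since the Disks.md hunk above now directs users to add Data Volumes under `Advanced Options` and choose a `Mount Point`, a short sketch of how to confirm the volume is working may help. The mount path below is an assumption (use whatever `Mount Point` you entered when creating the Notebook Server); files written under it stay with the Data Volume, which is what makes it a reliable place to preserve data.

```python
from pathlib import Path

# Assumed mount point; substitute the Mount Point chosen on the
# Notebook Server creation screen.
mount_point = Path("/home/jovyan/data-vol-1")

if not mount_point.is_dir():
    raise SystemExit(f"{mount_point} is not mounted; check the Data Volumes section.")

# Files written under the mount point live on the Data Volume, so they stay
# with the volume rather than with the notebook container.
sample = mount_point / "hello.txt"
sample.write_text("This file is stored on a Data Volume.\n")

# List what is currently stored on the volume.
for item in sorted(mount_point.iterdir()):
    print(item.name)
```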
![Delete an unattached volume from the Notebook Server screen](../images/kubeflow_delete_disk.png) diff --git a/docs/en/7-MLOps/Machine-Learning-Model-Cloud-Storage.md b/docs/en/7-MLOps/Machine-Learning-Model-Cloud-Storage.md index 13c995096..0f5e821f6 100644 --- a/docs/en/7-MLOps/Machine-Learning-Model-Cloud-Storage.md +++ b/docs/en/7-MLOps/Machine-Learning-Model-Cloud-Storage.md @@ -29,25 +29,16 @@ Overall, cloud storage is a reliable and convenient solution for storing and man The AAW platform provides several types of storage: - Disks (also called Volumes on the Kubeflow Notebook Server creation screen) -- Buckets ("Blob" or S3 storage, provided through MinIO) - Data Lakes (coming soon) Depending on your use case, either disk or bucket may be most suitable. Our [storage overview](../5-Storage/Overview.md) will help you compare them. ### Disks -[![Disks](../images/Disks.PNG)](Storage.md/) +[![Disks](../images/Disks.PNG)](../5-Storage/Disks.md) **[Disks](../5-Storage/Disks.md)** are added to your notebook server by adding Data Volumes. -### Buckets - -MinIO is an S3-API compatible object storage system that provides an open source alternative to proprietary cloud storage services. While we currently use MinIO as our cloud storage solution, we plan on replacing it with s3-proxy in the near future. S3-proxy is a lightweight, open source reverse proxy server that allows you to access Amazon S3-compatible storage services with your existing applications. By switching to s3-proxy, we will be able to improve our cloud storage performance, security, and scalability, while maintaining compatibility with the S3 API. - -[![MinIO](../images/Buckets.PNG)](AzureBlobStorage.md/) - -**[MinIO](../5-Storage/AzureBlobStorage.md)** is a cloud-native scalable object store. We use it for buckets (blob or S3 storage). - ### Data Lakes (Coming Soon) A data lake is a central repository that allows you to store all your structured and unstructured data at any scale. It's a cost-effective way to store and manage all types of data, from raw data to processed data, and it's an essential tool for data scientists. diff --git a/docs/en/7-MLOps/Machine-Learning-Training-Pipelines.md b/docs/en/7-MLOps/Machine-Learning-Training-Pipelines.md index 3040e84a5..150e9a0b9 100644 --- a/docs/en/7-MLOps/Machine-Learning-Training-Pipelines.md +++ b/docs/en/7-MLOps/Machine-Learning-Training-Pipelines.md @@ -388,44 +388,10 @@ With the data split, you can now define and train your machine learning model us After training the model, you need to evaluate its performance on the testing set. This will give you an idea of how well the model will perform on new, unseen data. -=== "Python" - ``` py title="evaluate.py" linenums="1" - - ``` -=== "R" - ``` r title="evaluate.R" linenums="1" - - ``` -=== "SASPy" - ``` py title="evaluate.py" linenums="1" - - ``` -=== "SAS" - ``` sas title="evaluate.sas" linenums="1" - - ``` - #### 6. Deploy the model Finally, you can deploy the trained machine learning model in a production environment. -=== "Python" - ``` py title="deploy.py" linenums="1" - - ``` -=== "R" - ``` r title="deploy.R" linenums="1" - - ``` -=== "SASPy" - ``` py title="deploy.py" linenums="1" - - ``` -=== "SAS" - ``` sas title="deploy.sas" linenums="1" - - ``` - ### Using Argo Workflows ![Argo Workflows](../images/argo-workflows-assembly-line.jpg) diff --git a/docs/en/Help.md b/docs/en/Help.md index 22f8f63f0..9a4191de4 100644 --- a/docs/en/Help.md +++ b/docs/en/Help.md @@ -10,13 +10,13 @@ channel. 
You can ask questions and provide feedback there. We will also post notices there if there are updates or downtime. -# Video tutorials + # GitHub diff --git a/docs/en/images/RStudioOption.PNG b/docs/en/images/RStudioOption.PNG index 97a7b2939..e87951401 100644 Binary files a/docs/en/images/RStudioOption.PNG and b/docs/en/images/RStudioOption.PNG differ diff --git a/docs/en/images/RemoteDesktop.PNG b/docs/en/images/RemoteDesktop.PNG index a79c5a1d7..b94616465 100644 Binary files a/docs/en/images/RemoteDesktop.PNG and b/docs/en/images/RemoteDesktop.PNG differ diff --git a/docs/en/images/kubeflow_contributors.png b/docs/en/images/kubeflow_contributors.png index 2f43a4997..1c30d3318 100644 Binary files a/docs/en/images/kubeflow_contributors.png and b/docs/en/images/kubeflow_contributors.png differ diff --git a/docs/en/images/kubeflow_delete_disk.png b/docs/en/images/kubeflow_delete_disk.png index 0cbc9a063..ab85587df 100644 Binary files a/docs/en/images/kubeflow_delete_disk.png and b/docs/en/images/kubeflow_delete_disk.png differ diff --git a/docs/en/images/kubeflow_existing_volume.png b/docs/en/images/kubeflow_existing_volume.png index c107e2b6f..325431053 100644 Binary files a/docs/en/images/kubeflow_existing_volume.png and b/docs/en/images/kubeflow_existing_volume.png differ diff --git a/docs/en/images/kubeflow_volumes_disk.png b/docs/en/images/kubeflow_volumes_disk.png new file mode 100644 index 000000000..9ecef9433 Binary files /dev/null and b/docs/en/images/kubeflow_volumes_disk.png differ diff --git a/docs/fr/images/kubeflow_contributors.png b/docs/fr/images/kubeflow_contributors.png index 2f43a4997..11d98855d 100644 Binary files a/docs/fr/images/kubeflow_contributors.png and b/docs/fr/images/kubeflow_contributors.png differ diff --git a/docs/fr/images/kubeflow_delete_disk.png b/docs/fr/images/kubeflow_delete_disk.png index 0cbc9a063..807981fc2 100644 Binary files a/docs/fr/images/kubeflow_delete_disk.png and b/docs/fr/images/kubeflow_delete_disk.png differ