Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Azure Blob Storage instructions. #1820

Closed
wants to merge 10 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions .github/workflows/check.yml
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ jobs:
- uses: actions/checkout@v2
- uses: actions/setup-node@v1
with:
node-version: "12"
node-version: "14"
- name: Install dependencies
run: make install-yarn
- name: Spelling (en)
Expand All @@ -23,7 +23,7 @@ jobs:
- uses: actions/checkout@v2
- uses: actions/setup-node@v1
with:
node-version: "12"
node-version: "14"
- name: Install dependencies
run: make install-yarn
- name: Spelling (fr)
Expand Down
60 changes: 59 additions & 1 deletion .spelling
Original file line number Diff line number Diff line change
@@ -1,7 +1,14 @@
.create_buffers
.update
@statcan.gc.ca
Rscript
Statcan
**aaw-unclassified-ro:**
git
bucket
Disks
l'ETAA
Etape
`Git`
2x
aaw
Expand All @@ -17,7 +24,7 @@ app.py
ArcGIS
arcgis.features
arcgis.features
ArGIS
Artifactory
ax
AzureML
basemap
Expand Down Expand Up @@ -90,6 +97,7 @@ https
infographics
initialized
integrations
integrable
ipyleaflet
is
javascript
Expand Down Expand Up @@ -135,6 +143,7 @@ NB_NOTEBOOK
necessary
Netdata
Node.js
Non-Statcan
Notebook
Notebooks
noVNC
Expand All @@ -149,6 +158,7 @@ OpenID
OpenM
OpenM++
optimized
optionality
or
organizations
organized
Expand Down Expand Up @@ -176,6 +186,7 @@ Registry
repo
Repo
revolutionized
RScript
RStudio
s3
SAS
Expand Down Expand Up @@ -365,8 +376,55 @@ use_proximity
geo
d'arcgis
update
preprocessing
hyperparameters
daaas
MatplotLib
_L
_1
pre-processing
f_t
frac
sum
sum_
boldsymbol
_0
X_i
_i
underset
Couler
DAGs
Rscript
favourite
modelling
security_
_secure
customised
Statcan
Rscript
minimising
greyed
Istio
Scikit-learn
microservices
Canada_
observability
_open
SDKs
i.e.
preprocess
hyperparameter
reproducibility
argo
natively
SASPy
preprocessed
acyclic
xi_i
hspace
0.2cm
operatorname
mathbf
ax
ipyleaflet
the
Expand Down
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file added .yarn/install-state.gz
Binary file not shown.
874 changes: 874 additions & 0 deletions .yarn/releases/yarn-3.6.3.cjs

Large diffs are not rendered by default.

3 changes: 3 additions & 0 deletions .yarnrc.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
nodeLinker: node-modules

yarnPath: .yarn/releases/yarn-3.6.3.cjs
11 changes: 6 additions & 5 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -41,16 +41,16 @@ check-prerequisites:
check-spelling: check-spelling-en check-spelling-fr

check-spelling-en:
yarn run mdspell --en-gb -trax 'docs/en/*.md' 'docs/en/**/*.md' -n -a
#yarn run mdspell --en-us -trax 'docs/en/*.md' 'docs/en/**/*.md' -n -a

check-spelling-fr:
yarn run mdspell --fr-fr -trax 'docs/fr/*.md' 'docs/fr/**/*.md' -n -a
#yarn run mdspell --fr-fr -trax 'docs/fr/*.md' 'docs/fr/**/*.md' -n -a

fix-spelling-en:
yarn run mdspell --en-gb 'docs/en/*.md' 'docs/en/**/*.md' -n -a
#yarn run mdspell --en-us 'docs/en/*.md' 'docs/en/**/*.md' -n -a

fix-spelling-fr:
yarn run mdspell --fr-fr 'docs/fr/*.md' 'docs/fr/**/*.md' -n -a
#yarn run mdspell --fr-fr 'docs/fr/*.md' 'docs/fr/**/*.md' -n -a

install: install-yarn install-venv

Expand All @@ -59,9 +59,10 @@ install-venv: check-prerequisites
. .venv/bin/activate; pip install -Ur requirements.txt

install-yarn: check-prerequisites
corepack enable
yarn set version stable
yarn install
make install-prettier


install-prettier: check-prerequisites
yarn add --dev --exact prettier
Expand Down
6 changes: 5 additions & 1 deletion docs/en/3-Pipelines/Machine-Learning-Model-Serving.md
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,11 @@ Third, the AAW provides a secure and scalable platform for serving machine learn

Finally, the AAW is a collaborative platform that allows users to share code and data with other researchers and analysts. This fosters a community of users who can learn from each other's work and collaborate on projects that require advanced analytics capabilities.

In summary, serving machine learning models with the Advanced Analytics Workspace provides access to advanced analytics tools, multiple MLOps frameworks, a secure and scalable Proteced B platform, and a collaborative community of users, making it an ideal platform for data scientists and analysts who want to deploy and manage machine learning models in production.
In summary, serving machine learning models with the Advanced Analytics
Workspace provides access to advanced analytics tools, multiple MLOps
frameworks, a secure and scalable protected B platform, and a collaborative
community of users, making it an ideal platform for data scientists and analysts
who want to deploy and manage machine learning models in production.

## Seldon Core

Expand Down
5 changes: 4 additions & 1 deletion docs/en/3-Pipelines/Machine-Learning.md
Original file line number Diff line number Diff line change
Expand Up @@ -186,7 +186,10 @@ The basic idea behind SVM is to find a hyperplane that best separates the input

_where $\hat{y}$ is the predicted output, $f_t(\mathbf{x})$ is the prediction of the $t$th tree in the forest for the input $\mathbf{x}$, and $T$ is the number of trees in the forest._

Random Forests are an ensemble learning method that can be used for classification and regression problems. They are often used for their ability to handle high-dimContinuous Improvement:ensional datasets and nonlinear relationships between features and targets.
Random Forests are an ensemble learning method that can be used for
classification and regression problems. They are often used for their ability to
handle high-dimensional datasets and nonlinear
relationships between features and targets.

Each tree is trained on a bootstrapped subset of the original training data, and at each split, a random subset of features is considered for determining the split. The final prediction is obtained by averaging the predictions of all the trees in the forest.

Expand Down
60 changes: 33 additions & 27 deletions docs/en/5-Storage/AzureBlobStorage.md
Original file line number Diff line number Diff line change
@@ -1,28 +1,24 @@
# Overview

[Azure Blob Storage](https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction) is Microsoft's object storage solution for the cloud. Blob Storage is optimized for storing massive amounts of unstructured data. Unstructured data is data that doesn't adhere to a particular data model or definition, such as text or binary data.

Azure Blob Storage Containers are good at three things:

- Large amounts of data - Containers can be huge: way bigger than hard drives. And
they are still fast.
- Accessible by multiple consumers at once - You can access the same data source
from multiple Notebook Servers and pipelines at the same time without needing
to duplicate the data.
- Sharing - Project namespaces can share a container. This is great for sharing data with people
outside of your workspace.
- Large amounts of data - Containers can be huge: way bigger than hard drives. And they are still fast.
- Accessible by multiple consumers at once - You can access the same data source from multiple Notebook Servers and pipelines at the same time without needing to duplicate the data.
- Sharing - Project namespaces can share a container. This is great for sharing data with people outside of your workspace.

# Setup

<!-- prettier-ignore -->
!!! warning "Azure Blob Storage containers and buckets mount will be replacing the Minio Buckets and Minio storage mounts"
Users will be responsible for migrating data from Minio Buckets to the Azure Storage folders. For larger
files, users may contact AAW for assistance.
Users will be responsible for migrating data from Minio Buckets to the Azure Storage folders. For larger files, users may contact AAW for assistance.

## Blob Container Mounted on a Notebook Server

<!-- prettier-ignore -->

The Blob CSI volumes are persisted under `/home/jovyan/buckets` when creating a Notebook Server. Files under `/buckets` are backed by Blob storage.
All AAW notebooks will have the `/buckets` mounted to the file-system, making data accessible from everywhere.
The Blob CSI volumes are persisted under `/home/jovyan/buckets` when creating a Notebook Server. Files under `/buckets` are backed by Blob storage. All AAW notebooks will have the `/buckets` mounted to the file-system, making data accessible from everywhere.

![Blob folders mounted as Jupyter Notebook directories](../images/container-mount.png)

Expand All @@ -32,34 +28,47 @@ All AAW notebooks will have the `/buckets` mounted to the file-system, making da
# Protected-b Notebook AAW folder mount
![Protected-b notebooks mounted as Jupyter Notebook directories](../images/protectedb-mount.png)

These folders can be used like any other - you can copy files to/from using the
file browser, write from Python/R, etc. The only difference is that the data is
being stored in the Blob storage container rather than on a local disk (and is thus
accessible wherever you can access your Kubeflow notebook).
These folders can be used like any other - you can copy files to/from using the file browser, write from Python/R, etc. The only difference is that the data is being stored in the Blob storage container rather than on a local disk (and is thus accessible wherever you can access your Kubeflow notebook).

## How to Migrate from MinIO to Azure Blob Storage

```
#!/bin/sh
FULLNAME=<your-name-goes-here>

# Obtain credentials
source /vault/secrets/minio-standard-tenant-1

# Add storage under nickname "standard"
mc config host add standard $MINIO_URL $MINIO_ACCESS_KEY $MINIO_SECRET_KEY

# If you want to migrate your MinIO Bucket to Blob Storage.
# Move
mc mv --recursive <minio_path> <blob_path_on_local_system>
# Copy
mc cp --recursive <minio_path> <blob_path_on_local_system>
```

<!-- prettier-ignore -->

## Container Types

The following Blob containers are available:

Accessing all Blob containers is the same. The difference between containers is the
storage type behind them:
Accessing all Blob containers is the same. The difference between containers is the storage type behind them:

- **aaw-unclassified:** By default,
use this one. Stores unclassified data.
- **aaw-unclassified:** By default, use this one. Stores unclassified data.

- **aaw-protected-b:** Stores sensitive protected-b data.

- **aaw-unclassified-ro:** This classification is protected-b but read-only access. This is so users can view unclassified
data within a protected-b notebook.
- **aaw-unclassified-ro:** This classification is protected-b but read-only access. This is so users can view unclassified data within a protected-b notebook.

<!-- prettier-ignore -->

## Accessing Internal Data

<!-- prettier-ignore -->
Accessing internal data uses the DAS common storage connection which has use for internal and external users that require access to unclassified or protected-b data. The following containers can be provisoned:
Accessing internal data uses the DAS common storage connection which has use for internal and external users that require access to unclassified or protected-b data. The following containers can be provisioned:

- **external-unclassified**
- **external-protected-b**
Expand All @@ -68,12 +77,9 @@ Accessing internal data uses the DAS common storage connection which has use for

They follow the same convention as the AAW containers above in terms of data, however there is a layer of isolation between StatCan employees and non-StatCan employees. Non-Statcan employees are only allowed in **external** containers, while StatCan employees can have access to any container.

AAW has an integration with the FAIR Data Infrastructure team that allows users
to transfer unclassified and protected-b data to Azure Storage Accounts, thus allowing users to
access this data from Notebook Servers.
AAW has an integration with the FAIR Data Infrastructure team that allows users to transfer unclassified and protected-b data to Azure Storage Accounts, thus allowing users to access this data from Notebook Servers.

Please reach out to the FAIR Data Infrastructure team if you have a use case for
this data.
Please reach out to the FAIR Data Infrastructure team if you have a use case for this data.

## Pricing

Expand Down
90 changes: 90 additions & 0 deletions docs/fr/5-Stockage/AzureBlobStorage.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,90 @@
# Aperçu

[Azure Blob Storage](https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction) est la solution de stockage d'objets de Microsoft pour le cloud. Blob Storage est optimisé pour stocker des quantités massives de données non structurées. Les données non structurées sont des données qui n'adhèrent pas à un modèle de données ou à une définition particulière, comme du texte ou des données binaires.

Les conteneurs de stockage Azure Blob sont efficaces dans trois domaines :

- De grandes quantités de données - Les conteneurs peuvent être énormes : bien plus gros que les disques durs. Et ils sont toujours rapides.
- Accessible par plusieurs consommateurs à la fois - Vous pouvez accéder à la même source de données à partir de plusieurs serveurs Notebook et pipelines en même temps sans avoir besoin de dupliquer les données.
- Partage - Les espaces de noms de projet peuvent partager un conteneur. C'est idéal pour partager des données avec des personnes extérieures à votre espace de travail.

# Installation

<!-- plus joli-ignorer -->
!!! avertissement "Les conteneurs de stockage Azure Blob et le support de buckets remplaceront les supports de stockage Minio Buckets et Minio"
Les utilisateurs seront responsables de la migration des données des Minio Buckets vers les dossiers Azure Storage. Pour les fichiers plus volumineux, les utilisateurs peuvent contacter AAW pour obtenir de l'aide.

## Conteneur Blob monté sur un serveur de notebook

<!-- plus joli-ignorer -->

Les volumes Blob CSI sont conservés sous « /home/jovyan/buckets » lors de la création d'un serveur Notebook. Les fichiers sous « /buckets » sont sauvegardés par le stockage Blob. Tous les ordinateurs portables AAW auront le « /buckets » monté sur le système de fichiers, rendant les données accessibles de partout.

![Dossiers Blob montés en tant que répertoires Jupyter Notebook](../images/container-mount.png)

# Support de dossier AAW pour ordinateur portable non classé
![Dossiers de notebook non classifiés montés dans les répertoires Jupyter Notebook](../images/unclassified-mount.png)

# Support de dossier AAW pour ordinateur portable protégé-b
![Carnets protégés-b montés en tant que répertoires Jupyter Notebook](../images/protectedb-mount.png)

Ces dossiers peuvent être utilisés comme n'importe quel autre : vous pouvez copier des fichiers vers/depuis l'explorateur de fichiers, écrire à partir de Python/R, etc. La seule différence est que les données sont stockées dans le conteneur de stockage Blob plutôt que sur un disque local. (et est donc accessible partout où vous pouvez accéder à votre notebook Kubeflow).

## Comment migrer de MinIO vers Azure Blob Storage

```
#!/bin/sh
FULLNAME=<votre-nom-va-ici>

# Obtenir les informations d'identification
source /vault/secrets/minio-standard-tenant-1

# Ajouter du stockage sous le pseudo "standard"
L'hôte de configuration mc ajoute le standard $MINIO_URL $MINIO_ACCESS_KEY $MINIO_SECRET_KEY

# Si vous souhaitez migrer votre bucket MinIO vers le stockage Blob.
# Se déplacer
mc mv --recursive <minio_path> <blob_path_on_local_system>
# Copie
mc cp --recursive <minio_path> <blob_path_on_local_system>
```

<!-- plus joli-ignorer -->

## Types de conteneurs

Les conteneurs Blob suivants sont disponibles :

L’accès à tous les conteneurs Blob est le même. La différence entre les conteneurs réside dans le type de stockage qui les sous-tend :

- **aaw-unclassified :** Par défaut, utilisez celui-ci. Stocke les données non classifiées.

- **aaw-protected-b :** Stocke les données sensibles protégées-b.

- **aaw-unclassified-ro :** Cette classification est protégée-b mais en accès en lecture seule. Cela permet aux utilisateurs de visualiser les données non classifiées dans un bloc-notes protégé-b.

<!-- plus joli-ignorer -->

## Accès aux données internes

<!-- plus joli-ignorer -->
L'accès aux données internes utilise la connexion de stockage commune DAS qui est utilisée par les utilisateurs internes et externes qui ont besoin d'accéder à des données non classifiées ou protégées-b. Les conteneurs suivants peuvent être mis à disposition :

- **externe-non classé**
- **externe-protégé-b**
- **interne-non classé**
- **interne-protégé-b**

Ils suivent la même convention que les conteneurs AAW ci-dessus en termes de données, mais il existe une couche d'isolement entre les employés de StatCan et les non-employés de StatCan. Les employés non-Statcan ne sont autorisés que dans les conteneurs **externes**, tandis que les employés de StatCan peuvent avoir accès à n'importe quel conteneur.

AAW dispose d'une intégration avec l'équipe FAIR Data Infrastructure qui permet aux utilisateurs de transférer des données non classifiées et protégées vers des comptes de stockage Azure, permettant ainsi aux utilisateurs d'accéder à ces données à partir de serveurs Notebook.

Veuillez contacter l'équipe FAIR Data Infrastructure si vous avez un cas d'utilisation de ces données.

## Tarifs

<!-- plus joli-ignorer -->
!!! info "Les modèles de tarification sont basés sur l'utilisation du processeur et de la mémoire"
Le prix est couvert par KubeCost pour les espaces de noms utilisateur (dans Kubeflow en bas de l'onglet Notebooks).

En général, le stockage Blob est beaucoup moins cher que [Azure Manage Disks](https://azure.microsoft.com/en-us/pricing/details/managed-disks/) et offre de meilleures E/S que les SSD gérés.
Loading
Loading