Starway to Orione CUDA (Cloud Utilities and Data Analytics) - TEAM

DoAI stands for Development Operations and Artificial Intelligence.

This is the learning path every new Cloud Data Architect has to follow when joining the XPeppers Cloud team. This path reflects our team's culture and values, which have their roots in the agile values and principles.

Flat Organizations:
Read chapters 1, 4, 5, 7 of XP Explained #onboarding
Read chapters 2, 6 of XP Explained
Iterative and Incremental Development:
- Waterfall #onboarding
- Agile #onboarding
- Transition
For Italian speakers: watch "Perché è così difficile fare Extreme Programming" ("Why is it so hard to do Extreme Programming") by Matteo Vaccari #onboarding
Pair Programming #onboarding
Agile Mindset:
- What Exactly is the Agile Mindset? #onboarding
Read The Pomodoro Technique paper
Read first chapter of "Applying UML and Patterns"
- Try to estimate the time needed to study that chapter (using the pomodoro technique)
- Answer (for example on the team's wiki pages)
  - What is analysis?
  - What is design?
  - What's the difference between them?
  - What is design for?
    - in other words, how would you reply to the following statement: "I just need to understand what to do (analysis) and then do it (coding). Everything else does not matter!"

Evaluating and Exploring data in Python
- Jupyter: Jupyter notebooks with examples of common scientific libraries used in Data Science/ML projects (Numpy, Scipy, Matplotlib,...)
- Anaconda #optional : open-source distribution of Python packages, IDEs, built for data science.
- Zeppelin #optional: web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala, Python, R and more.
- Notebook Alternative #optional: running Jupyter Notebook on VS Code
Scientific libraries in Python
- NumPy: scientific library for Python that provides a high-performance multi-dimensional array object and tools for working with it.
- Pandas: open-source Python package for exploring, cleaning and processing tabular data, which it holds in a structure called a DataFrame.
- SciPy: collection of mathematical algorithms and convenience functions built on NumPy (useful for Linear Algebra, Signal Processing, Fourier Transforms, Statistics, ...)
- Scikit-learn video - Scikit-learn guide: machine learning library for Python programming language
- Matplotlib video tutorial - Matplotlib guide: Python library for Data Visualization.
- Seaborn: Python library for data visualization based on Matplotlib.
- Dask-ML #optional: ML library based on Dask (library for parallel computing in Python).
- Apache Spark #optional: Fast and general engine for large-scale data processing (in batches and real-time streaming). Spark provides an interface for programming clusters (Distributed Computing) with implicit data parallelism and fault tolerance.
- Spark & Hadoop Developer #optional
- Statsmodels #optional

🠥🠥 Back to Table of Contents 🠥🠥

Machine Learning

Elements of AI: An Introduction to AI with no complicated math or programming required.
Google Crash Course
Book: Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow
Interpretable Machine Learning #optional
Blog & Articles:
- Workflow of a Machine Learning project: gathering data $\rightarrow$ data pre-processing $\rightarrow$ choosing the model $\rightarrow$ training and testing the model $\rightarrow$ evaluation
- Machine Learning for Everyone: High level explanation of ML topics (e.g. Supervised and Unsupervised Learning algorithms, Neural networks, ...)
- Introduction to Data Visualization in Python: Overview of the most common tools for data visualization (Matplotlib, Seaborn, Plotly, Pandas Visualization)
- Machine learning in Python with Scikit-learn: Binary Classification example using scikit-learn Random Forest classifier
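The workflow steps listed above can be sketched end to end with the standard library only. This is a minimal illustration, not the approach taken by any of the linked articles: the synthetic two-blob dataset, the nearest-centroid model, and all function names are assumptions chosen to keep the example self-contained.

```python
import random


def make_dataset(n_per_class=50, seed=0):
    """Gather data: two synthetic 2-D Gaussian blobs (hypothetical data)."""
    rng = random.Random(seed)
    data = []
    for label, (cx, cy) in enumerate([(0.0, 0.0), (5.0, 5.0)]):
        for _ in range(n_per_class):
            data.append(([rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)], label))
    rng.shuffle(data)
    return data


def standardize(train_X, test_X):
    """Pre-process: scale features with statistics of the training set only."""
    n_features = len(train_X[0])
    means = [sum(x[j] for x in train_X) / len(train_X) for j in range(n_features)]
    stds = [
        (sum((x[j] - means[j]) ** 2 for x in train_X) / len(train_X)) ** 0.5 or 1.0
        for j in range(n_features)
    ]

    def scale(X):
        return [[(x[j] - means[j]) / stds[j] for j in range(n_features)] for x in X]

    return scale(train_X), scale(test_X)


def fit_centroids(X, y):
    """Train: the 'model' is one mean vector (centroid) per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, X):
    """Predict the class whose centroid is closest to each sample."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return [min(centroids, key=lambda c: dist2(centroids[c], x)) for x in X]


def run_workflow():
    data = make_dataset()                       # 1. gather data
    split = int(0.8 * len(data))                # hold out 20% for testing
    train, test = data[:split], data[split:]
    train_X, train_y = [x for x, _ in train], [t for _, t in train]
    test_X, test_y = [x for x, _ in test], [t for _, t in test]
    train_X, test_X = standardize(train_X, test_X)   # 2. pre-process
    model = fit_centroids(train_X, train_y)          # 3-4. choose and train
    preds = predict(model, test_X)                   # 4. test
    # 5. evaluate: fraction of correct predictions on the held-out set
    return sum(p == t for p, t in zip(preds, test_y)) / len(test_y)


if __name__ == "__main__":
    print(f"test accuracy: {run_workflow():.2f}")
```

In a real project the model step would typically use a library such as scikit-learn; the point here is only the order of the steps and the fact that scaling statistics come from the training split.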
Algorithms
- Supervised Learning: when each training observation from the dataset has a corresponding label or output value associated with it.
  - Naive Bayes, Support Vector Machine
  - Tree-based models:
- Unsupervised Learning: when the training data has no labels.
  - K-Means Clustering
- SageMaker Built-in : built-in machine learning algorithms provided by SageMaker
- SageMaker Examples
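As an illustration of unsupervised learning, here is a minimal sketch of K-Means (Lloyd's algorithm) in plain Python. The function names, the random initialization from the data points, and the fixed iteration cap are assumptions made for this example; library implementations such as the SageMaker built-in algorithm differ in initialization and scaling.

```python
import random


def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))
        for p in points
    ]


def kmeans(points, k, iterations=50, seed=0):
    """Cluster unlabeled points into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        labels = assign(points, centroids)
        new_centroids = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                # Move the centroid to the mean of its assigned points.
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[i])
        if new_centroids == centroids:  # assignments can no longer change
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centers, labels = kmeans(pts, k=2)
    print(centers, labels)
```

Note that no labels are passed in: the algorithm only sees the points and discovers the grouping itself, which is the distinction from the supervised case.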
Evaluation Metrics
- Classification task: the value of the target variable to predict is discrete
  - Confusion Matrix, Accuracy, Precision, Recall, F-Score
  - ROC, ROC AUC, Hamming Loss, Top K Accuracy
- Regression task: the value of the target variable to predict is continuous
  - MAE, MSE
- Information Retrieval System (e.g. Recommendation Systems)
  - MRR, DCG, NDCG
- Images
  - PSNR, SSIM, IoU
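A few of the metrics named above are simple enough to write out by hand, which can help when reading library documentation. The sketch below uses only the standard library; the function names and the binary 0/1 label convention are assumptions for this example.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


def mae(y_true, y_pred):
    """Mean Absolute Error for a regression task."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean Squared Error for a regression task."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mrr(ranked_relevance):
    """Mean Reciprocal Rank.

    Each inner list holds 0/1 relevance flags in ranked order for one query;
    a query with no relevant result contributes 0.
    """
    total = 0.0
    for flags in ranked_relevance:
        for rank, relevant in enumerate(flags, start=1):
            if relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_relevance)


if __name__ == "__main__":
    print(precision_recall_f1([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0]))
    print(mae([3, 5], [2, 7]), mse([3, 5], [2, 7]))
    print(mrr([[0, 1, 0], [1, 0, 0], [0, 0, 0]]))
```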
Homeworks #optional
Extra:
- Machine Learning, Data Science and Deep Learning with Python #optional
- Standardization in case of real-time predictions #optional

🠥🠥 Back to Table of Contents 🠥🠥

Deep Learning

Dive into Deep Learning
Practical Deep Learning for Coders
Activation Functions: Functions applied to the output of a neuron in a Neural Network to give it a desired, usually non-linear, transformation.
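A minimal sketch of how an activation function is applied to a single neuron, using only the standard library. ReLU, sigmoid and tanh are common choices; the `neuron` helper and its argument names are hypothetical and exist only for this illustration.

```python
import math


def relu(z):
    """Rectified Linear Unit: passes positive values, zeroes out the rest."""
    return max(0.0, z)


def sigmoid(z):
    """Squashes any real value into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


if __name__ == "__main__":
    x, w, b = [1.0, 2.0], [0.5, -1.0], 0.0  # pre-activation is -1.5
    print(neuron(x, w, b, relu))
    print(neuron(x, w, b, sigmoid))
    print(neuron(x, w, b, math.tanh))
```

The same pre-activation value gives a different output under each function, which is why the choice of activation is part of the model design.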
Linear Layer
CNN (Convolutional Neural Networks): NNs commonly applied to analyze images.
RNN (Recurrent Neural Networks): NNs commonly applied to analyze Sequential Data (e.g. text, audio,...)
Optimization: Optimization algorithms continuously update model parameters by minimizing the value of the loss function. There are different types of optimization tools: Gradient descent, SGD, Adagrad, Adam, and so on.
Loss Functions / Objective Functions: Define the objective against which the performance of the model is evaluated. The parameters of the model are learned by minimizing the chosen loss function.
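To make the link between optimizer and loss function concrete, here is a minimal sketch of plain (full-batch) gradient descent minimizing a mean squared error loss for a one-variable linear model. The model `y = w * x + b`, the learning rate, the step count and the toy data are all assumptions for this example; frameworks compute the gradients automatically and offer variants such as SGD, Adagrad and Adam.

```python
def mse_loss(w, b, xs, ys):
    """Loss function: mean squared error of the linear model y = w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs, ys, lr=0.05, steps=2000):
    """Repeatedly move (w, b) against the gradient of the MSE loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of the MSE with respect to w and b.
        grad_w = (2.0 / n) * sum((w * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = (2.0 / n) * sum((w * x + b - y) for x, y in zip(xs, ys))
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated by y = 2x + 1
    w, b = gradient_descent(xs, ys)
    print(f"w={w:.3f} b={b:.3f} loss={mse_loss(w, b, xs, ys):.6f}")
```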
Dropout: Regularization method used to reduce the overfitting issue of large neural nets. During training, some number of layer outputs (nodes) are "dropped out". This method approximates training a large number of neural networks with different architectures in parallel.
Batchnorm: Normalization technique that stabilizes and speeds up the training of NNs, and that also has a mild regularizing effect. It normalizes a layer's outputs over the mini-batch, usually before the activation function is applied.
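Both techniques reduce to a few lines of arithmetic at training time. The sketch below is an assumption-laden illustration in plain Python: it uses the "inverted dropout" convention (surviving outputs are rescaled so their expected value is unchanged), normalizes a single feature over one mini-batch, and omits the running statistics that frameworks keep for inference. The names `gamma`, `beta` and `eps` follow common usage but are not taken from the text above.

```python
import random


def dropout(activations, p_drop, rng, training=True):
    """Zero each output with probability p_drop; rescale the survivors."""
    if not training or p_drop == 0.0:
        return list(activations)  # dropout is disabled at inference time
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a mini-batch, then scale and shift it."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [gamma * (x - mean) / (var + eps) ** 0.5 + beta for x in batch]


if __name__ == "__main__":
    rng = random.Random(0)
    print(dropout([1.0] * 8, p_drop=0.5, rng=rng))
    print(batch_norm([1.0, 2.0, 3.0, 4.0]))
```

After `batch_norm` the values have approximately zero mean and unit variance, which is what the activation function then receives.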
Learning Rate Scheduler
Frameworks:
- MXNet
- PyTorch
- TensorFlow
- Keras

🠥🠥 Back to Table of Contents 🠥🠥

Cloud Operation and DevOps

Machine Learning Lens:

The Well-Architected Framework helps you learn operational and architectural best practices for designing and operating ML workflows in the cloud.
Book: Data Science on AWS
AWS Machine Learning Stack
- ML Frameworks & Infrastructure: AWS services, frameworks and resources to build, train, and deploy machine learning (ML) applications.
- Amazon SageMaker: Fully managed ML service used to quickly and easily build and train ML models and then deploy them into a prediction-ready hosted environment at any scale.
- AI Services: AWS provides several AI services, e.g. Amazon Rekognition, which offers pre-trained and customizable computer vision (CV) capabilities to extract information and insights from your images and videos.
More AWS Services:
- AWS Lambda: Serverless computing service that lets you run code on highly available infrastructure without provisioning or managing servers.
- CI/CD CodePipeline: Continuous delivery service you can use to model, visualize, and automate the steps required to release your software.
- Step Functions: Serverless orchestration service that lets you combine AWS Lambda functions and other AWS services to build business-critical applications. With Step Functions you can inspect the state of each step in your workflow to make sure that your application runs in order and as expected.
- Elastic File System: Serverless elastic file system for use with AWS Cloud services and on-premises resources.
- Fargate: Service that provisions serverless compute resources to run Amazon ECS and EKS containers.
- AWS Batch: Helps you run batch computing workloads, which need access to large amounts of compute resources, on the AWS Cloud across multiple Availability Zones within a Region.
Utils:
- Book: Python for DevOps: suggested book regarding how to use Python for everyday Linux systems administration tasks with today's most useful DevOps tools, including Docker, Kubernetes, and Terraform.
- Boto3 : AWS SDK for Python to create, configure, and manage AWS services, such as Amazon EC2 and Amazon S3.
- CloudFormation: Easy way to create a collection of related AWS resources and provision them in an orderly and predictable fashion. Allows you to model your entire infrastructure in a text file (JSON or YAML) called a template.
- SAM: Framework for building serverless applications. It provides shorthand syntax to express functions, APIs, databases, and event source mappings. You can define the application you want and model it using a JSON or YAML configuration template.
- Data Science SDK: With this library you can create workflows that process and publish machine learning models using SageMaker and Step Functions. Data Science SDK provides a Python API that can create and invoke Step Functions workflows.
- Scala Data Quality | Python Data Quality: Scala library and Python API for Deequ, a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
- Book: Terraform #optional: Hands-on book exploring Terraform, a tool for defining, launching, and managing infrastructure as code (IaC) across a variety of cloud and virtualization platforms.
- Troposphere #optional: Python library that makes it easier to create AWS CloudFormation JSON by describing the AWS resources in Python code.
- Goformation #optional: Go library for working with AWS CloudFormation / AWS Serverless Application Model (SAM) templates.
- AWS Data Wrangler #optional: Extends the power of the Pandas library to AWS, connecting DataFrames and AWS data-related services.
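To show what "modeling infrastructure in a text file" looks like in practice, the sketch below builds a very small CloudFormation template as a Python dictionary and serializes it to JSON with the standard library. The logical ID `ExampleBucket`, the description text and the `build_template` helper are hypothetical names chosen for this example. Nothing is deployed here; creating the stack would be a separate step, for instance through the CloudFormation console or an SDK.

```python
import json


def build_template(bucket_logical_id="ExampleBucket"):
    """Return a minimal CloudFormation template describing one S3 bucket."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative template with a single versioned bucket",
        "Resources": {
            # The logical ID is an arbitrary name used inside the template.
            bucket_logical_id: {
                "Type": "AWS::S3::Bucket",
                "Properties": {
                    "VersioningConfiguration": {"Status": "Enabled"}
                },
            }
        },
        "Outputs": {
            # Ref on an S3 bucket resource resolves to the bucket name.
            "BucketName": {"Value": {"Ref": bucket_logical_id}}
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_template(), indent=2))
```

Generating the template from code rather than writing JSON by hand is the same idea that libraries such as Troposphere build on, with typed resource classes instead of raw dictionaries.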
Training and Certification
Competitive: Starway to Orione Cloud

🠥🠥 Back to Table of Contents 🠥🠥

Machine Learning Operations (MLOps)

What is MLOps
MLOps Overview
Feature Store:
Version Control System for Machine Learning
SageMaker MLOps
DevOps for Machine Learning
Workshop:
- MLOps with SageMaker
Competitive: Awesome MLOps

🠥🠥 Back to Table of Contents 🠥🠥

Machine Learning Applications

A more specific section about some Machine Learning Fields.

NLP & NLU

🠥🠥 Back to Table of Contents 🠥🠥

Recommendation System

🠥🠥 Back to Table of Contents 🠥🠥

Reinforcement Learning

🠥🠥 Back to Table of Contents 🠥🠥

Table of Contents

Programming

Maths & Statistics

Toolboxes

Machine Learning

Deep Learning

Cloud Operation and DevOps

Machine Learning Operations (MLOps)

Machine Learning Applications

NLP & NLU

Recommendation System

Reinforcement Learning

xpeppers/starway-to-orione-mlops
