Starway to Orione CUDA (Cloud Utilities and Data Analytics) - TEAM

DoAI stands for Development Operations and Artificial Intelligence.

This is the learning path every new Cloud Data Architect has to follow when joining the XPeppers Cloud team. This path reflects our team's culture and values, which have their roots in the agile values and principles.

Table of Contents

  1. Programming
  2. Maths & Statistics
  3. Toolboxes
  4. Machine Learning
  5. Deep Learning
  6. Cloud Operation and DevOps
  7. Machine Learning Operations (MLOps)
  8. NLP & NLU
  9. Recommendation System
  10. Reinforcement Learning

Please feel free to fork and contribute, add materials, fix the existing ones and propose new stuff.

Throughout the plan, read The Phoenix Project.

Programming

🠥🠥 Back to Table of Contents 🠥🠥

Maths & Statistics

🠥🠥 Back to Table of Contents 🠥🠥

Toolboxes

  • Evaluating and Exploring data in Python

    • Jupyter: Jupyter notebooks with examples of common scientific libraries used in Data Science/ML projects (Numpy, Scipy, Matplotlib,...)

    • Anaconda #optional: open-source distribution of Python packages and IDEs built for data science.

    • Zeppelin #optional: web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala, Python, R and more.

    • Notebook Alternative #optional: running Jupyter Notebook on VS Code

  • Scientific libraries in Python

    • Numpy: scientific library in Python that provides a high-performance multi-dimensional array object and tools for working with these arrays.

    • Pandas: open-source Python package useful for the exploration, cleaning and processing of tabular data, held in a structure called a DataFrame.

    • Scipy: collection of mathematical algorithms and convenience functions built on Numpy (useful for Linear Algebra, Signal Processing, Fourier Transforms, Statistics,...)

    • Scikit-learn video - Scikit-learn guide: machine learning library for the Python programming language.

    • Matplotlib video tutorial - Matplotlib guide: Python library for Data Visualization.

    • Seaborn: Python library for data visualization based on Matplotlib.

    • Dask-ML #optional: ML library based on Dask (library for parallel computing in Python).

    • Apache Spark #optional: Fast and general engine for large-scale data processing (in batches and real-time streaming). Spark provides an interface for programming clusters (Distributed Computing) with implicit data parallelism and fault tolerance.

    • Spark & Hadoop Developer #optional

    • Statsmodels #optional

🠥🠥 Back to Table of Contents 🠥🠥

Machine Learning

🠥🠥 Back to Table of Contents 🠥🠥

Deep Learning

  • Dive into Deep Learning

  • Practical Deep Learning for Coders

  • Activation Functions: Functions applied to the output of a neuron in a Neural Network to give it a desired transformation, typically a non-linear one.

  • Linear Layer

  • CNN (Convolutional Neural Networks): NNs commonly applied to analyze images.

  • RNN (Recurrent Neural Networks): NNs commonly applied to analyze Sequential Data (e.g. text, audio,...)

  • Optimization: Optimization algorithms iteratively update the model parameters to minimize the value of the loss function. Common optimizers include Gradient descent, SGD, Adagrad and Adam.

  • Loss Functions / Objective Functions: Define the objective against which the model's performance is evaluated. The parameters learned by the model are determined by minimizing the chosen loss function.

  • Dropout: Regularization method used to reduce overfitting in large neural nets. During training, a random subset of layer outputs (nodes) is "dropped out". This method approximates training a large number of neural networks with different architectures in parallel.

  • Batchnorm: Technique that normalizes a layer's outputs over each mini-batch, usually before the activation function is applied. It speeds up and stabilizes training and also has a mild regularizing effect.

  • Learning Rate Scheduler

  • Frameworks:
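As a concrete companion to the activation, loss, optimization, dropout and batchnorm entries in the list above, here is a minimal sketch in plain Python (no framework). It trains a single sigmoid neuron with full-batch gradient descent on a binary cross-entropy loss. The toy data, the helper names (`train_neuron`, `dropout`, `batch_norm`) and the hyperparameters are assumptions made up for this illustration; real projects would use the framework equivalents.

```python
import math
import random


def sigmoid(z):
    """Activation function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Loss function: mean negative log-likelihood of the true labels."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


def train_neuron(xs, ys, lr=0.5, epochs=200):
    """Fit one sigmoid neuron (weight w, bias b) with full-batch gradient descent."""
    w, b = 0.0, 0.0
    losses = []
    for _ in range(epochs):
        preds = [sigmoid(w * x + b) for x in xs]
        losses.append(binary_cross_entropy(ys, preds))
        # For sigmoid + cross-entropy, dLoss/dz simplifies to (prediction - label).
        grad_w = sum((p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
        grad_b = sum(p - y for p, y in zip(preds, ys)) / len(xs)
        # Gradient descent step: move the parameters against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, losses


def dropout(values, rate, rng):
    """Inverted dropout: zero each value with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def batch_norm(values, eps=1e-5):
    """Normalize a batch of pre-activations to zero mean and unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


# Toy data (made up for this sketch): negative inputs are class 0, positive are class 1.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]

w, b, losses = train_neuron(xs, ys)
print(f"first loss={losses[0]:.3f}, last loss={losses[-1]:.3f}")
print("dropout:", dropout([1.0, 2.0, 3.0, 4.0], rate=0.5, rng=random.Random(0)))
print("batch_norm:", [round(v, 3) for v in batch_norm([1.0, 2.0, 3.0])])
```

The loss printed at the end should be lower than the one at the start, which is the whole point of the optimizer. The `batch_norm` helper omits the learnable scale and shift parameters that framework implementations add, and `dropout` is only meant to be applied during training.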

🠥🠥 Back to Table of Contents 🠥🠥

Cloud Operation and DevOps

  • Machine Learning Lens:

    The Machine Learning Lens of the AWS Well-Architected Framework helps you learn operational and architectural best practices for designing and operating ML workflows in the cloud.

  • Book: Data Science on AWS

  • AWS Machine Learning Stack

    • ML Frameworks & Infrastructure: AWS services, frameworks and resources to build, train, and deploy machine learning (ML) applications.

    • Amazon SageMaker: Fully managed ML service used to quickly and easily build and train ML models and then deploy them into a prediction-ready hosted environment at any scale.

    • AI Services: AWS provides several AI services, e.g. Amazon Rekognition, which offers pre-trained and customizable computer vision (CV) capabilities to extract information and insights from your images and videos.

  • More AWS Services:

    • AWS Lambda: Serverless computing service that lets you run code on highly available infrastructure without provisioning or managing servers.

    • CI/CD CodePipeline: Continuous delivery service you can use to model, visualize, and automate the steps required to release your software.

    • Step Functions: Serverless orchestration service that lets you combine AWS Lambda functions and other AWS services to build business-critical applications. With Step Functions you can inspect the state of each step in your workflow to make sure that your application runs in order and as expected.

    • Elastic File System: Serverless elastic file system for use with AWS Cloud services and on-premises resources.

    • Fargate: Service that provisions serverless compute resources to run Amazon ECS and Amazon EKS containers.

    • AWS Batch: Helps you run batch computing workloads, which need access to large amounts of compute resources, on the AWS Cloud across multiple Availability Zones within a Region.

  • Utils:

    • Book: Python for DevOps: suggested book regarding how to use Python for everyday Linux systems administration tasks with today's most useful DevOps tools, including Docker, Kubernetes, and Terraform.

    • Boto3 : AWS SDK for Python to create, configure, and manage AWS services, such as Amazon EC2 and Amazon S3.

    • CloudFormation: Easy way to create a collection of related AWS resources and provision them in an orderly and predictable fashion. Allows you to model your entire infrastructure in a text file (JSON or YAML) called a template.

    • SAM: Framework for building serverless applications. It provides shorthand syntax to express functions, APIs, databases, and event source mappings. You can define the application you want and model it using a JSON or YAML configuration template.

    • Data Science SDK: With this library you can create workflows that process and publish machine learning models using SageMaker and Step Functions. It provides a Python API that can create and invoke Step Functions workflows.

    • Scala Data Quality | Python Data Quality: Deequ (Scala) and its Python API. Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.

    • Book: Terraform #optional: Hands-on book exploring Terraform, an infrastructure as code (IaC) tool for defining, launching, and managing infrastructure across a variety of cloud and virtualization platforms.

    • Troposphere #optional: Library that makes it easier to create AWS CloudFormation JSON by describing the AWS resources in Python code.

    • Goformation #optional: Go library for working with AWS CloudFormation / AWS Serverless Application Model (SAM) templates.

    • AWS Data Wrangler #optional: Extends the Pandas library to AWS by connecting DataFrames with AWS data-related services.

  • Training and Certification

  • Competitive: Starway to Orione Cloud
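To make the AWS Lambda and CloudFormation entries in the list above more tangible, the sketch below shows two things using only the Python standard library: a function with the `(event, context)` signature that Lambda calls for Python code, and a minimal CloudFormation template built as a dict and serialized to JSON. The `name` event field, the `DataBucket` logical ID, the `build_template` helper and the `statusCode`/`body` response shape are assumptions chosen for illustration; nothing here talks to AWS.

```python
import json


def lambda_handler(event, context):
    """Handler in the (event, context) shape that AWS Lambda invokes for Python."""
    # `name` is a hypothetical event field chosen for this sketch.
    name = event.get("name", "world")
    # The statusCode/body shape is an assumption, suited to an HTTP-style caller.
    return {"statusCode": 200, "body": json.dumps({"message": f"Hello, {name}!"})}


def build_template(bucket_logical_id="DataBucket"):
    """Return a minimal CloudFormation template as a plain Python dict."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative stack with a single S3 bucket",
        "Resources": {
            # The logical ID is a name local to the template, not the bucket name.
            bucket_logical_id: {"Type": "AWS::S3::Bucket"},
        },
    }


# Local invocation: no AWS account needed, the context object is unused here.
print(lambda_handler({"name": "Orione"}, None))

# Serialize the template to the JSON text format that CloudFormation accepts.
print(json.dumps(build_template(), indent=2))
```

Building the template as data and dumping it to JSON is, in spirit, what libraries such as Troposphere automate with typed Python objects. Deploying the result would still go through CloudFormation itself, for example via Boto3 or the AWS console.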

🠥🠥 Back to Table of Contents 🠥🠥

Machine Learning Operations (MLOps)

🠥🠥 Back to Table of Contents 🠥🠥

Machine Learning Applications

A more specific section about some Machine Learning Fields.

NLP & NLU

🠥🠥 Back to Table of Contents 🠥🠥

Recommendation System

🠥🠥 Back to Table of Contents 🠥🠥

Reinforcement Learning

🠥🠥 Back to Table of Contents 🠥🠥