DoAI stands for Development Operations and Artificial Intelligence.
This is the learning path every new Cloud Data Architect has to follow when joining the XPeppers Cloud team. This path reflects our team's culture and values, which have their roots in the agile values and principles.
- Flat Organizations:
- Read chapters 1, 4, 5, 7 of XP Explained #onboarding
- Read chapters 2, 6 of XP Explained
- Iterative and Incremental Development:
- Waterfall #onboarding
- Agile #onboarding
- Transition
- Waterfall
- For Italian speakers, watch "Perché è così difficile fare Extreme Programming" by Matteo Vaccari #onboarding
- Pair Programming #onboarding
- Agile Mindset:
- What Exactly is the Agile Mindset? #onboarding
- Read The Pomodoro Technique paper
- Read first chapter of "Applying UML and Patterns"
- Try to estimate the time needed to study that chapter (using the pomodoro technique)
- Answer (for example on the team's wiki pages)
- What is analysis?
- What is design?
- What's the difference between them?
- What is design for?
- In other words, how would you reply to the following statement: "I just need to understand what to do (analysis) and then do it (coding). Everything else does not matter!"
Table of Contents
Please feel free to fork and contribute, add materials, fix the existing ones and propose new stuff.
Throughout the whole plan, read The Phoenix Project.
- Python course by Analytics Vidhya
- Basic Python - Option 2: Basic Python
- Object-oriented Programming
- Python Regular Expressions
- RE Cheat Sheet
- Git - Git Cheat Sheet: open source distributed version control system.
- Shell Script
- Competitive: Starway to Orione Dev
🠥🠥 Back to Table of Contents 🠥🠥
- Linear Algebra
- Calculus
- Descriptive Statistics
- Data Distributions
- Convolutions
- Exploratory Data Analysis
- Regression
🠥🠥 Back to Table of Contents 🠥🠥
- Evaluating and Exploring data in Python
- Jupyter: Jupyter notebooks with examples of common scientific libraries used in Data Science/ML projects (Numpy, Scipy, Matplotlib, ...)
- Anaconda #optional: open-source distribution of Python packages and IDEs built for data science.
- Zeppelin #optional: web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala, Python, R and more.
- Notebook Alternative #optional: running Jupyter Notebook on VS Code
- Scientific libraries in Python
- Numpy: scientific library in Python that provides a high-performance multi-dimensional array and tools for working with it.
- Pandas: open-source Python package useful for the exploration, cleaning and processing of tabular data, stored in a structure called DataFrame.
- Scipy: collection of mathematical algorithms and convenience functions built on Numpy (useful for Linear Algebra, Signal Processing, Fourier Transforms, Statistics, ...)
- Scikit-learn video - Scikit-learn guide: machine learning library for the Python programming language.
- Matplotlib video tutorial - Matplotlib guide: Python library for Data Visualization.
- Seaborn: Python library for data visualization based on Matplotlib.
- Dask-ML #optional: ML library based on Dask (library for parallel computing in Python).
- Apache Spark #optional: Fast and general engine for large-scale data processing (in batches and real-time streaming). Spark provides an interface for programming clusters (Distributed Computing) with implicit data parallelism and fault tolerance.
- Spark & Hadoop Developer #optional
- Statsmodels #optional
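As a rough illustration of the Numpy and Pandas descriptions above, here is a minimal sketch. The data, column names and variable names are made up for the example and do not come from any of the linked materials.

```python
import numpy as np
import pandas as pd

# NumPy: a 2-D array and vectorized operations on it
matrix = np.array([[1.0, 2.0], [3.0, 4.0]])
col_means = matrix.mean(axis=0)   # mean of each column
scaled = matrix * 10              # element-wise product, no explicit loop

# pandas: a small DataFrame with one missing value (hypothetical data)
df = pd.DataFrame({
    "city": ["Trento", "Milano", "Trento", "Milano"],
    "temperature": [12.5, np.nan, 14.5, 18.0],
})

# Cleaning: drop rows where the temperature is missing
clean = df.dropna(subset=["temperature"])

# Processing: average temperature per city
mean_by_city = clean.groupby("city")["temperature"].mean()

print(col_means)
print(mean_by_city)
```

The same pattern (load into a DataFrame, clean, then aggregate) is a common first step in the exploratory work the Jupyter notebooks above walk through.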
🠥🠥 Back to Table of Contents 🠥🠥
- Elements of AI: An Introduction to AI with no complicated math or programming required.
- Book: Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow
- Interpretable Machine Learning #optional
- Blog & Articles:
- Workflow of a Machine Learning project: Gathering data $\rightarrow$ Data pre-processing $\rightarrow$ choose the model $\rightarrow$ train and test model $\rightarrow$ Evaluation
- Machine Learning for Everyone: High level explanation of ML topics (e.g. Supervised, Unsupervised Learning algorithms, Neural networks, ...)
- Introduction to Data Visualization in Python: Overview of most common tools for data visualization (Matplotlib, Seaborn, Plotly, Pandas Visualization)
- Machine learning in Python with Scikit-learn: Binary Classification example using scikit-learn Random Forest classifier
- Algorithms
- Supervised Learning: when each training observation from the dataset has a corresponding label or output value associated with it.
- Unsupervised Learning: when the training data has no labels.
- SageMaker Built-in: built-in machine learning algorithms provided by SageMaker
- Evaluation Metrics
- Classification task: the value of the target variable to predict is discrete
- Regression task: the value of the target variable to predict is continuous
- Information Retrieval System (e.g. Recommendation Systems)
- Images
- Homeworks #optional
- Extra:
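The workflow steps and the two kinds of evaluation metrics listed above can be sketched in plain Python. This is only an illustration under stated assumptions: the data is made up, the nearest-centroid classifier is a deliberately simple stand-in for a real model, and all function names are hypothetical.

```python
# 1. Gathering data: made-up labeled observations (feature value, class label)
data = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 0),
        (6.0, 1), (7.0, 1), (8.0, 1), (9.0, 1)]

# 2. Pre-processing: split into train/test, then min-max scale
#    using statistics computed on the training split only
train_raw, test_raw = data[::2], data[1::2]
lo = min(x for x, _ in train_raw)
hi = max(x for x, _ in train_raw)


def scale(rows):
    return [((x - lo) / (hi - lo), y) for x, y in rows]


train, test = scale(train_raw), scale(test_raw)


# 3. Choose the model: a nearest-centroid classifier (supervised, needs labels)
def fit_centroids(rows):
    centroids = {}
    for label in sorted({y for _, y in rows}):
        values = [x for x, y in rows if y == label]
        centroids[label] = sum(values) / len(values)
    return centroids


def predict(centroids, x):
    # Return the label whose centroid is closest to x
    return min(centroids, key=lambda label: abs(x - centroids[label]))


# 4. Train and test the model
centroids = fit_centroids(train)
y_true = [y for _, y in test]
y_pred = [predict(centroids, x) for x, _ in test]


# 5. Evaluation
def accuracy(y_true, y_pred):
    """Classification metric: fraction of discrete labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Regression metric: average squared gap between continuous values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


print("accuracy:", accuracy(y_true, y_pred))
print("MSE on a toy regression output:", mean_squared_error([3.0, 5.0], [2.0, 7.0]))
```

In a real project each step would typically be handled by a library such as scikit-learn; the point here is only the order of the steps and the difference between a metric for discrete targets and one for continuous targets.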
🠥🠥 Back to Table of Contents 🠥🠥
- Activation Functions: Functions applied to the output of a neuron in a Neural Network to give it a desired (usually non-linear) transformation.
- CNN (Convolutional Neural Networks): NNs commonly applied to analyze images.
- RNN (Recurrent Neural Networks): NNs commonly applied to analyze sequential data (e.g. text, audio, ...)
- Optimization: Optimization algorithms iteratively update the model parameters to minimize the value of the loss function. Common optimizers include Gradient descent, SGD, Adagrad and Adam.
- Loss Functions / Objective Functions: Define the objective against which the performance of the model is evaluated. The parameters learned by the model are determined by minimizing the chosen loss function.
- Dropout: Regularization method used to reduce overfitting in large neural nets. During training, a random subset of layer outputs (nodes) is "dropped out". This method approximates training a large number of neural networks with different architectures in parallel.
- Batchnorm: Normalization technique that speeds up and stabilizes the training of NNs and also has a mild regularizing effect. It normalizes a layer's outputs over each mini-batch, typically before applying the activation function.
- Frameworks:
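To make the concepts above more concrete, here is a minimal NumPy sketch of two activation functions, a loss function, plain gradient descent, dropout and batch normalization. It is an illustration only: the function names, the toy data and the learning rate are assumptions made for the example, and the batch normalization shown omits the learned scale and shift that real frameworks add.

```python
import numpy as np

rng = np.random.default_rng(0)


# Activation functions: applied element-wise to a neuron's output
def relu(z):
    return np.maximum(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Loss function: mean squared error between predictions and targets
def mse_loss(y_pred, y_true):
    return np.mean((y_pred - y_true) ** 2)


# Dropout (inverted variant): zero a random fraction `rate` of the outputs
# and rescale the survivors so the expected value stays the same
def dropout(a, rate, rng):
    mask = rng.random(a.shape) >= rate
    return a * mask / (1.0 - rate)


# Batch normalization without learned scale/shift:
# normalize each feature (column) over the mini-batch (rows)
def batchnorm(z, eps=1e-5):
    return (z - z.mean(axis=0)) / np.sqrt(z.var(axis=0) + eps)


# Optimization: gradient descent on a single linear neuron y = w * x,
# fitted to toy data generated from y = 2x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x
w = 0.0
learning_rate = 0.01
for _ in range(200):
    grad = np.mean(2.0 * (w * x - y) * x)  # derivative of the MSE w.r.t. w
    w -= learning_rate * grad

print("learned w:", w)
print("final loss:", mse_loss(w * x, y))
```

Optimizers such as SGD, Adagrad and Adam differ mainly in how they turn the gradient into the update applied in the last line of the loop.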
🠥🠥 Back to Table of Contents 🠥🠥
- Well-architected framework: helps to learn operational and architectural best practices for designing and operating ML workflows in the cloud.
- AWS Machine Learning Stack
- ML Frameworks & Infrastructure: AWS services, frameworks and resources to build, train, and deploy machine learning (ML) applications.
- Amazon SageMaker: Fully managed ML service used to quickly and easily build and train ML models and then deploy them into a prediction-ready hosted environment at any scale.
- AI Services: AWS provides several AI services, e.g. Amazon Rekognition, which offers pre-trained and customizable computer vision (CV) capabilities to extract information and insights from your images and videos.
- More AWS Services:
- AWS Lambda: Serverless computing service that lets you run code on highly available infrastructure without provisioning or managing servers.
- CI/CD CodePipeline: Continuous delivery service you can use to model, visualize, and automate the steps required to release your software.
- Step Functions: Serverless orchestration service that lets you combine AWS Lambda functions and other AWS services to build business-critical applications. With Step Functions you can examine the state of each step in your workflow to make sure that your application runs in order and as expected.
- Elastic File System: Serverless elastic file system for use with AWS Cloud services and on-premises resources.
- Fargate: Service that provisions serverless compute resources to run Amazon ECS and Amazon EKS containers.
- AWS Batch: Helps you run batch computing workloads (a way to access large amounts of compute resources) on the AWS Cloud across multiple Availability Zones within a Region.
- Utils:
- Book: Python for DevOps: suggested book on how to use Python for everyday Linux systems administration tasks with today's most useful DevOps tools, including Docker, Kubernetes, and Terraform.
- Boto3: AWS SDK for Python to create, configure, and manage AWS services, such as Amazon EC2 and Amazon S3.
- CloudFormation: Easy way to create a collection of related AWS resources and provision them in an orderly and predictable fashion. Allows you to model your entire infrastructure in a text file (JSON or YAML) called a template.
- SAM: Framework for building serverless applications. It provides shorthand syntax to express functions, APIs, databases, and event source mappings. You can define the application you want and model it using a JSON or YAML configuration template.
- Data Science SDK: With this library you can create workflows that process and publish machine learning models using SageMaker and Step Functions. Data Science SDK provides a Python API that can create and invoke Step Functions workflows.
- Scala Data Quality | Python Data Quality: Python API for Deequ, a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
- Book: Terraform #optional: Hands-on book exploring Terraform, an infrastructure as code (IaC) tool for defining, launching, and managing infrastructure across a variety of cloud and virtualization platforms.
- Troposphere #optional: Library that makes it easier to create AWS CloudFormation JSON by writing Python code to describe the AWS resources.
- Goformation #optional: Go library for working with AWS CloudFormation / AWS Serverless Application Model (SAM) templates.
- AWS Data Wrangler #optional: Extends the power of the Pandas library to AWS, connecting DataFrames and AWS data-related services.
- Training and Certification
- Competitive: Starway to Orione Cloud
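To give a feel for two of the services described above, here is a small local sketch of what a Python AWS Lambda handler and a minimal CloudFormation template can look like. It runs without an AWS account and makes no AWS calls. The `name` field in the event and the `ExampleDataBucket` logical ID are hypothetical, chosen only for the example; the handler is called by hand here, whereas on AWS the Lambda runtime would invoke it.

```python
import json


def lambda_handler(event, context):
    """Function that the Lambda runtime would call with an event and a context."""
    name = event.get("name", "world")  # "name" is a hypothetical input field
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


# A minimal CloudFormation template, built as a Python dict and dumped to JSON.
# It declares a single S3 bucket; the logical ID is made up for this sketch.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Sketch: a single S3 bucket",
    "Resources": {
        "ExampleDataBucket": {"Type": "AWS::S3::Bucket"},
    },
}

# Local invocation of the handler, with no real Lambda context object
print(lambda_handler({"name": "DoAI"}, None))
print(json.dumps(template, indent=2))
```

Libraries such as Troposphere generate this kind of template from Python objects instead of a hand-written dict, and SAM adds shorthand resource types on top of the same template format.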
🠥🠥 Back to Table of Contents 🠥🠥
- What is MLOps
- MLOps Overview
- Feature Store:
- Version Control System for Machine Learning
- SageMaker MLOps
- DevOps for Machine Learning
- Workshop:
- Competitive: Awesome MLOps
🠥🠥 Back to Table of Contents 🠥🠥
A more specific section covering some Machine Learning fields.
- Introduction to NLP
- Reading Comprehension
- Question Answering on the SQuAD Dataset
- BERT Explained
- Transformers
- SpaCy
- Learning to Rank
- Natural Language Processing with Python
- Unsupervised Translation of Programming Languages
- Perplexity and BLEU Score metrics
🠥🠥 Back to Table of Contents 🠥🠥
- What Are Recommender Systems
- Matrix Factorization
- How does Netflix recommend movies using Matrix Factorization
- Collaborative filtering
🠥🠥 Back to Table of Contents 🠥🠥
- Reinforcement Learning Introduction
- Videos:
- Reinforcement Learning on AWS: