This repo is for "Machine Learning from very basics".
- Pandas
- NumPy
- Scikit-learn
- Matplotlib
Pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive. It aims to be the most powerful and flexible open source data analysis and manipulation tool available in any language. Its main features include:
- Easy handling of missing data
- Columns can be inserted into and deleted from a DataFrame
- Automatic and explicit data alignment: objects can be explicitly aligned to a set of labels, or the user can simply ignore the labels and let pandas align the data automatically in computations
- Intuitive merging and joining data sets
- Flexible reshaping and pivoting of data sets
- Loading and saving data in flat files (CSV and delimited), Excel files, and databases
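The features listed above can be sketched in a few lines. This is a minimal illustration, not a complete tour: the CSV text, the city names, and the `regions` table are made-up sample data, and the CSV is read from an in-memory string rather than a real file.

```python
import io

import pandas as pd

# Hypothetical sample data, invented for this example.
csv_text = "city,year,units\nOslo,2020,10\nOslo,2021,\nBergen,2020,7\nBergen,2021,9\n"

# Loading: read_csv accepts a path or any file-like object.
sales = pd.read_csv(io.StringIO(csv_text))

# Missing data: the empty field was parsed as NaN; fillna replaces it.
sales["units"] = sales["units"].fillna(0.0)

# Insert a column, then delete it again.
sales["units_x2"] = sales["units"] * 2
sales = sales.drop(columns=["units_x2"])

# Automatic alignment: Series are added label by label, not by position.
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["z", "y", "x"])
aligned_sum = a + b  # x: 1 + 30, y: 2 + 20, z: 3 + 10

# Merging: join the two tables on their shared "city" column.
regions = pd.DataFrame({"city": ["Oslo", "Bergen"], "region": ["East", "West"]})
merged = sales.merge(regions, on="city", how="left")

# Reshaping: one row per city, one column per year.
wide = merged.pivot(index="city", columns="year", values="units")
print(wide)
```

Note that `a + b` matches values by index label, which is why the differing order of the two indexes does not matter.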
The official documentation is hosted on
NumPy is the fundamental package for scientific computing with Python. The ndarray (NumPy array) is a multidimensional array that stores values of the same data type. These arrays are indexed like Python sequences, starting at zero.
- A powerful N-Dimensional array
- Sophisticated Functions
- Mathematical and Logical operations
- Fourier Transforms
- Linear Algebra, Random Number Generation
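As a rough sketch of the features above, the snippet below touches each one once. The array values, the seed, and the variable names are arbitrary choices made for illustration.

```python
import numpy as np

# A 2-D ndarray; every element shares one dtype.
m = np.array([[1, 2, 3], [4, 5, 6]])
first = m[0, 0]      # zero-based indexing, like Python sequences
last_row = m[-1]     # negative indices count from the end

# Elementwise mathematical and logical operations.
doubled = m * 2
mask = (m > 2) & (m % 2 == 0)  # True where a value is even and greater than 2

# Fourier transform of a constant signal: only the zero-frequency bin is non-zero.
spectrum = np.fft.fft(np.ones(4))

# Linear algebra: solve the system A x = b.
A = np.array([[2.0, 0.0], [0.0, 4.0]])
b = np.array([2.0, 8.0])
x = np.linalg.solve(A, b)

# Random number generation with a seeded generator for reproducibility.
rng = np.random.default_rng(0)
samples = rng.normal(size=5)

print(m.dtype, first, last_row, x)
```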
The official documentation is hosted on
Scikit-learn provides simple and efficient tools for data mining and data analysis. The library is built on the SciPy (Scientific Python) stack, which must be installed before you can use scikit-learn. This stack includes:
- NumPy: Base n-dimensional array package
- SciPy: Fundamental library for scientific computing
- Matplotlib: Comprehensive 2D/3D plotting
- IPython: Enhanced interactive console
- Sympy: Symbolic mathematics
- Pandas: Data structures and analysis

Scikit-learn is accessible to everybody and reusable in various contexts. Its modules include:
- Clustering: for grouping unlabeled data, with algorithms such as KMeans.
- Datasets: for test datasets and for generating datasets with specific properties for investigating model behavior.
- Feature extraction: for defining attributes in image and text data.
- Dimensionality Reduction: for reducing the number of attributes in data for summarization, visualization and feature selection, with methods such as principal component analysis (PCA).
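The four module groups above can be tried together in one short script. This is a minimal sketch under assumptions: the dataset is synthetic, the two sentences are invented, and parameters such as `n_clusters=3` and `n_components=2` are picked only to keep the example small.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import CountVectorizer

# Datasets: generate 150 synthetic points in 4 dimensions around 3 centers.
X, y = make_blobs(n_samples=150, n_features=4, centers=3, random_state=0)

# Clustering: group the points without using the true labels in y.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Dimensionality reduction: project the 4 attributes down to 2 with PCA.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

# Feature extraction: turn raw text into a matrix of word counts.
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(["the cat sat", "the dog sat down"])

print(labels[:10])
print(X_2d.shape, counts.shape)
```

The 2-D output of PCA is convenient for plotting the clusters, which is where Matplotlib comes in.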
The official documentation is hosted on
Matplotlib provides comprehensive 2D/3D plotting.
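A small sketch of both plot types is shown below. The plotted functions and the output file name `example_plot.png` are arbitrary choices for illustration, and the `Agg` backend is selected only so the script also runs on machines without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; renders to files only

import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure(figsize=(8, 4))

# 2D: a simple line plot.
x = np.linspace(0, 2 * np.pi, 100)
ax2d = fig.add_subplot(1, 2, 1)
ax2d.plot(x, np.sin(x), label="sin(x)")
ax2d.set_xlabel("x")
ax2d.legend()

# 3D: a surface plot over a regular grid.
gx, gy = np.meshgrid(np.linspace(-2, 2, 30), np.linspace(-2, 2, 30))
ax3d = fig.add_subplot(1, 2, 2, projection="3d")
ax3d.plot_surface(gx, gy, np.exp(-(gx**2 + gy**2)))

# Hypothetical output path, chosen for this example.
fig.savefig("example_plot.png")
```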
The official documentation is hosted on