MERGE development PR #48 to master: [DOC] Rename files and directories; apply a consistent naming convention [IG-1133 IG-10216]
Sharon-iguazio committed Apr 10, 2019
1 parent 407af71 commit ac0834f
Showing 22 changed files with 82 additions and 88 deletions.
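
The renames below mostly follow a lowercase, hyphen-separated convention (for example, `GettingStarted` becomes `getting-started` and `read_stocks.ipynb` becomes `read-stocks.ipynb`). As an illustration only, and not part of this commit, the mechanical part of that convention could be sketched in Python as follows. The `to_kebab_case` helper is hypothetical, and several renames in the diff (such as `FilesAccess` to `file-access`) were evidently chosen by hand and are not reproduced by it.

```python
import re


def to_kebab_case(name: str) -> str:
    """Convert a CamelCase or snake_case name to lowercase, hyphen-separated form.

    Hypothetical helper for illustration; it is not taken from the repository.
    """
    # Split an acronym from a following capitalized word: "SQLAnalytics" -> "SQL-Analytics".
    name = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1-\2", name)
    # Split a lowercase letter or digit from a following capital: "gS" -> "g-S".
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1-\2", name)
    # Replace underscores and whitespace with hyphens.
    name = re.sub(r"[_\s]+", "-", name)
    # Collapse repeated hyphens and lowercase the result.
    return re.sub(r"-{2,}", "-", name).lower()


if __name__ == "__main__":
    for old in ("GettingStarted", "read_stocks.ipynb", "SparkSQLAnalytics"):
        print(old, "->", to_kebab_case(old))
```

Because the function only normalizes case and separators, any shortened or reworded name would still have to be supplied manually.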
42 changes: 21 additions & 21 deletions README.md
@@ -9,11 +9,11 @@
- [Deploying Models to Production](#deploying-models-to-production)
- [Visualization, Monitoring, and Logging](#visualization-monitoring-and-logging)
- [End-to-End Use-Case Applications](#end-to-end-use-case-applications)
- [Smart Stock Trading](demos/stocks/read_stocks.ipynb)
- [Smart Stock Trading](demos/stocks/read-stocks.ipynb)
- [Predictive Infrastructure Monitoring](demos/netops/generator.ipynb)
- [Image Recognition](demos/image_classification/keras-cnn-dog-or-cat-classification.ipynb)
- [Image Recognition](demos/image-classification/keras-cnn-dog-or-cat-classification.ipynb)
- [Natural Language Processing (NLP)](demos/nlp/nlp-example.ipynb)
- [Streaming Enrichment](demos/streaming-enrichment/Streaming-enrichment.ipynb)
- [Stream Enrichment](demos/stream-enrich/stream-enrich.ipynb)
- [Jupyter Notebook Basics](#jupyter-notebook-basics)
- [Creating Virtual Environments in Jupyter Notebook](#creating-virtual-environments-in-jupyter-notebook)
- [Additional Resources](#additional-resources)
@@ -53,7 +53,7 @@ For a more in-depth introduction to the platform, see the following resources:

A good place to start your development is with the platform [tutorial Jupyter notebooks](https://github.com/v3io/tutorials).

- The [**GettingStarted**](GettingStarted/collect-n-explore.ipynb) directory contains information and code examples to help you quickly get started using the platform.
- The [**getting-started**](getting-started/collect-n-explore.ipynb) directory contains information and code examples to help you quickly get started using the platform.
- The [**demos**](demos/README.ipynb) directory contains full end-to-end use-case application demos.

<a id="data-science-workflow"></a>
@@ -75,30 +75,30 @@ There are many ways to collect and ingest data from various sources into the pla

- Streaming data in real time from sources such as Kafka, Kinesis, Azure Event Hubs, or Google Pub/Sub.
- Loading data directly from external databases using an event-driven or periodic/scheduled implementation.
See the explanation and examples in the [**ReadingFromExternalDB**](GettingStarted/ReadingFromExternalDB.ipynb) tutorial.
See the explanation and examples in the [**read-external-db**](getting-started/read-external-db.ipynb#ingest-from-external-db-to-no-sql-using-frames) tutorial.
- Loading files (objects), in any format (for example, CSV, Parquet, JSON, or a binary image), from internal or external sources such as Amazon S3 or Hadoop.
See, for example, the [**FilesAccess**](GettingStarted/FilesAccess.ipynb) tutorial.
See, for example, the [**file-access**](getting-started/file-access.ipynb) tutorial.
- Importing time-series telemetry data using a Prometheus compatible scraping API.
- Ingesting (writing) data directly into the system using RESTful AWS-like simple-object, streaming, or NoSQL APIs.
See the platform's [Web-API References](https://www.iguazio.com/docs/reference/latest-release/api-reference/web-apis/).
- Scraping or reading data from external sources &mdash; such as Twitter, weather services, or stock-trading data services &mdash; using serverless functions.
See, for example, the [**stocks**](demos/stocks/read_stocks.ipynb) demo use-case application.
See, for example, the [**stocks**](demos/stocks/read-stocks.ipynb) demo use-case application.

For more information and examples of data collection and ingestion wcollect-n-exploreith the platform, see the [**collect-n-explore**](GettingStarted/collect-n-explore.ipynb#gs-data-collection-and-ingestion) tutorial Jupyter notebook.
For more information and examples of data collection and ingestion wcollect-n-exploreith the platform, see the [**collect-n-explore**](getting-started/collect-n-explore.ipynb#gs-data-collection-and-ingestion) tutorial Jupyter notebook.

<a id="data-exploration-and-processing"></a>
### Exploring and Processing Data

The platform includes a wide range of integrated open-source data query and exploration tools, including the following:

- [Apache Spark](https://spark.apache.org/) data-processing engine &mdash; including the Spark SQL and Datasets, MLlib, R, and GraphX libraries &mdash; with real-time access to the platform's NoSQL data store and file system.
See the platform's [Spark APIs reference](https://www.iguazio.com/docs/reference/latest-release/api-reference/spark-apis/) and the examples in the [**SparkSQLAnalytics**](GettingStarted/SparkSQLAnalytics.ipynb) tutorial.
See the platform's [Spark APIs reference](https://www.iguazio.com/docs/reference/latest-release/api-reference/spark-apis/) and the examples in the [**spark-sql-analytics**](getting-started/spark-sql-analytics.ipynb) tutorial.
- [Presto](http://prestodb.github.io/) distributed SQL query engine, which can be used to run interactive SQL queries over platform NoSQL tables or other object (file) data sources.
See the platform's [Presto reference](https://www.iguazio.com/docs/reference/latest-release/presto/).
- [pandas](https://pandas.pydata.org/) Python analysis library, including structured DataFrames.
- [Dask](https://dask.org/) parallel-computing Python library, including scaled pandas DataFrames.
- [V3IO Frames](https://github.com/v3io/frames) &mdash; Iguazio's open-source data-access library, which provides a unified high-performance API for accessing NoSQL, stream, and time-series data in the platform's data store and features native integration with pandas and [NVIDIA RAPIDS](https://rapids.ai/).
See, for example, the [**frames**](GettingStarted/frames.ipynb) tutorial.
See, for example, the [**frames**](getting-started/frames.ipynb) tutorial.
- Built-in support for ML packages such as [scikit-learn](https://scikit-learn.org), [Pyplot](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.html), [NumPy](http://www.numpy.org/), [PyTorch](https://pytorch.org/), and [TensorFlow](https://www.tensorflow.org/).

All these tools are integrated with the platform's Jupyter Notebook service, allowing users to access the same data from Jupyter through different interfaces with minimal configuration overhead.
@@ -107,7 +107,7 @@ This design, coupled with the platform's unified data model, enables users to st

> **Note:** You can deploy and manage application services, such as Spark and Jupyter Notebook, from the **Services** page of the platform dashboard.
For more information and examples of data exploration with the platform, see the [**collect-n-explore**](GettingStarted/collect-n-explore.ipynb#gs-data-exploration-and-processing) tutorial Jupyter notebook.
For more information and examples of data exploration with the platform, see the [**collect-n-explore**](getting-started/collect-n-explore.ipynb#gs-data-exploration-and-processing) tutorial Jupyter notebook.

<a id="building-and-training-models"></a>
### Building and Training Models
@@ -117,7 +117,7 @@ When your model is ready, you can train it in Jupyter Notebook or by using scala
You can find model-training examples in the platform's tutorial Jupyter notebooks:

- The [NetOps demo](demos/netops/training.ipynb) tutorial demonstrates predictive infrastructure-monitoring using scikit-learn.
- The [image-classification demo](demos/image_classification/infer.ipynb) tutorial demonstrates image recognition using TensorFlow and Keras.
- The [image-classification demo](demos/image-classification/infer.ipynb) tutorial demonstrates image recognition using TensorFlow and Keras.

If you're are a beginner, you might find the following ML guide useful &mdash; [Machine Learning Algorithms In Layman's Terms](https://towardsdatascience.com/machine-learning-algorithms-in-laymans-terms-part-1-d0368d769a7b).

@@ -136,7 +136,7 @@ For detailed information about Nuclio, visit the [Nuclio web site](https://nucli
> **Note:** Nuclio functions aren't limited to model serving: they can automate data collection, serve custom APIs, build real-time feature vectors, drive triggers, and more.
For an overview of Nuclio and how to develop, document, and deploy serverless Python Nuclio functions from Jupyter Notebook, see the [nuclio-jupyter documentation](https://github.com/nuclio/nuclio-jupyter/blob/master/README.md).
You can also find examples in the platform tutorial Jupyter notebooks; for example, the [NetOps demo](demos/netops/nuclio_infer.ipynb) tutorial demonstrates how to deploy a network-operations model as a function.
You can also find examples in the platform tutorial Jupyter notebooks; for example, the [NetOps demo](demos/netops/infer.ipynb) tutorial demonstrates how to deploy a network-operations model as a function.

<a id="visualization-monitoring-and-logging"></a>
### Visualization, Monitoring, and Logging
@@ -158,11 +158,11 @@ For information on how to create Grafana dashboards to monitor and visualize dat
Iguazio provides full end-to-end use-case applications that demonstrate how to use the Iguazio Data Science Platform and related tools to address data science requirements for different industries and implementations.
The applications are provided in the **demos** directory of the platform's tutorial Jupyter notebooks and cover the following use cases; for more detailed descriptions, see the demos README ([notebook](demos/README.ipynb) / [Markdown](demos/README.md)):

- <a id="stocks-use-case-app"></a>**Smart stock trading** ([**stocks**](demos/stocks/read_stocks.ipynb)) &mdash; the application reads stock-exchange data from an internet service into a time-series database (TSDB); uses Twitter to analyze the market sentiment on specific stocks, in real time; and saves the data to a platform NoSQL table that is used for generating reports and analyzing and visualization the data in a Grafana dashboard.
- <a id="stocks-use-case-app"></a>**Smart stock trading** ([**stocks**](demos/stocks/read-stocks.ipynb)) &mdash; the application reads stock-exchange data from an internet service into a time-series database (TSDB); uses Twitter to analyze the market sentiment on specific stocks, in real time; and saves the data to a platform NoSQL table that is used for generating reports and analyzing and visualization the data in a Grafana dashboard.
- <a id="netops-use-case-app"></a>**Predictive infrastructure monitoring** ([**netops**](demos/netops/generator.ipynb)) &mdash; the application builds, trains, and deploys a machine-learning model for analyzing and predicting failure in network devices as part of a network operations (NetOps) flow. The goal is to identify anomalies for device metrics &mdash; such as CPU, memory consumption, or temperature &mdash; which can signify an upcoming issue or failure.
- <a id="image-recog-use-case-app"></a>**Image recognition** ([**image_classification**](demos/image_classification/keras-cnn-dog-or-cat-classification.ipynb)) &mdash; the application builds and trains an ML model that identifies (recognizes) and classifies images by using Keras, TensorFlow, and scikit-learn.
- <a id="image-recog-use-case-app"></a>**Image recognition** ([**image-classification**](demos/image-classification/keras-cnn-dog-or-cat-classification.ipynb)) &mdash; the application builds and trains an ML model that identifies (recognizes) and classifies images by using Keras, TensorFlow, and scikit-learn.
- <a id="nlp-use-case-app"></a>**Natural language processing (NLP)** ([**nlp**](demos/nlp/nlp-example.ipynb)) &mdash; the application processes natural-language textual data &mdash; including spelling correction and sentiment analysis &mdash; and generates a Nuclio serverless function that translates any given text string to another (configurable) language.
- <a id="streaming-enrichment-use-case-app"></a>**Streaming enrichment** ([**streaming-enrichment**](demos/streaming-enrichment/Streaming-enrichment.ipynb)) &mdash; the application demonstrates a typical stream-based data-engineering pipeline, which is required in many real-world scenarios: data is streamed from an event streaming engine; the data is enriched, in real time, using data from a NoSQL table; the enriched data is saved to an output data stream and then consumed from this stream.
- <a id="stream-enrich-use-case-app"></a>**Stream enrichment** ([**stream-enrich**](demos/stream-enrich/stream-enrich.ipynb)) &mdash; the application demonstrates a typical stream-based data-engineering pipeline, which is required in many real-world scenarios: data is streamed from an event streaming engine; the data is enriched, in real time, using data from a NoSQL table; the enriched data is saved to an output data stream and then consumed from this stream.

<a id="jupyter-notebook-basics"></a>
## Jupyter Notebook Basics
@@ -183,18 +183,18 @@ The root file-browser directory of the platform's Jupyter Notebook service conta
- The contents of the running-user home directory &mdash; **users/&lt;running user&gt;**.
This directory contains the platform's [tutorial Jupyter notebooks](https://github.com/v3io/tutorials):

- [**Welcome.ipynb**](../Welcome.ipynb) &mdash; a documentation notebook that provides a short introduction to the platform and how to use it to implement a full data science workflow.
- **GettingStarted** &mdash; a directory containing getting-started tutorials that explain and demonstrate how to perform basic platform operations &mdash; such as data collection, ingestion, and analysis &mdash; as detailed in the current notebook.
- **demos** &mdash; a directory containing [end-to-end application use-case demos](../demos/README.ipynb).
- [**welcome.ipynb**](../welcome.ipynb) / [**README.md**](../README.md) &mdash; the current document, which provides a short introduction to the platform and how to use it to implement a full data science workflow.
- **getting-started** &mdash; a directory containing getting-started tutorials that explain and demonstrate how to perform different platform operations using the platform APIs and integrated tools.
- **demos** &mdash; a directory containing [end-to-end application use-case demos](#end-to-end-use-case-applications).

For information about the predefined data containers and how to reference data in these containers, see [Platform Data Containers](GettingStarted/collect-n-explore.ipynb/#platform-data-containers) in the **collect-n-explore** tutorial notebook.
For information about the predefined data containers and how to reference data in these containers, see [Platform Data Containers](getting-started/collect-n-explore.ipynb/#platform-data-containers) in the **collect-n-explore** tutorial notebook.

<a id="creating-virtual-environments-in-jupyter-notebook"></a>
### Creating Virtual Environments in Jupyter Notebook

A virtual environment is a named, isolated, working copy of Python that maintains its own files, directories, and paths so that you can work with specific versions of libraries or Python itself without affecting other Python projects.
Virtual environments make it easy to cleanly separate projects and avoid problems with different dependencies and version requirements across components.
See the [CondaVirtualEnv](GettingStarted/CondaVirtualEnv.ipynb) tutorial notebook for step-by-step instructions for using conda to create your own Python virtual environments, which will appear as custom kernels in Jupyter Notebook.
See the [virutal-env](getting-started/virutal-env.ipynb) tutorial notebook for step-by-step instructions for using conda to create your own Python virtual environments, which will appear as custom kernels in Jupyter Notebook.

<a id="additional-resources"></a>
## Additional Resources
12 changes: 6 additions & 6 deletions demos/README.ipynb
@@ -18,7 +18,7 @@
"- [Predictive Infrastructure Monitoring](#netops-demo)\n",
"- [Image Recognition](#image-classification-demo)\n",
"- [Natural Language Processing (NLP)](#nlp-demo)\n",
"- [Streaming Enrichment](#streaming-enrichment-demo)"
"- [Stream Enrichment](#stream-enrich-demo)"
]
},
{
@@ -38,7 +38,7 @@
"<a id=\"stocks-demo\"></a>\n",
"## Smart Stock Trading\n",
"\n",
"The [**stocks**](stocks/read_stocks.ipynb) demo demonstrates a smart stock-trading application: \n",
"The [**stocks**](stocks/read-stocks.ipynb) demo demonstrates a smart stock-trading application: \n",
"the application reads stock-exchange data from an internet service into a time-series database (TSDB); uses Twitter to analyze the market sentiment on specific stocks, in real time; and saves the data to a platform NoSQL table that is used for generating reports and analyzing and visualization the data in a Grafana dashboard.\n",
"\n",
"- The stock data is read from Twitter by using the [TwythonStreamer](https://twython.readthedocs.io/en/latest/usage/streaming_api.html) Python wrapper to the Twitter Streaming API, and saved to TSDB and NoSQL tables in the platform.\n",
@@ -70,7 +70,7 @@
"<a id=\"image-classification-demo\"></a>\n",
"## Image Recognition\n",
"\n",
"The [**image_classification**](image_classification/keras-cnn-dog-or-cat-classification.ipynb) demo demonstrates image recognition: the application builds and trains an ML model that identifies (recognizes) and classifies images.\n",
"The [**image-classification**](image-classification/keras-cnn-dog-or-cat-classification.ipynb) demo demonstrates image recognition: the application builds and trains an ML model that identifies (recognizes) and classifies images.\n",
"\n",
"- The data is collected by downloading images of dogs and cats from the Iguazio sample data-set AWS bucket.\n",
"- The training data for the ML model is prepared by using [pandas](https://pandas.pydata.org/) DataFrames to build a predecition map.\n",
@@ -95,10 +95,10 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"streaming-enrichment-demo\"></a>\n",
"### Streaming Enrichment\n",
"<a id=\"stream-enrich-demo\"></a>\n",
"### Stream Enrichment\n",
"\n",
"The [**streaming-enrichment**](streaming-enrichment/Streaming-enrichment.ipynb) demo demonstrates a typical stream-based data-engineering pipeline, which is required in many real-world scenarios: data is streamed from an event streaming engine; the data is enriched, in real time, using data from a NoSQL table; the enriched data is saved to an output data stream and then consumed from this stream.\n",
"The [**stream-enrich**](stream-enrich/stream-enrich.ipynb) demo demonstrates a typical stream-based data-engineering pipeline, which is required in many real-world scenarios: data is streamed from an event streaming engine; the data is enriched, in real time, using data from a NoSQL table; the enriched data is saved to an output data stream and then consumed from this stream.\n",
"\n",
"- Car-owner data is streamed into the platform from a simulated streaming engine by using an event-triggered [Nuclio](https://nuclio.io/) serverless function.\n",
"- The data is written (ingested) into an input platform stream by using the the platform's [Streaming Web API](https://www.iguazio.com/docs/reference/latest-release/api-reference/web-apis/streaming-web-api/).\n",