From 61c1f0b399c23fea3acc6dbcf135144f179e7a80 Mon Sep 17 00:00:00 2001 From: Francisco Romero Bueno Date: Thu, 30 Mar 2017 11:05:56 +0200 Subject: [PATCH] [1421][cygnus][doc] Fix titles according to Github rendering --- README.md | 12 +-- cygnus-common/README.md | 24 +++--- cygnus-ngsi/README.md | 26 +++--- cygnus-ngsi/conf/README.md | 4 +- cygnus-ngsi/test/e2e/test_environment.md | 4 +- cygnus-twitter/README.md | 18 ++-- doc/architecture.md | 10 +-- doc/contributing/contributing_guidelines.md | 46 +++++----- .../backends_catalogue/README.md | 2 +- .../backends_catalogue/cartodb_backend.md | 4 +- .../backends_catalogue/ckan_backend.md | 8 +- .../backends_catalogue/dynamodb_backend.md | 6 +- .../backends_catalogue/hdfs_backend.md | 6 +- .../backends_catalogue/hive_backend.md | 6 +- .../backends_catalogue/http_backend.md | 10 +-- .../backends_catalogue/introduction.md | 6 +- .../backends_catalogue/kafka_backend.md | 6 +- .../backends_catalogue/mongodb_backend.md | 6 +- .../backends_catalogue/mysql_backend.md | 6 +- .../backends_catalogue/orion_backend.md | 6 +- .../backends_catalogue/postgresql_backend.md | 6 +- .../README.md | 2 +- .../cygnus_agent_conf.md | 8 +- .../diagnosis_procedures.md | 18 ++-- .../flume_env_conf.md | 2 +- .../hw_requirements.md | 4 +- .../install_from_sources.md | 22 ++--- .../install_with_docker.md | 22 ++--- .../install_with_rpm.md | 4 +- .../introduction.md | 6 +- .../issues_and_contact.md | 2 +- .../log4j_conf.md | 2 +- .../logs_and_alarms.md | 8 +- .../management_interface.md | 32 +++---- .../management_interface_v1.md | 84 +++++++++---------- .../running_as_process.md | 2 +- .../running_as_service.md | 2 +- .../sanity_checks.md | 10 +-- .../testing.md | 4 +- .../flume_extensions_catalogue/README.md | 2 +- .../introduction.md | 6 +- .../issues_and_contact.md | 2 +- .../ngsi_cartodb_sink.md | 60 ++++++------- .../ngsi_ckan_sink.md | 54 ++++++------ .../ngsi_dynamodb_sink.md | 50 +++++------ .../ngsi_grouping_interceptor.md | 16 ++-- .../ngsi_hdfs_sink.md | 67 ++++++++------- .../ngsi_kafka_sink.md | 36 ++++---- .../ngsi_mongo_sink.md | 50 +++++------ .../ngsi_mysql_sink.md | 52 ++++++------ .../ngsi_name_mappings_interceptor.md | 16 ++-- .../ngsi_postgresql_sink.md | 52 ++++++------ .../ngsi_rest_handler.md | 20 ++--- .../ngsi_sth_sink.md | 46 +++++----- .../ngsi_test_sink.md | 22 ++--- .../round_robin_channel_selector.md | 2 +- .../README.md | 2 +- .../backends_as_sth.md | 8 +- .../deprecated_and_removed.md | 24 +++--- .../diagnosis_procedures.md | 2 +- .../grouping_rules.md | 8 +- .../install_from_sources.md | 14 ++-- .../install_with_docker.md | 22 ++--- .../install_with_rpm.md | 2 +- .../introduction.md | 6 +- .../ipv6_support.md | 10 +-- .../issues_and_contact.md | 2 +- .../multitenancy.md | 12 +-- .../name_mappings.md | 8 +- .../ngsi_agent_conf.md | 2 +- .../ngsiv2_support.md | 2 +- .../performance_tips.md | 28 +++---- .../running_as_process.md | 2 +- .../running_as_service.md | 2 +- .../sanity_checks.md | 2 +- .../testing.md | 6 +- .../integration/orion_cygnus_kafka.md | 20 ++--- .../integration/orion_cygnus_spark.md | 16 ++-- doc/cygnus-ngsi/quick_start_guide.md | 8 +- .../user_and_programmer_guide/README.md | 4 +- .../adding_new_sink.md | 26 +++--- .../connecting_orion.md | 4 +- .../user_and_programmer_guide/introduction.md | 8 +- .../issues_and_contact.md | 2 +- .../introduction.md | 8 +- .../issues_and_contact.md | 2 +- .../twitter_hdfs_sink.md | 34 ++++---- .../twitter_source.md | 12 +-- .../README.md | 2 +- .../configuration.md | 6 +- 
.../install_from_sources.md | 14 ++-- .../install_with_docker.md | 20 ++-- .../introduction.md | 6 +- .../issues_and_contact.md | 2 +- .../logs_and_alarms.md | 8 +- .../running.md | 4 +- .../testing.md | 4 +- doc/cygnus-twitter/quick_start_guide.md | 4 +- doc/index.md | 12 +-- docker/cygnus-common/README.md | 2 +- docker/cygnus-ngsi/README.md | 2 +- docker/cygnus-twitter/README.md | 2 +- reporting_issues_and_contact.md | 4 +- 103 files changed, 704 insertions(+), 705 deletions(-) diff --git a/README.md b/README.md index 812cce82f..2715df421 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,4 @@ -#Cygnus +# Cygnus [![License badge](https://img.shields.io/badge/license-AGPL-blue.svg)](https://opensource.org/licenses/AGPL-3.0) [![Documentation badge](https://readthedocs.org/projects/fiware-cygnus/badge/?version=latest)](http://fiware-cygnus.readthedocs.org/en/latest/?badge=latest) [![Support badge]( https://img.shields.io/badge/support-sof-yellowgreen.svg)](http://stackoverflow.com/questions/tagged/fiware-cygnus) @@ -8,7 +8,7 @@ [![Docker badge](https://img.shields.io/docker/pulls/fiware/cygnus-ngsi.svg)](https://hub.docker.com/r/fiware/cygnus-ngsi/) [![Docker badge](https://img.shields.io/docker/pulls/fiware/cygnus-twitter.svg)](https://hub.docker.com/r/fiware/cygnus-twitter/) -##Welcome +## Welcome This project is part of [FIWARE](http://fiware.org), being part of the [Cosmos](http://catalogue.fiware.org/enablers/bigdata-analysis-cosmos) Ecosystem. Cygnus is a connector in charge of persisting certain sources of data in certain configured third-party storages, creating a historical view of such data. @@ -34,12 +34,12 @@ Current stable release is able to persist the following sources of data in the f **IMPORTANT NOTE**: for the time being, cygnus-ngsi and cygus-twitter agents cannot be installed in the same base path, because of an incompatibility with the required version of the `httpclient` library. Of course, if you are going to use just one of the agents, there is no problem at all. -##Cyngus place in FIWARE architecture +## Cygnus place in FIWARE architecture Cygnus (more specifically, cygnus-ngsi agent) plays the role of a connector between Orion Context Broker (which is a NGSI source of data) and many FIWARE storages such as CKAN, Cosmos Big Data (Hadoop) and STH Comet. Of course, as previously said, you may add MySQL, Kafka, Carto, etc as other non FIWARE storages to the FIWARE architecture. ![FIWARE architecture](doc/images/fiware_architecture.png) -##Further documentation +## Further documentation The per agent **Quick Start Guide** found at readthedocs.org provides a good documentation summary ([cygnus-ngsi](http://fiware-cygnus.readthedocs.io/en/latest/cygnus-ngsi/quick_start_guide/index.html), [cygnus-twitter](http://fiware-cygnus.readthedocs.io/en/latest/cygnus-twitter/quick_start_guide/index.html)). Nevertheless, both the **Installation and Administration Guide** and the **User and Programmer Guide** for each agent also found at [readthedocs.org](http://fiware-cygnus.readthedocs.io/en/latest/) cover more advanced topics. @@ -53,8 +53,8 @@ Other interesting links are: * [cygnus-ngsi](https://edu.fiware.org/mod/resource/view.php?id=1037) **introductory course** in FIWARE Academy. * The [Contributing Guidelines](doc/contributing/contributing_guidelines.md) if your aim is to extend Cygnus. -##Licensing +## Licensing Cygnus is licensed under Affero General Public License (GPL) version 3. You can find a [copy of this license in the repository](./LICENSE).
-##Reporting issues and contact information +## Reporting issues and contact information Any doubt you may have, please refer to the [Cygnus Core Team](./reporting_issues_and_contact.md). diff --git a/cygnus-common/README.md b/cygnus-common/README.md index 33863bda1..8cb838ac0 100644 --- a/cygnus-common/README.md +++ b/cygnus-common/README.md @@ -1,4 +1,4 @@ -#cygnus-common +# cygnus-common Content: * [Welcome to cygnus-common](#section1) @@ -13,21 +13,21 @@ Content: * [Features summary](#section4) * [Reporting issues and contact information](#section5) -#Welcome to cygnus-common +# Welcome to cygnus-common cygnus-common is the base for any Cygnus agent (e.g. cygnus-ngsi). Cygnus agents are based on [Apache Flume](http://flume.apache.org/) agents, which are basically composed of a source in charge of receiving the data, a channel where the source puts the data once it has been transformed into a Flume event, and a sink, which takes Flume events from the channel in order to persist the data within its body into a third-party storage. cygnus-common provides a set of extensions for Apache Flume, for instance, defining how a Http source handler must look like or adding channels suitable for reading Cygnus-like counters. But not only Flume extensions, but interesting functionality for any agent in terms of a common Management Interface, common backend classes for HDFS, MySQL, MongoDB, PostgreSQL and many others, unified logging classes and error handling, etc. [Top](#top) -##Basic operation -###Hardware requirements +## Basic operation +### Hardware requirements * RAM: 1 GB, specially if abusing of the batching mechanism. * HDD: A few GB may be enough unless the channel types are configured as `FileChannel` type. [Top](#top) -###Installation (CentOS/RedHat) +### Installation (CentOS/RedHat) Simply configure the FIWARE repository if not yet configured: $ cat > /etc/yum.repos.d/fiware.repo <Configuration +### Configuration Configuring cygnus-common is just configuring Apache Flume since no agent-related functionality is added (that's something agents as cygnus-ngsi do). Please, check [this](https://flume.apache.org/FlumeUserGuide.html#setup) official guidelines. [Top](#top) -###Running +### Running cygnus-common can be run as a service by simply typing: $ service cygnus-common start @@ -60,7 +60,7 @@ Logs are written in `/var/log/cygnus/cygnus.log`, and the PID of the process wil [Top](#top) -###Unit testing +### Unit testing Running the tests require [Apache Maven](https://maven.apache.org/) installed and cygnus-common sources downloaded. $ git clone https://github.com/telefonicaid/fiware-cygnus.git @@ -69,7 +69,7 @@ Running the tests require [Apache Maven](https://maven.apache.org/) installed an [Top](#top) -###Management API overview +### Management API overview Run the following `curl` in order to get the version (assuming cygnus-common runs on `localhost`): ``` @@ -84,13 +84,13 @@ Many other operations, like getting/putting/updating/deleting the grouping rules [Top](#top) -##Further reading +## Further reading Further information can be found in the documentation at [fiware-cygnus.readthedocs.io](https://fiware-cygnus.readthedocs.io) [Top](#top) -##Features summary +## Features summary @@ -112,7 +112,7 @@ Further information can be found in the documentation at [fiware-cygnus.readthed [Top](#top) -##Reporting issues and contact information +## Reporting issues and contact information Any doubt you may have, please refer to the [Cygnus Core Team](../reporting_issues_and_contact.md). 
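Complementing the Management API overview above, here is a minimal sketch of the version query, assuming the default administration port `8081` (the response shape is illustrative, not a captured output):

```
# query the cygnus-common management API (default admin port 8081 assumed)
$ curl -X GET "http://localhost:8081/v1/version"
# illustrative response shape:
# {"success":"true","version":"<version string>"}
```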
[Top](#top) diff --git a/cygnus-ngsi/README.md b/cygnus-ngsi/README.md index a4b4383d1..fdb01af96 100644 --- a/cygnus-ngsi/README.md +++ b/cygnus-ngsi/README.md @@ -1,4 +1,4 @@ -#Cygnus NGSI +# Cygnus NGSI Content: * [Welcome to Cygnus NGSI](#section1) @@ -14,7 +14,7 @@ Content: * [Features summary](#section4) * [Reporting issues and contact information](#section5) -##Welcome to Cygnus NGSI +## Welcome to Cygnus NGSI Cygnus NGSI is a connector in charge of persisting [Orion](https://github.com/telefonicaid/fiware-orion) context data in certain configured third-party storages, creating a historical view of such data. In other words, Orion only stores the last value regarding an entity's attribute, and if an older value is required then you will have to persist it in other storage, value by value, using Cygnus NGSI. Cygnus NGSI uses the subscription/notification feature of Orion. A subscription is made in Orion on behalf of Cygnus NGSI, detailing which entities we want to be notified when an update occurs on any of those entities attributes. @@ -37,14 +37,14 @@ You may consider to visit [Cygnus NGSI Quick Start Guide](../doc/cygnus-ngsi/qui [Top](#top) -##Basic operation -###Hardware requirements +## Basic operation +### Hardware requirements * RAM: 1 GB, specially if abusing of the batching mechanism. * HDD: A few GB may be enough unless the channel types are configured as `FileChannel` type. [Top](#top) -###Installation (CentOS/RedHat) +### Installation (CentOS/RedHat) Simply configure the FIWARE repository if not yet configured: $ cat > /etc/yum.repos.d/fiware.repo <Configuration +### Configuration Cygnus NGSI is a tool with a high degree of configuration required for properly running it. The reason is the configuration describes the Flume-based agent chosen to be run. So, the starting point is choosing the internal architecture of the Cygnus NGSI agent. Let's assume the simplest one: @@ -121,7 +121,7 @@ POLLING_INTERVAL=30 [Top](#top) -###Running +### Running Cygnus NGSI can be run as a service by simply typing: $ (sudo) service cygnus start @@ -130,7 +130,7 @@ Logs are written in `/var/log/cygnus/cygnus.log`, and the PID of the process wil [Top](#top) -###Unit testing +### Unit testing Running the tests require [Apache Maven](https://maven.apache.org/) installed and Cygnus NGSI sources downloaded. $ git clone https://github.com/telefonicaid/fiware-cygnus.git @@ -139,7 +139,7 @@ Running the tests require [Apache Maven](https://maven.apache.org/) installed an [Top](#top) -###e2e testing +### e2e testing Cygnus NGSI works by receiving NGSI-like notifications, which are finally persisted. In order to test this, you can run any of the notification scripts located in the [resources folder](./resources/ngsi-examples) of this repo, which emulate certain notification types. 
``` @@ -167,7 +167,7 @@ Or you can connect a real NGSI source such as [Orion Context Broker](https://git [Top](#top) -###Management API overview +### Management API overview Run the following `curl` in order to get the version (assuming Cygnus NGSI runs on `localhost`): ``` @@ -226,7 +226,7 @@ Many other operations, like getting/putting/updating/deleting the grouping rules [Top](#top) -##Advanced topics and further reading +## Advanced topics and further reading Detailed information regarding cygus-ngsi can be found in the [Installation and Administration Guide](../doc/cygnus-ngsi/installation_and_administration_guide/introduction.md), the [User and Programmer Guide](../doc/cygnus-ngsi/user_and_programmer_guide/introduction.md) and the [Flume extensions catalogue](../doc/cygnus-ngsi/flume_extensions_catalogue/introduction.md). The following is just a list of shortcuts regarding the most popular topics: * [Installation with docker](../doc/cygnus-ngsi/installation_and_administration_guide/install_with_docker). An alternative to RPM installation, docker is one of the main options when installing FIWARE components. @@ -241,7 +241,7 @@ Detailed information regarding cygus-ngsi can be found in the [Installation and [Top](#top) -##Features summary +## Features summary
<tr><td>Management Interface</td><td>GET /version</td><td>0.5.0</td></tr>
<tr><td></td><td>GET /stats</td><td>0.13.0</td></tr>
@@ -303,7 +303,7 @@ Detailed information regarding cygus-ngsi can be found in the [Installation and [Top](#top) -##Reporting issues and contact information +## Reporting issues and contact information Any doubt you may have, please refer to the [Cygnus Core Team](../doc/cygnus-ngsi/user_and_programmer_guide/issues_and_contact.md). [Top](#top) diff --git a/cygnus-ngsi/conf/README.md b/cygnus-ngsi/conf/README.md index e78ce58a0..f43c63c76 100644 --- a/cygnus-ngsi/conf/README.md +++ b/cygnus-ngsi/conf/README.md @@ -1,4 +1,4 @@ -#`cygnus-ngsi` configuration notes +# `cygnus-ngsi` configuration notes * The `agent_ngsi.conf.template` file is meant for substituting the `agent.conf.template` file installed by `cygnus-common`: ``` @@ -9,4 +9,4 @@ $ cp agent_ngsi.conf.template [APACHE_FLUME_HOME]/conf/ ``` $ cp grouping_rules.conf.template [APACHE_FLUME_HOME]/conf/ -`` \ No newline at end of file +`` diff --git a/cygnus-ngsi/test/e2e/test_environment.md b/cygnus-ngsi/test/e2e/test_environment.md index 9a15f60bc..fe85369f2 100644 --- a/cygnus-ngsi/test/e2e/test_environment.md +++ b/cygnus-ngsi/test/e2e/test_environment.md @@ -1,10 +1,10 @@ -##Introduction +## Introduction This is a private environment for e2e testing addressed to fiware-cygnus developers. All the information about it is described in a document in Google Docs where we will update all the changes. The document has restricted access (contact us for more information): * [Information about environment](https://docs.google.com/document/d/1XmoIn6TViEfss2PuBfVZ7SUB-57_S6EAsI4DB79yagA/edit?usp=sharing) -#Reporting issues and contact information +# Reporting issues and contact information There are several channels suited for reporting issues and asking for doubts in general. Each one depends on the nature of the question: * Use [stackoverflow.com](http://stackoverflow.com) for specific questions about this software. Typically, these will be related to installation problems, errors and bugs. Development questions when forking the code are welcome as well. Use the `fiware-cygnus` tag. diff --git a/cygnus-twitter/README.md b/cygnus-twitter/README.md index 9a12bd7f6..2a4a6e7ff 100644 --- a/cygnus-twitter/README.md +++ b/cygnus-twitter/README.md @@ -10,7 +10,7 @@ * [Licensing](#section6) * [Reporting issues and contact information](#section7) -##Welcome to Cygnus-twitter +## Welcome to Cygnus-twitter This project is part of [FIWARE](http://fiware.org), being part of the [Cosmos](http://catalogue.fiware.org/enablers/bigdata-analysis-cosmos) Ecosystem. Cygnus-twitter is a connector in charge of persisting tweets (https://dev.twitter.com/overview/api/tweets) in certain configured third-party storages, creating a historical view of such data. @@ -23,14 +23,14 @@ Current stable release is able to persist Twitter data in: [Top](#top) -##Basic operation -###Hardware requirements +## Basic operation +### Hardware requirements * RAM: 1 GB, specially if abusing of the batching mechanism. * HDD: A few GB may be enough unless the channel types are configured as `FileChannel` type. [Top](#top) -###Configuration +### Configuration Cygnus-twitter is a tool with a high degree of configuration required for properly running it. So, the starting point is choosing the internal architecture of the Cygnus agent. 
Let's assume the simplest one: @@ -183,7 +183,7 @@ Check the [User and Programmer Guide](../../doc/cygnus-twitter/user_and_programm [Top](#top) -###Unit testing +### Unit testing Running the tests require [Apache Maven](https://maven.apache.org/) installed and Cygnus sources downloaded. $ git clone https://github.com/telefonicaid/fiware-cygnus.git @@ -192,7 +192,7 @@ Running the tests require [Apache Maven](https://maven.apache.org/) installed an [Top](#top) -###Management API overview +### Management API overview Run the following `curl` in order to get the version (assuming Cygnus runs on `localhost`): ``` @@ -251,7 +251,7 @@ Many other operations, like getting/putting/updating/deleting the grouping rules [Top](#top) -##Features summary +## Features summary
<tr><th>Component</th><th>Feature</th><th>From version</th></tr>
<tr><td>NGSIHDFSSink</td><td>First implementation</td><td>0.1.0</td></tr>
@@ -267,12 +267,12 @@ Many other operations, like getting/putting/updating/deleting the grouping rules [Top](#top) -##Licensing +## Licensing Cygnus is licensed under Affero General Public License (GPL) version 3. You can find a [copy of this license in the repository](../../LICENSE). [Top](#top) -##Reporting issues and contact information +## Reporting issues and contact information There are several channels suited for reporting issues and asking for doubts in general. Each one depends on the nature of the question: * Use [stackoverflow.com](http://stackoverflow.com) for specific questions about this software. Typically, these will be related to installation problems, errors and bugs. Development questions when forking the code are welcome as well. Use the `fiware-cygnus` tag. diff --git a/doc/architecture.md b/doc/architecture.md index 50f6d9a3e..a9e1b75d9 100644 --- a/doc/architecture.md +++ b/doc/architecture.md @@ -1,7 +1,7 @@ -#Cygnus architecture +# Cygnus architecture Cygnus runs Flume agents. Thus, Cygnus agents architecture is Flume agents one. Let's see how this architecture ranges from the most basic configuration to the most complex one. -##Flume architecture +## Flume architecture As stated in [flume.apache.org](http://flume.apache.org/FlumeDeveloperGuide.html): >An Event is a unit of data that flows through a Flume agent. The Event flows from Source to Channel to Sink, and is represented by an implementation of the Event interface. An Event carries a payload (byte array) that is accompanied by an optional set of headers (string attributes). A Flume agent is a process (JVM) that hosts the components that allow Events to flow from an external source to a external destination. @@ -12,7 +12,7 @@ As stated in [flume.apache.org](http://flume.apache.org/FlumeDeveloperGuide.html [Top](#top) -##Basic Cygnus agent architecture +## Basic Cygnus agent architecture The simplest way of using Cygnus is to adopt basic constructs/flows of source - channel - sink as described in the Apache Flume documentation. There can be as many basic constructs/flows as persistence elements, i.e. one for HDFS, another one for MySQL, etc. For each one of this flows, a [`HttpSource`](http://flume.apache.org/FlumeUserGuide.html#http-source) has to be used. The way this native sources process the Orion notifications is by means of a specific REST handler: `NGSIRESTHandler`. Nevetheless, this basic approach requires each source receives its own event notifications. This is not a problem if the architect clearly defines which flows must end in a HDFS storage, or in a Carto storage, if talking about a NGSI agent. But, what happens if the same event must be stored at HDFS and Carto at the same time? In this case, the constructs are modified in order all of them have the same Http source; then, the notified event is replicated for each channel connected to the source. @@ -38,7 +38,7 @@ Finally, the sinks are custom ones, one per each persistence element covered by [Top](#top) -##Advanced Cygnus architectures +## Advanced Cygnus architectures All the advanced archictures arise when trying to improve the performance of Cygnus. As seen above, basic Cygnus configuration is about a source writting Flume events into a single channel where a single sink consumes those events. 
This can be clearly moved to a multiple sink configuration running in parallel; there are several possibilities: ### Multiple sinks, single channel @@ -63,7 +63,7 @@ Due to the available Channel Selectors do not fit our needs, a custom sel [Top](#top) -##High availability Cygnus architecture +## High availability Cygnus architecture High Availability (or HA) is achieved by replicating a whole Cygnus agent, independently of the internal architecture (basic or advance), in an active-passive standard schema. I.e. when the active Cygnus agent fails, a load balancer redirects all the incoming Orion notifications to the passive one. Both Cygnus agents are able to persist the notified context data using the same set of sinks with identical configuration. ![](./images/ha_architecture.jpg) diff --git a/doc/contributing/contributing_guidelines.md b/doc/contributing/contributing_guidelines.md index ddf90196f..c5c0241d3 100644 --- a/doc/contributing/contributing_guidelines.md +++ b/doc/contributing/contributing_guidelines.md @@ -1,4 +1,4 @@ -#Contributing guidelines +# Contributing guidelines Content: * [Introduction](#section1) @@ -24,7 +24,7 @@ Content: * [Section in the documentation](#section6.2) * [Configuration files](#section7) -##Introduction +## Introduction This document is intended to developers aiming at contributing a complete Cygnus agent to the Cygnus suite. In order to accept those contributions a [contribution policy](./ContributionPolicy.txt) document has to be signed beforehand. Within this document developers will find detailed guidelines regarding how to contribute to the main Cygnus repository. @@ -33,7 +33,7 @@ Any doubt you may have, please refer to [here](https://github.com/telefonicaid/f [Top](#top) -##Adopted conventions +## Adopted conventions 1. This document uses the following guidelines with regard to the usage of MUST, SHOULD and MAY (and NOT) keywords: * MUST Guidelines. They are mandatory and your agent must conform to that. * SHOULD Guidelines. They are not mandatory but highly recommended if you want to have a mature development process. @@ -45,13 +45,13 @@ Any doubt you may have, please refer to [here](https://github.com/telefonicaid/f [Top](#top) -##Contributing to the repository -###Language of the main repository +## Contributing to the repository +### Language of the main repository The main repository language MUST be English. [Top](#top) -###Repository organization +### Repository organization Each agent MUST have a dedicated folder. Each folder MUST be prefixed with `cygnus-`. For instance: * `cygnus-ngsi` @@ -89,7 +89,7 @@ As can be seen, despite the repository organization, from a Java perspective all [Top](#top) -###Backlog +### Backlog The issues section of the main repository MUST be used for tracking all the features, hardening, bugs and task to be implemented by every agent. The name of each issue MUST follow the following format: @@ -112,14 +112,14 @@ There MUST NOT be assignee because each issue is considered to be assigned to a [Top](#top) -###Main repository versus forked repositories +### Main repository versus forked repositories Every team in charge of an agent MUST create one or more forks of the main repository. Every team SHOULD synchronize their forked repositories with the main one after opening a pull request (see next section). Only those contributions merged into the main repository MUST be considered as part of the official Cygnus development. 
[Top](#top) -###Pull requests +### Pull requests Any contribution MUST be done through a new opened pull request (PR). These PRs MUST compare certain branch at any forked repository against the `develop` base branch in the main repository. The review process made by the Cygnus Core Team MUST check that the content of the PR is aligned with guidelines. In addition, as any other contribution, a code-wise review MAY be performed by the Cygnus Core Team or any other member of the Community. @@ -130,7 +130,7 @@ Internally to every team, private code reviews SHOULD be done before pull reques [Top](#top) -###Contribution contents +### Contribution contents Every contribution/PR MUST include: * The code implementing the feature/hardening/bug/task. @@ -149,12 +149,12 @@ Where short description MAY enclose other “[...]” sublevels. For inst [Top](#top) -###Coding style +### Coding style The `fiware-cygnus/telefonica_checkstyle.xml` file MUST be configured in any Integrated Development Environment (IDE) used by the different development teams as a coding style checker. This XML file contains all the coding style rules accepted by Telefónica. [Top](#top) -###Commits and squashing +### Commits and squashing Commits within PRs MUST include a comment following this format: [][] @@ -169,7 +169,7 @@ With regards to the [squashing policy](https://help.github.com/articles/about-pu [Top](#top) -###Releasing +### Releasing When generating a new version of Cygnus from the main repository, all the agents MUST be released at the same time as a whole. A minor version (0.X.0, at the moment of writing 0.13.0) of Cygnus MUST be released at the end of each sprint/milestone. A sprint SHOULD comprise a natural month, however sometimes the sprints MAY comprise a different period, for instance a month and a half or half a month (usually, in order to adapt to holydays time). Every sprint MUST be scheduled in advance by Cygnus Core Team in the form of deadline in the related milestone. Agent teams SHOULD use this information in order to, internally, schedule the sprint in terms of issues to be implemented. @@ -190,8 +190,8 @@ Releases MUST be published in the releases section of the main repository As a result of the release, `CHANGES_NEXT_RELEASE` file MUST be emptied in Github repo. -##Deployers -###RPMs +## Deployers +### RPMs There MUST exist a `rpm/` folder at the root of the main repository. A packaging script MUST generate a RPM based on the spec file of each Cygnus agent, including `cygnus-common`. Such a spec file MUST live at the `spec` subfolder within the agent folder. Upon releasing, these RPMs MUST be created and uploaded to some repository in order they are available. As an example, `cygnus-ngsi` agent's RPM is uploaded to `http://repositories.testbed.fiware.org`. @@ -202,7 +202,7 @@ All RPMs spec files (spec for `cygnus-common` and any other agent) MUST contain [Top](#top) -###Dockers +### Dockers There MUST exist a `docker/` folder at the root of the main repository. Every Cygnus agent MUST include a docker subfolder as per the following rules: * `docker/cygnus-common` @@ -216,8 +216,8 @@ Upon releasing, images for the agents MUST be uploaded to `https://hub.docker.co [Top](#top) -##Documentation -###Repository documentation +## Documentation +### Repository documentation There MUST exist a `doc/` folder at the root of the main repository. 
Every Cygnus agent MUST include a documentation subfolder as per the following rules: * `doc/cygnus-common` @@ -237,7 +237,7 @@ The following elements SHOULD be present as well: [Top](#top) -###`readthedocs.org` documentation +### `readthedocs.org` documentation The documentation within the `doc/` folder MUST be published to `readthedocs.org`. In order to achieve this, a `mkdocs.yml` file MUST live in the root of the main repository acting as a hook. The format of this `mkdocs.yml` file MUST follow this example: @@ -279,8 +279,8 @@ pages: [Top](#top) -##Logs and alarms -###log4j +## Logs and alarms +### log4j log4j is the logging system used by Apache Flume, thus Cygnus agents MUST use log4j. Logs traced by any Cygnus agent MUST contain the following log4 layout: @@ -301,7 +301,7 @@ Field by field: [Top](#top) -###Repository documentation +### Repository documentation The installation and administration guide of any agent (`doc//installation_and_administration_guide/`) MUST contain a section about logs and alarms. Such a section MUST describe the main log message types the agent uses. It is a set of easily identifiable strings or tags in the traces text, and each log traced by the agent MUST be of any of the types among the set. For instance, `cygnus-ngsi` considers the following ones: @@ -325,7 +325,7 @@ In addition, a table MUST be included in charge of defining a set of alarm condi [Top](#top) -##Configuration +## Configuration When adding a new agent to Cygnus, it MUST include an agent configuration template in [Flume format](https://flume.apache.org/FlumeUserGuide.html#setup). Other configuration files MAY be added as well. The specific agent configuration template MUST replace the one handled by `cygnus-common` in the Flume deployment donde by `cygnus-common` RPM. diff --git a/doc/cygnus-common/backends_catalogue/README.md b/doc/cygnus-common/backends_catalogue/README.md index 1c9f2c3d6..00de97c8b 100644 --- a/doc/cygnus-common/backends_catalogue/README.md +++ b/doc/cygnus-common/backends_catalogue/README.md @@ -1,4 +1,4 @@ -#Backends catalogue +# Backends catalogue * [Introduction](./ckan_backend.md) * [DynamoDB backend](./dynamodb_backend.md) diff --git a/doc/cygnus-common/backends_catalogue/cartodb_backend.md b/doc/cygnus-common/backends_catalogue/cartodb_backend.md index 5f4cd3de5..924e3365d 100644 --- a/doc/cygnus-common/backends_catalogue/cartodb_backend.md +++ b/doc/cygnus-common/backends_catalogue/cartodb_backend.md @@ -1,5 +1,5 @@ -#CartoDB backend -##`CartoDBBackend` interface +# CartoDB backend +## `CartoDBBackend` interface This class enumerates the methods any [Carto](https://carto.com/) backend implementation must expose. In this case, the following ones: void insert(String tableName, String fields, String rows) throws Exception; diff --git a/doc/cygnus-common/backends_catalogue/ckan_backend.md b/doc/cygnus-common/backends_catalogue/ckan_backend.md index 654e0c0da..e43c51c9c 100644 --- a/doc/cygnus-common/backends_catalogue/ckan_backend.md +++ b/doc/cygnus-common/backends_catalogue/ckan_backend.md @@ -1,19 +1,19 @@ -#CKAN backend -##`CKANBackend` interface +# CKAN backend +## `CKANBackend` interface This class enumerates the methods any [CKAN](http://ckan.org/) backend implementation must expose. 
In this case, the following ones: void persist(String orgName, String pkgName, String resName, String records, boolean createEnabled) throws Exception; > Persists the aggregated context data regarding a single entity's attribute (row mode) or a full list of attributes (column mode) within the datastore associated to the given resource. This resource belongs to the given package/dataset, which in the end belongs to the given organization as well. This method creates the parts of the hierarchy (organization, package/dataset, resource and datastore) if any of them is missing. -##`CKANBackendImpl` class +## `CKANBackendImpl` class This is a convenience backend class for CKAN that extends the `HttpBackend` abstract class (which provides common logic for any Http connection-based backend) and implements the `CKANBackend` interface described above. `CKANBackendImpl` really wraps the [CKAN API](http://docs.ckan.org/en/latest/api/). It must be said this backend implementation enforces UTF-8 encoding through the usage of a `Content-Type` http header with a value of `application/json; charset=utf-8`. -##`CKANCache` class +## `CKANCache` class This class is used to improve the performance of `NGSICKANSink` by caching information about the already created organizations, packages/datasets and resources (and datastores). `CKANCache` implements the `HttpBackend` interface since its methods are able to interact directly with CKAN API when some element of the hierarchy is not cached. In detail, this is the workflow when `NGSICKANSink` is combined with `CKANCache`: diff --git a/doc/cygnus-common/backends_catalogue/dynamodb_backend.md b/doc/cygnus-common/backends_catalogue/dynamodb_backend.md index fc1e16764..fef593040 100644 --- a/doc/cygnus-common/backends_catalogue/dynamodb_backend.md +++ b/doc/cygnus-common/backends_catalogue/dynamodb_backend.md @@ -1,5 +1,5 @@ -#DynamoDB backend -##`DynamoDBBackend` interface +# DynamoDB backend +## `DynamoDBBackend` interface This class enumerates the methods any [DynamoDB](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Introduction.html) backend implementation must expose. In this case, the following ones: void createTable(String tableName, String primaryKey) throws Exception; @@ -10,7 +10,7 @@ This class enumerates the methods any [DynamoDB](http://docs.aws.amazon.com/amaz > Puts, in the given table, as many items as contained within the given aggregation. -##`DynamoDBBackendImpl` class +## `DynamoDBBackendImpl` class This is a convenience backend class for DynamoDB that implements the `DynamoDBBackend` interface described above. `DynamoDBBackendImpl` really wraps the [DynamoDB API](http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/Welcome.html). diff --git a/doc/cygnus-common/backends_catalogue/hdfs_backend.md b/doc/cygnus-common/backends_catalogue/hdfs_backend.md index 35eea67c2..6af13898c 100644 --- a/doc/cygnus-common/backends_catalogue/hdfs_backend.md +++ b/doc/cygnus-common/backends_catalogue/hdfs_backend.md @@ -1,5 +1,5 @@ -#HDFS backend -##`HDFSBackend` interface +# HDFS backend +## `HDFSBackend` interface This class enumerates the methods any [HDFS](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html) backend implementation must expose. In this case, the following ones: void createDir(String dirPath) throws Exception; @@ -18,7 +18,7 @@ This class enumerates the methods any [HDFS](https://hadoop.apache.org/docs/curr > Checks if a HDFS file, given its path, exists ot not. 
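Since the `HDFSBackendImpl` class described next wraps the WebHDFS API, the interface methods above map roughly onto raw WebHDFS calls like the following sketch (hypothetical namenode host, port and paths throughout — not the backend's literal requests):

```
# createDir ~ WebHDFS MKDIRS (hypothetical host/port/paths)
$ curl -X PUT "http://namenode:50070/webhdfs/v1/user/cygnus/some/dir?op=MKDIRS&user.name=cygnus"
# createFile ~ WebHDFS CREATE (follow the redirect to the datanode)
$ curl -L -X PUT -T ./local.txt "http://namenode:50070/webhdfs/v1/user/cygnus/some/dir/data.txt?op=CREATE&user.name=cygnus"
# append ~ WebHDFS APPEND
$ curl -L -X POST -T ./local.txt "http://namenode:50070/webhdfs/v1/user/cygnus/some/dir/data.txt?op=APPEND&user.name=cygnus"
# exists ~ WebHDFS GETFILESTATUS
$ curl -X GET "http://namenode:50070/webhdfs/v1/user/cygnus/some/dir/data.txt?op=GETFILESTATUS&user.name=cygnus"
```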
-##`HDFSBackendImpl` class +## `HDFSBackendImpl` class This is a convenience backend class for HDFS that extends the `HttpBackend` abstract class (provides common logic for any Http connection-based backend) and implements the `HDFSBackend` interface described above. `HDFSBackendImpl` really wraps the [WebHDFS API](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html). diff --git a/doc/cygnus-common/backends_catalogue/hive_backend.md b/doc/cygnus-common/backends_catalogue/hive_backend.md index 3bfd9ca58..29f6a66d8 100644 --- a/doc/cygnus-common/backends_catalogue/hive_backend.md +++ b/doc/cygnus-common/backends_catalogue/hive_backend.md @@ -1,5 +1,5 @@ -#Hive backend -##`HiveBackend` interface +# Hive backend +## `HiveBackend` interface This class enumerates the methods any [Hive](https://hive.apache.org/) backend implementation must expose. In this case, the following ones: boolean doCreateDatabase(String dbName); @@ -14,7 +14,7 @@ This class enumerates the methods any [Hive](https://hive.apache.org/) backend i > Executes a given query. -##`HiveBackendImpl` class +## `HiveBackendImpl` class This is a convenience backend class for Hive that implements the `HiveBackend` interface described above. `HiveBackendImpl` really wraps the Hive JDBC driver ([HiveServer1](https://cwiki.apache.org/confluence/display/Hive/HiveClient#HiveClient-JDBC) version and [HiveServer2](https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-JDBC) version). diff --git a/doc/cygnus-common/backends_catalogue/http_backend.md b/doc/cygnus-common/backends_catalogue/http_backend.md index 14fe84ab8..8cdc56413 100644 --- a/doc/cygnus-common/backends_catalogue/http_backend.md +++ b/doc/cygnus-common/backends_catalogue/http_backend.md @@ -1,12 +1,12 @@ -#Http backend -##`HttpBackend` class +# Http backend +## `HttpBackend` class Coming soon. -##`HttpClientFactory` class +## `HttpClientFactory` class Coming soon. -##`JsonResponse` class +## `JsonResponse` class Coming soon. -##`KerberosCallbackHandler` class +## `KerberosCallbackHandler` class Coming soon. diff --git a/doc/cygnus-common/backends_catalogue/introduction.md b/doc/cygnus-common/backends_catalogue/introduction.md index 325f68f37..edca8ad9f 100644 --- a/doc/cygnus-common/backends_catalogue/introduction.md +++ b/doc/cygnus-common/backends_catalogue/introduction.md @@ -1,14 +1,14 @@ -#Backends catalogue +# Backends catalogue This document details the catalogue of storage backends developed for Cygnus. -#Intended audience +# Intended audience The backends catalogue is a basic piece of documentation for all those software developers interested in creating new sinks for any of the already existent Cygnus agents (even, for new ones); the available catalogue is designed to abstract the most common operations a sink may require regarding a final storage, thus it is ready independentely of the data type handled by the Cygnus agent. FIWARE users may also be interested if wanting to go deeper on the details of the storage process. [Top](#top) -#Structure of the document +# Structure of the document The document simply details each one of the backends within the catalogue, in terms of the interface class that any implementation must follow and the avialable implementations. 
[Top](#top) diff --git a/doc/cygnus-common/backends_catalogue/kafka_backend.md b/doc/cygnus-common/backends_catalogue/kafka_backend.md index 8226c1059..64a0b5a8d 100644 --- a/doc/cygnus-common/backends_catalogue/kafka_backend.md +++ b/doc/cygnus-common/backends_catalogue/kafka_backend.md @@ -1,5 +1,5 @@ -#Kafka backend -##`KafkaBackend` interface +# Kafka backend +## `KafkaBackend` interface This class enumerates the methods any [Kafka](http://kafka.apache.org/) backend implementation must expose. In this case, the following ones: boolean topicExists(String topic) throws Exception; @@ -14,7 +14,7 @@ This class enumerates the methods any [Kafka](http://kafka.apache.org/) backend > Sends a record to Kafka. A record is composed by a topic name and the data to be send. -##`KafkaBackendImpl` class +## `KafkaBackendImpl` class This is a convenience backend class for CKAN that implements the `KafkaBackend` interface described above. `KafkaBackendImpl` really wraps the [`KafkaProducer`](http://kafka.apache.org/082/javadoc/org/apache/kafka/clients/producer/KafkaProducer.html) and `AdminUtils` Java classes. diff --git a/doc/cygnus-common/backends_catalogue/mongodb_backend.md b/doc/cygnus-common/backends_catalogue/mongodb_backend.md index d37264e49..5569866ea 100644 --- a/doc/cygnus-common/backends_catalogue/mongodb_backend.md +++ b/doc/cygnus-common/backends_catalogue/mongodb_backend.md @@ -1,5 +1,5 @@ -#MongoDB backend -##`MongoBackend` interface +# MongoDB backend +## `MongoBackend` interface This class enumerates the methods any [MongoDB](https://www.mongodb.com/) backend implementation must expose. In this case, the following ones: void createDatabase(String dbName) throws Exception; @@ -22,7 +22,7 @@ This class enumerates the methods any [MongoDB](https://www.mongodb.com/) backen > Stores the hash associated to a collection build based on the givn parameters. -##`MongoBackendImpl` class +## `MongoBackendImpl` class This is a convenience backend class for MongoDB that implements the `MongoBackend` interface described above. `MongoBackendImpl` really wraps the [MongoDB driver for Java](https://docs.mongodb.com/ecosystem/drivers/java/). diff --git a/doc/cygnus-common/backends_catalogue/mysql_backend.md b/doc/cygnus-common/backends_catalogue/mysql_backend.md index 2a7a9b9ba..efc166cb3 100644 --- a/doc/cygnus-common/backends_catalogue/mysql_backend.md +++ b/doc/cygnus-common/backends_catalogue/mysql_backend.md @@ -1,5 +1,5 @@ -#MySQL backend -##`MySQLBackend` interface +# MySQL backend +## `MySQLBackend` interface This class enumerates the methods any [MySQL](https://www.mysql.com/) backend implementation must expose. In this case, the following ones: void createDatabase(String dbName) throws Exception; @@ -14,7 +14,7 @@ This class enumerates the methods any [MySQL](https://www.mysql.com/) backend im > Persists the accumulated context data (in the form of the given field values) regarding an entity within the given table. This table belongs to the given database. The field names are given as well to ensure the right insert of the field values. -##`MySQLBackendImpl` class +## `MySQLBackendImpl` class This is a convenience backend class for MySQL that implements the `MySQLBackend` interface described above. `MySQLBackendImpl` really wraps the [MySQL JDBC driver](https://dev.mysql.com/downloads/connector/j/). 
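A quick way to inspect what a MySQL-based sink has persisted through this backend — a sketch assuming hypothetical database and table names:

```
# list tables created by the backend and peek at one (names are hypothetical)
$ mysql -u cygnus -p -e "SHOW TABLES FROM some_database;"
$ mysql -u cygnus -p -e "SELECT * FROM some_database.some_table LIMIT 10;"
```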
diff --git a/doc/cygnus-common/backends_catalogue/orion_backend.md b/doc/cygnus-common/backends_catalogue/orion_backend.md index 6120375a8..5aa493c60 100644 --- a/doc/cygnus-common/backends_catalogue/orion_backend.md +++ b/doc/cygnus-common/backends_catalogue/orion_backend.md @@ -1,5 +1,5 @@ -#Orion backend -##`OrionBackend` interface +# Orion backend +## `OrionBackend` interface This class enumerates the methods any [Orion](https://github.com/telefonicaid/fiware-orion) backend implementation must expose. In this case, the following ones: JsonResponse subscribeContextV1(String cygnusSubscription, String token) throws Exception; @@ -22,7 +22,7 @@ This class enumerates the methods any [Orion](https://github.com/telefonicaid/fi > Gets a subscription to Orion given its ID and a token for authentication purposes (NGSIv2). -##`OrionBackendImpl` class +## `OrionBackendImpl` class This is a convenience backend class for Orion that implements the `OrionBackend` interface described above. `OrionBackendImpl` really wraps the Orion API ([NGSIv1](http://telefonicaid.github.io/fiware-orion/api/v1/) and [NGSIv2](http://telefonicaid.github.io/fiware-orion/api/v2/latest/)). diff --git a/doc/cygnus-common/backends_catalogue/postgresql_backend.md b/doc/cygnus-common/backends_catalogue/postgresql_backend.md index c7f5ae478..0ef6910d5 100644 --- a/doc/cygnus-common/backends_catalogue/postgresql_backend.md +++ b/doc/cygnus-common/backends_catalogue/postgresql_backend.md @@ -1,5 +1,5 @@ -#PostgreSQL backend -##`PostgreSQLBackend` interface +# PostgreSQL backend +## `PostgreSQLBackend` interface This class enumerates the methods any [PostgreSQL](http://www.postgresql.org/) backend implementation must expose. In this case, the following ones: void createSchema(String schemaName) throws Exception; @@ -14,7 +14,7 @@ This class enumerates the methods any [PostgreSQL](http://www.postgresql.org/) b > Persists the accumulated context data (in the form of the given field values) regarding an entity within the given table. This table belongs to the given database. The field names are given as well to ensure the right insert of the field values. -##`PostgreSQLBackendImpl` class +## `PostgreSQLBackendImpl` class This is a convenience backend class for PostgreSQL that implements the `PostgreSQLBackend` interface described above. `PostgreSQLBackendImpl` really wraps the [PostgreSQL JDBC driver](https://jdbc.postgresql.org/). 
diff --git a/doc/cygnus-common/installation_and_administration_guide/README.md b/doc/cygnus-common/installation_and_administration_guide/README.md index 27afd9dfb..4115a33f9 100644 --- a/doc/cygnus-common/installation_and_administration_guide/README.md +++ b/doc/cygnus-common/installation_and_administration_guide/README.md @@ -1,4 +1,4 @@ -#Installation and Administration Guide +# Installation and Administration Guide * [Introduction](./introduction.md) * Installation: diff --git a/doc/cygnus-common/installation_and_administration_guide/cygnus_agent_conf.md b/doc/cygnus-common/installation_and_administration_guide/cygnus_agent_conf.md index 476b6ff4b..543a65407 100644 --- a/doc/cygnus-common/installation_and_administration_guide/cygnus_agent_conf.md +++ b/doc/cygnus-common/installation_and_administration_guide/cygnus_agent_conf.md @@ -1,11 +1,11 @@ -#Cygnus agent configuration +# Cygnus agent configuration Content: * [Introduction](#section1) * [`cygnus_instance_<id>.conf`](#section2) * [`agent_<id>.conf`](#section3) -##Introduction +## Introduction Any Cygnus agent is configured through two different files: * A `cygnus_instance_<id>.conf` file addressing all those non Flume parameters, such as the Flume agent name, the specific log file for this instance, the administration port, etc. This configuration file is not necessary if Cygnus is run as a standalone application (see later), but it is mandatory if run as a service (see later). @@ -24,7 +24,7 @@ In addition, (a unique) `log4j.properties` controls how Cygnus logs its traces. [Top](#top) -##`cygnus_instance_<id>.conf` +## `cygnus_instance_<id>.conf` The file `cygnus_instance_<id>.conf` can be instantiated from a template given in the Cygnus repository, `conf/cygnus_instance.conf.template`. ``` @@ -48,7 +48,7 @@ As you can see, this file allows configuring the log file. For a detailed loggin [Top](#top) -##`agent_<id>.conf` +## `agent_<id>.conf` The file `agent_<id>.conf` can be instantiated from a template given in the Cygnus repository, `conf/agent.conf.template`. While no specific Cygnus agent is used, this template is just the Apache Flume template. diff --git a/doc/cygnus-common/installation_and_administration_guide/diagnosis_procedures.md b/doc/cygnus-common/installation_and_administration_guide/diagnosis_procedures.md index e6a77dbc3..7b10a7d48 100644 --- a/doc/cygnus-common/installation_and_administration_guide/diagnosis_procedures.md +++ b/doc/cygnus-common/installation_and_administration_guide/diagnosis_procedures.md @@ -1,4 +1,4 @@ -#Diagnosis procedures +# Diagnosis procedures Content: * [Problem: Logs are not traced](#section1) @@ -10,8 +10,8 @@ Content: * [Problem: The GUI does not work](#section4) * [Other problems](#section5) -##Problem: Logs are not traced -###Reason: There may be a problem with the logging folder +## Problem: Logs are not traced +### Reason: There may be a problem with the logging folder First, check the folder `/var/log/cygnus` has been created: ``` @@ -32,7 +32,7 @@ Third, check the permissions of the log folder. If the permissions does not cont [Top](#top) -###Reason: There may be a problem with the logging configuration of Cygnus +### Reason: There may be a problem with the logging configuration of Cygnus Check the log4j configuration is using a file-related appender. First of all, check you have a valid `lo4j.properties` file (not a template) in `/usr/cygnus/conf/`. @@ -50,13 +50,13 @@ Check the apender value is `LOG_FILE`. 
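A one-line sketch of that appender check, using the configuration path mentioned above:

```
# the LOG_FILE appender should show up in the logging configuration
$ grep LOG_FILE /usr/cygnus/conf/log4j.properties
```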
[Top](#top) -##Problem: The API does not work -###Reason: There may be a problem with the configured port +## Problem: The API does not work +### Reason: There may be a problem with the configured port Check the port you are using in the request is the one configued in Cygnus. By default, it is `8081`, but can be modified by Cygnus administrator. [Top](#top) -###Reason: The configured port is not open in the firewall +### Reason: The configured port is not open in the firewall The API port may be properly configured but not opened in the firewall (if such a firewall is running) protecting your machine. The specific solution depends on the specific firewall. Here, `iptables`-based firewalling is shown. Please, check the port is open (default `8081` is used in the examples): @@ -80,12 +80,12 @@ If not, open it: [Top](#top) -##Problem: The GUI does not work +## Problem: The GUI does not work Coming soon. [Top](#top) -##Other problems +## Other problems Please look for `fiware-cygnus` tag in [stackoverflow.com](http://stackoverflow.com/search?q=fiware+cygnus). [Top](#top) diff --git a/doc/cygnus-common/installation_and_administration_guide/flume_env_conf.md b/doc/cygnus-common/installation_and_administration_guide/flume_env_conf.md index 2a2113573..331bb7048 100644 --- a/doc/cygnus-common/installation_and_administration_guide/flume_env_conf.md +++ b/doc/cygnus-common/installation_and_administration_guide/flume_env_conf.md @@ -1,4 +1,4 @@ -#Flume environment configuration +# Flume environment configuration The file `flume-env.sh` can be instantiated from a template given in the Cygnus repository, `conf/flume-env.sh.template`. ``` diff --git a/doc/cygnus-common/installation_and_administration_guide/hw_requirements.md b/doc/cygnus-common/installation_and_administration_guide/hw_requirements.md index 37c9956d3..8feae83fe 100644 --- a/doc/cygnus-common/installation_and_administration_guide/hw_requirements.md +++ b/doc/cygnus-common/installation_and_administration_guide/hw_requirements.md @@ -1,5 +1,5 @@ -#Hardware requirements +# Hardware requirements The following guidelines are orientative. Since any Cygnus agent based on cygnus-common may have specific requirements, it is recommended to check such specific requirements. * RAM: 1 GB, specially if abusing of the batching mechanism. -* HDD: A few GB may be enough unless the channel types are configured as `FileChannel` type. \ No newline at end of file +* HDD: A few GB may be enough unless the channel types are configured as `FileChannel` type. diff --git a/doc/cygnus-common/installation_and_administration_guide/install_from_sources.md b/doc/cygnus-common/installation_and_administration_guide/install_from_sources.md index ea8c6ea31..1bc70eece 100644 --- a/doc/cygnus-common/installation_and_administration_guide/install_from_sources.md +++ b/doc/cygnus-common/installation_and_administration_guide/install_from_sources.md @@ -1,4 +1,4 @@ -#Installing cygnus-common from sources +# Installing cygnus-common from sources Content: * [Prerequisites](#section1) @@ -12,7 +12,7 @@ Content: * [Known issues](#section5.4) * [Installing dependencies](#section6) -##Prerequisites +## Prerequisites Maven (and thus Java SDK, since Maven is a Java tool) is needed in order to install cygnus-common. 
In order to install Java SDK (not JRE), just type (CentOS machines): @@ -33,7 +33,7 @@ Maven is installed by downloading it from [maven.apache.org](http://maven.apache [Top](#top) -##`cygnus` user creation +## `cygnus` user creation It is highly recommended to create a `cygnus` Unix user, under which Cygnus will be installed and run. By the way, this is how the [RPM](./install_with_rpm.md) proceeds. Creating such a user is quite simple. As a sudoer user (root or any other allowed), type the following: @@ -50,7 +50,7 @@ Once created, change to this new fresh user in order to proceed with the rest of [Top](#top) -##`log4j` path +## `log4j` path Once the user is created is necessary to create the path `/var/log/cygnus` for `log4j` purposes. Start by creating the path and then give permissions for `cygnus` user: $ mkdir -p /var/log/cygnus @@ -60,7 +60,7 @@ This step is important because if you don't have the log path created Cygnus wil [Top](#top) -##Installing Apache Flume +## Installing Apache Flume Apache Flume can be easily installed by downloading its latests version from [flume.apache.org](http://flume.apache.org/download.html). Move the untared directory to a folder of your choice (represented by `APACHE_FLUME_HOME`): $ wget http://www.eu.apache.org/dist/flume/1.4.0/apache-flume-1.4.0-bin.tar.gz @@ -80,8 +80,8 @@ Some remarks: [Top](#top) -##Installing cygnus-common -###Cloning `fiware-cygnus` +## Installing cygnus-common +### Cloning `fiware-cygnus` Start by cloning the Github repository containing cygnus-common: $ git clone https://github.com/telefonicaid/fiware-cygnus.git @@ -92,7 +92,7 @@ Start by cloning the Github repository containing cygnus-common: [Top](#top) -###Installing `cygnus-common` +### Installing `cygnus-common` The developed classes must be packaged in a Java jar file. This can be done as a fat Java jar containing all the third-party dependencies (**recommended**). You may need to edit the `pom.xml` (\*): $ cd cygnus-common @@ -117,7 +117,7 @@ Finally, please find a `compile.sh` script containing all the commands shown in [Top](#top) -###Installing `cygnus-flume-ng` script +### Installing `cygnus-flume-ng` script The installation is completed by copying the `cygnus-flume-ng` script into `APACHE_FLUME_HOME/bin`: $ cp target/classes/cygnus-flume-ng APACHE_FLUME_HOME/bin @@ -125,14 +125,14 @@ The installation is completed by copying the `cygnus-flume-ng` script into `APAC [Top](#top) -###Known issues +### Known issues It may happen while compiling `cygnus-common` the Maven JVM has not enough memory. 
This can be changed as detailed at the [Maven official documentation](https://cwiki.apache.org/confluence/display/MAVEN/OutOfMemoryError): $ export MAVEN_OPTS="-Xmx512m -XX:MaxPermSize=128m" [Top](#top) -##Installing dependencies +## Installing dependencies These are the packages you will need to install under `APACHE_FLUME_HOME/plugins.d/cygnus/libext/` **if you did not included them in the Cygnus jar**: | Cygnus dependencies | Version | Required by / comments | diff --git a/doc/cygnus-common/installation_and_administration_guide/install_with_docker.md b/doc/cygnus-common/installation_and_administration_guide/install_with_docker.md index 05e7c901d..14a81508f 100644 --- a/doc/cygnus-common/installation_and_administration_guide/install_with_docker.md +++ b/doc/cygnus-common/installation_and_administration_guide/install_with_docker.md @@ -1,4 +1,4 @@ -#cygnus-common docker +# cygnus-common docker Content: * [Before starting](#section1) @@ -12,13 +12,13 @@ Content: * [Environment variables](#section3.2.2) * [Using volumes](#section3.2.3) -##Before starting +## Before starting Obviously, you will need docker installed and running in you machine. Please, check [this](https://docs.docker.com/linux/started/) official start guide. [Top](#top) -##Getting an image -###Building from sources +## Getting an image +### Building from sources Start by cloning the `fiware-cygnus` repository: $ git clone https://github.com/telefonicaid/fiware-cygnus.git @@ -42,7 +42,7 @@ centos 6 61bf77ab8841 6 weeks ago [Top](#top) -###Using docker hub image +### Using docker hub image Instead of building an image from the scratch, you may download it from [hub.docker.com](https://hub.docker.com/r/fiware/cygnus-common/): $ docker pull fiware/cygnus-common @@ -58,8 +58,8 @@ centos 6 61bf77ab8841 6 weeks ago [Top](#top) -##Using the image -###As it is +## Using the image +### As it is The cygnus-common image (either built from the scratch, either downloaded from hub.docker.com) allows running a Cygnus agent in charge of logging messages at INFO level. This is because the default agent configuration runs a [logger-sink](https://flume.apache.org/FlumeUserGuide.html#logger-sink). Start a container for this image by typing in a terminal: @@ -128,7 +128,7 @@ CONTAINER ID IMAGE COMMAND CREATED [Top](#top) -###Using a specific configuration +### Using a specific configuration As seen above, the default configuration distributed with the image is tied to certain values that may not be suitable for you tests. Specifically: * The logging level is `INFO`. @@ -138,21 +138,21 @@ You may need to alter the above values with values of your own. [Top](#top) -####Editing the docker files +#### Editing the docker files The easiest way is by editing both the `Dockerfile` and/or `agent.conf` file under `docker/cygnus-common` and building the cygnus-common image from the scratch. This gives you total control on the docker image. [Top](#top) -####Environment variables +#### Environment variables Those parameters associated to an environment variable can be easily overwritten in the command line using the `-e` option. 
For instance, if you want to change the log4j logging level, simply run:

    $ docker run -e LOG_LEVEL='DEBUG' cygnus-common

[Top](#top)

-####Using volumes
+#### Using volumes
Another possibility is to start a container with a volume (`-v` option) and map the entire configuration file within the container with a local version of the file:

    $ docker run -v /absolute/path/to/local/agent.conf:/opt/apache-flume/conf/agent.conf cygnus-common-1
diff --git a/doc/cygnus-common/installation_and_administration_guide/install_with_rpm.md b/doc/cygnus-common/installation_and_administration_guide/install_with_rpm.md
index bdec1708a..c0878648a 100644
--- a/doc/cygnus-common/installation_and_administration_guide/install_with_rpm.md
+++ b/doc/cygnus-common/installation_and_administration_guide/install_with_rpm.md
@@ -1,4 +1,4 @@
-#Installing cygnus-common with RPM (CentOS/RedHat)
+# Installing cygnus-common with RPM (CentOS/RedHat)
Simply configure the FIWARE repository if not yet configured:

    $ cat > /etc/yum.repos.d/fiware.repo <Introduction
+# Introduction
This document details how to install and administrate **cygnus-common**.

cygnus-common is the base for any Cygnus agent. Cygnus agents are based on [Apache Flume](http://flume.apache.org/) agents, which are basically composed of a source in charge of receiving the data, a channel where the source puts the data once it has been transformed into a Flume event, and a sink, which takes Flume events from the channel in order to persist the data within its body into a third-party storage.
@@ -7,14 +7,14 @@ cygnus-common provides a set of extensions for Apache Flume, for instance, defin

[Top](#top)

-##Intended audience
+## Intended audience
This document is mainly addressed to those FIWARE users willing to create historical views about any source of data handled by any of the available Cygnus agents. In that case, you will need this document in order to learn how to install and administrate cygnus-common.

If your aim is to contribute to cygnus-common, please refer to the [Contribution guidelines](../../contributing/contributing_guidelines.md).

[Top](#top)

-##Structure of the document
+## Structure of the document
Apart from this introduction, this Installation and Administration Guide mainly contains sections about installing, configuring, running and testing cygnus-common. The FIWARE user will also find useful information regarding logs and alarms, how to manage a Cygnus agent through the RESTful interface and important performance tips. In addition, sanity check procedures (useful to know whether the installation was successful or not) and diagnosis procedures (a set of tips aiming to help when an issue arises) are provided as well.

[Top](#top)
diff --git a/doc/cygnus-common/installation_and_administration_guide/issues_and_contact.md b/doc/cygnus-common/installation_and_administration_guide/issues_and_contact.md
index a248c84e3..7f28876db 100644
--- a/doc/cygnus-common/installation_and_administration_guide/issues_and_contact.md
+++ b/doc/cygnus-common/installation_and_administration_guide/issues_and_contact.md
@@ -1,4 +1,4 @@
-#Reporting issues and contact information
+# Reporting issues and contact information
There are several channels suited for reporting issues and asking for doubts in general. Each one depends on the nature of the question:

* Use [stackoverflow.com](http://stackoverflow.com) for specific questions about this software. Typically, these will be related to installation problems, errors and bugs.
Development questions when forking the code are welcome as well. Use the `fiware-cygnus` tag.
diff --git a/doc/cygnus-common/installation_and_administration_guide/log4j_conf.md b/doc/cygnus-common/installation_and_administration_guide/log4j_conf.md
index e14c322d0..858cadc0a 100644
--- a/doc/cygnus-common/installation_and_administration_guide/log4j_conf.md
+++ b/doc/cygnus-common/installation_and_administration_guide/log4j_conf.md
@@ -1,4 +1,4 @@
-#log4j configuration
+# log4j configuration
The file `log4j.properties` can be instantiated from a template given in the cygnus-common repository, `conf/log4j.properties.template`. Its content should not be edited unless some of the default values for log path, file name, logging level or appender are wanted to be changed.
diff --git a/doc/cygnus-common/installation_and_administration_guide/logs_and_alarms.md b/doc/cygnus-common/installation_and_administration_guide/logs_and_alarms.md
index b1df7e0ac..7eb9ba271 100644
--- a/doc/cygnus-common/installation_and_administration_guide/logs_and_alarms.md
+++ b/doc/cygnus-common/installation_and_administration_guide/logs_and_alarms.md
@@ -1,11 +1,11 @@
-#Logs and alarms
+# Logs and alarms
Content:

* [Introduction](#section1)
* [Log message types](#section2)
* [Alarm conditions](#section3)

-##Introduction
+## Introduction
This document describes the alarms a platform integrating Cygnus should raise when an incident happens. Thus, it is addressed to professional operators and administrators of such platforms. Cygnus messages are explained before the alarm conditions deriving from those messages are described.
@@ -21,7 +21,7 @@ For each alarm, the following information is given:

[Top](#top)

-##Log message types
+## Log message types
Cygnus logs are categorized under seven message types, each one identified by a tag in the custom message part of the trace. These are the tags:

* Fatal error (`FATAL` level). These kinds of errors may cause Cygnus to stop, and thus must be reported to the development team through [stackoverflow.com](http://stackoverflow.com/search?q=fiware) (please tag it with fiware).
@@ -50,7 +50,7 @@ Debug messages are labeled as Debug, with a logging level of `DEBUG`. Inf

[Top](#top)

-##Alarm conditions
+## Alarm conditions
Alarm ID | Severity | Detection strategy | Stop condition | Description | Action
---|---|---|---|---|---
1 | CRITICAL | A `FATAL` trace is found. | For each configured Cygnus component (i.e. `NGSIRestHandler`, `NGSIHDFSSink`, `NGSIMySQLSink` and `NGSICKANSink`), the following trace is found: Startup completed. | A problem has happened at Cygnus startup. The `msg` field details the particular problem. | Fix the issue that is precluding Cygnus startup, e.g. if the problem was that the listening port of a certain source was already in use, then change such listening port or stop the process using it.
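For illustration only (this check is not part of Cygnus; the `lvl=FATAL` pattern and the `/var/log/cygnus/cygnus.log` path assume the default `log4j` configuration), the detection and stop conditions of alarm 1 could be verified from a shell this way:

    $ # Detection strategy: raise alarm 1 if any FATAL trace is found
    $ grep "lvl=FATAL" /var/log/cygnus/cygnus.log
    $ # Stop condition: one 'Startup completed' trace per configured component
    $ grep "Startup completed" /var/log/cygnus/cygnus.log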
diff --git a/doc/cygnus-common/installation_and_administration_guide/management_interface.md b/doc/cygnus-common/installation_and_administration_guide/management_interface.md index 3fa7b6d57..710d8a787 100644 --- a/doc/cygnus-common/installation_and_administration_guide/management_interface.md +++ b/doc/cygnus-common/installation_and_administration_guide/management_interface.md @@ -1,4 +1,4 @@ -#Management interface +# Management interface Content: * [Apiary version of this document](#section1) @@ -17,12 +17,12 @@ Content: * [PUT `/admin/configuration/instance`](#section10) * [DELETE `/admin/configuration/instance`](#section11) -##Apiary version of this document +## Apiary version of this document This API specification can be checked at [Apiary](http://telefonicaid.github.io/fiware-cygnus/api/latest) as well. [Top](#top) -##`GET /admin/log` +## `GET /admin/log` Gets the logging level of Cygnus. ``` @@ -45,7 +45,7 @@ Responses: [Top](#top) -##`PUT /admin/log` +## `PUT /admin/log` Updates the logging level of Cygnus, given the logging level as a query parameter. Valid logging levels are `DEBUG`, `INFO`, `WARNING` (`WARN` also works), `ERROR` and `FATAL`. @@ -72,8 +72,8 @@ Responses: [Top](#top) -##`GET /admin/configuration/agent` -###`GET` all parameters +## `GET /admin/configuration/agent` +### `GET` all parameters Gets all the parameters from an agent given the path to the configuration file as the URI within the URL. The name of the agent must start with `agent_`. @@ -109,7 +109,7 @@ Invalid agent configuration file name: [Top](#top) -###`GET` a single parameter +### `GET` a single parameter Gets a single parameter from an agent given the path to the configuration file as the URI within the URL and the name of the parameter as a query parameter. The name of the agent must start with `agent_`. @@ -151,7 +151,7 @@ Invalid agent configuration file name: [Top](#top) -##`POST /admin/configuration/agent` +## `POST /admin/configuration/agent` Posts a single parameter if it doesn't exist in the agent given the path to the configuration file as the URI within the URL and the name and the value of the parameter as a query parameters. The name of the agent must start with `agent_`. @@ -193,7 +193,7 @@ Invalid agent configuration file name: [Top](#top) -##`PUT /admin/configuration/agent` +## `PUT /admin/configuration/agent` Puts a single parameter if it doesn't exist or update it if already exists in the agent given the path to the configuration file as the URI within the URL and the name and the value of the parameter as a query parameters. The name of the agent must start with `agent_`. @@ -275,7 +275,7 @@ cygnus-common.sinks.mysql-sink.attr_persistence = row [Top](#top) -##`DELETE /admin/configuration/agent` +## `DELETE /admin/configuration/agent` Deletes a single parameter if it exists in the agent given the path to the configuration file as the URI within the URL and the name of the parameter as a query parameter. The name of the agent must start with `agent_`. @@ -317,8 +317,8 @@ Invalid agent configuration file name: [Top](#top) -##`GET /admin/configuration/instance` -###`GET` all parameters +## `GET /admin/configuration/instance` +### `GET` all parameters Gets all the parameters from an instance given the path to the configuration file as the URI within the URL. The path to the instance must be with `/usr/cygnus/conf`. 
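For instance, assuming the default API port and an illustrative instance file name (both are assumptions, not fixed values), all the parameters of an instance could be retrieved this way:

    $ curl -X GET "http://localhost:8081/admin/configuration/instance/usr/cygnus/conf/cygnus_instance_example.conf"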
@@ -353,7 +353,7 @@ Instance configuration file not found: [Top](#top) -###`GET` a single parameter +### `GET` a single parameter Gets a single parameter from an instance given the path to the configuration file as the URI within the URL and the name of the parameter as a query parameter. The path to the instance must be with `/usr/cygnus/conf`. @@ -395,7 +395,7 @@ Instance configuration file not found: [Top](#top) -##`POST /admin/configuration/instance` +## `POST /admin/configuration/instance` Posts a single parameter if it doesn't exist in the instance given the path to the configuration file as the URI within the URL and the name and the value of the parameter as a query parameters. The path to the instance must be with `/usr/cygnus/conf`. @@ -437,7 +437,7 @@ Instance configuration file not found: [Top](#top) -##`PUT /admin/configuration/instance` +## `PUT /admin/configuration/instance` Puts a single parameter if it doesn't exist or update it if already exists in the instance given the path to the configuration file as the URI within the URL and the name and the value of the parameter as a query parameters. The path to the instance must be with `/usr/cygnus/conf`. @@ -525,7 +525,7 @@ POLLING_INTERVAL=30 [Top](#top) -##`DELETE /admin/configuration/instance` +## `DELETE /admin/configuration/instance` Deletes a single parameter in the instance given the path to the configuration file as the URI within the URL and the name of the parameter as a query parameters. The path to the instance must be with `/usr/cygnus/conf`. diff --git a/doc/cygnus-common/installation_and_administration_guide/management_interface_v1.md b/doc/cygnus-common/installation_and_administration_guide/management_interface_v1.md index 67d436099..4b20680a4 100644 --- a/doc/cygnus-common/installation_and_administration_guide/management_interface_v1.md +++ b/doc/cygnus-common/installation_and_administration_guide/management_interface_v1.md @@ -1,4 +1,4 @@ -#Management interface +# Management interface Content: * [Apiary version of this document](#section1) @@ -42,12 +42,12 @@ Content: * [DELETE `/v1/admin/metrics`](#section7.2) * [Available aliases](#section8) -##Apiary version of this document +## Apiary version of this document This API specification can be checked at [Apiary](http://telefonicaid.github.io/fiware-cygnus/api/latests) as well. [Top](#top) -##`GET /v1/version` +## `GET /v1/version` Gets the version of the running software, including the last Git commit: ``` @@ -65,8 +65,8 @@ Response: [Top](#top) -##Stats -###`GET /v1/stats` +## Stats +### `GET /v1/stats` Gets statistics about the configured Flume components. It is important to note in order to gathering statistics from the channels, these must be of type `com.telefonica.iot.cygnus.channels.CygnusMemoryChannel` or `com.telefonica.iot.cygnus.channels.CygnusFileChannel`. Regarding the sources, it returns: @@ -142,7 +142,7 @@ Response: [Top](#top) -###`PUT /v1/stats` +### `PUT /v1/stats` Resets the statistics about the configured Flume components. It is important to note in order to reset statistics from the channels, these must be of type `com.telefonica.iot.cygnus.channels.CygnusMemoryChannel` or `com.telefonica.iot.cygnus.channels.CygnusFileChannel`. ``` @@ -157,8 +157,8 @@ Response: [Top](#top) -##Grouping Rules -###`GET /v1/groupingrules` +## Grouping Rules +### `GET /v1/groupingrules` Gets the configured [grouping rules](../../cygnus-ngsi/flume_extensions_catalogue/ngsi_grouping_interceptor.md). 
``` @@ -195,7 +195,7 @@ Response: [Top](#top) -###`POST /v1/groupingrules` +### `POST /v1/groupingrules` Adds a new rule, passed as a Json in the payload, to the [grouping rules](../../cygnus-ngsi/flume_extensions_catalogue/ngsi_grouping_interceptor.md). ``` @@ -218,7 +218,7 @@ Please observe the `id` field is not passed as part of the posted Json. This is [Top](#top) -###`PUT /v1/groupingrules` +### `PUT /v1/groupingrules` Updates an already existent [grouping rules](../../cygnus-ngsi/flume_extensions_catalogue/ngsi_grouping_interceptor.md), given its ID as a query parameter and passed the rule as a Json in the payload. ``` @@ -239,7 +239,7 @@ Response: [Top](#top) -###`DELETE /v1/groupingrules` +### `DELETE /v1/groupingrules` Deletes a [grouping rules](../../cygnus-ngsi/flume_extensions_catalogue/ngsi_grouping_interceptor.md), given its ID as a query parameter. ``` @@ -254,9 +254,9 @@ Response: [Top](#top) -##Subscriptions -###`POST /v1/subscriptions` -####`NGSI Version 1` +## Subscriptions +### `POST /v1/subscriptions` +#### `NGSI Version 1` Creates a new subscription to Orion given the version of NGSI (`ngsi_version=1` in this case). The Json passed in the payload contains the Json subscription itself and Orion's endpoint details. ``` @@ -316,7 +316,7 @@ Please observe Cygnus checks if the Json passed in the payload is valid (syntact [Top](#top) -####`NGSI Version 2` +#### `NGSI Version 2` Creates a new subscription to Orion given the version of NGSI (`ngsi_version=2` in this case). The Json passed in the payload contains the Json subscription itself and Orion's endpoint details. ``` @@ -387,7 +387,7 @@ Please observe Cygnus checks if the Json passed in the payload is valid (syntact [Top](#top) -###`DELETE /v1/subscriptions` +### `DELETE /v1/subscriptions` Deletes a subscription made to Orion given its ID and the NGSI version. The Json passed in the payload contains the Orion's endpoint details. ``` @@ -441,8 +441,8 @@ Missing fields (empty or not given): [Top](#top) -###`GET /v1/subscriptions` -#### GET subscription by ID +### `GET /v1/subscriptions` +#### GET subscription by ID Gets an existent subscription from Orion, given the NGSI version and the subscription id as a query parameter. Valid NGSI versions are `1` and `2` (this method only works with `ngsi_version=2` due to this method is not implemented in version `1`). @@ -486,7 +486,7 @@ Missing or empty parameters: [Top](#top) -#### GET all subscriptions +#### GET all subscriptions Gets all existent subscriptions from Orion, given the NGSI version as a query parameter. Valid NGSI versions are `1` and `2` (this method only works with `ngsi_version=2` due to this method is not implemented in version `1`). @@ -524,9 +524,9 @@ Missing or empty parameters: [Top](#top) -##Logs -### GET `/v1/admin/log/appenders` -#### GET appender by name +## Logs +### GET `/v1/admin/log/appenders` +#### GET appender by name Gets an existent appender from a running logger given its name. It can be retrieved from the running Cygnus or from the `log4j.properties` file. If parameterised with `transient=true` (or omitting this parameter) the appenders are retrieved from Cygnus, if `transient=false` are retrieved from file. @@ -553,7 +553,7 @@ Invalid `transient` parameter is given: [Top](#top) -#### GET all appenders +#### GET all appenders Gets all existent appenders from a running logger. They can be retrieved from the running Cygnus or from the `log4j.properties` file. 
If parameterised with `transient=true` (or omitting this parameter) the appenders are retrieved from Cygnus, if `transient=false` they are retrieved from the file.
@@ -580,8 +580,8 @@ Invalid `transient` parameter is given:

[Top](#top)

-### GET `/v1/admin/log/loggers`
-#### GET logger by name
+### GET `/v1/admin/log/loggers`
+#### GET logger by name
Gets an existent logger from a running Cygnus given its name. It can be retrieved from the running Cygnus or from the `log4j.properties` file.

If parameterised with `transient=true` (or omitting this parameter) the logger is retrieved from Cygnus, if `transient=false` it is retrieved from the file.
@@ -608,7 +608,7 @@ Invalid `transient` parameter is given:

[Top](#top)

-#### GET all loggers
+#### GET all loggers
Gets all existent loggers from a running Cygnus. They can be retrieved from the running Cygnus or from the `log4j.properties` file. If parameterised with `transient=true` (or omitting this parameter) the loggers are retrieved from Cygnus, if `transient=false` they are retrieved from the file.
@@ -635,7 +635,7 @@ When an invalid `transient` parameter is given:

[Top](#top)

-### PUT and POST methods for loggers and appenders
+### PUT and POST methods for loggers and appenders
The following table summarises the behaviour of the PUT and POST methods for every mode (`transient=true` or `transient=false`):

| | APPENDER | | LOGGER | |
@@ -649,7 +649,7 @@ Following table resume the behaviour of PUT and POST method for every mode (`tra

[Top](#top)

-#### PUT `/v1/admin/log/appenders`
+#### PUT `/v1/admin/log/appenders`
Puts an appender in a running Cygnus given a JSON with the information about the name and class of the appender, and the layout and ConversionPattern of its pattern. If parameterised with `transient=true` (or omitting this parameter) the appender is updated if its name matches the current active appender; if `transient=false` the appender is added or updated in the file.

```
@@ -697,7 +697,7 @@ Sending only a request without JSON or sending a invalid one:

[Top](#top)

-#### POST `/v1/admin/log/appenders`
+#### POST `/v1/admin/log/appenders`
Posts an appender in a running Cygnus given a JSON with the information about the name and class of the appender, and the layout and ConversionPattern of its pattern. If parameterised with `transient=false` it is posted on the file. The POST method is not implemented with `transient=true`.

```
@@ -745,7 +745,7 @@ Sending a request without JSON or an invalid one:

[Top](#top)

-#### PUT `/v1/admin/log/loggers`
+#### PUT `/v1/admin/log/loggers`
Puts a logger in a running Cygnus given a JSON with the information about the name and level of the logger. If parameterised with `transient=true` (or omitting this parameter) the logger is updated if its name matches an existing logger. The PUT method only updates in transient mode due to logger creation limitations in the code. If `transient=false` the logger is added or updated in the file.

```
@@ -789,7 +789,7 @@ Sending a request without JSON or an invalid one:

[Top](#top)

-#### POST `/v1/admin/log/loggers`
+#### POST `/v1/admin/log/loggers`
Posts a logger on a running Cygnus. This method only accepts the parameter `transient=false` due to logger creation limitations in the code. Therefore, the loggers are posted on the `log4j.properties` file.

Posts a logger in a running Cygnus given a JSON with the information about the name and level of the logger. If parameterised with `transient=false` it is posted on the file. The POST method is not implemented with `transient=true`.
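As an illustrative sketch of the above (the exact JSON nesting is an assumption derived from the description, i.e. a logger is given by its name and level; host, port and values are placeholders), posting a logger on the file could look like:

    $ curl -X POST "http://localhost:8081/v1/admin/log/loggers?transient=false" -d '{"logger":{"name":"com.example.MyClass","level":"DEBUG"}}'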
@@ -834,8 +834,8 @@ Sending a request without JSON or an invalid one: [Top](#top) -### DELETE `/v1/admin/log/appenders` -#### DELETE appender by name +### DELETE `/v1/admin/log/appenders` +#### DELETE appender by name Deletes an existent appender from a running logger given its name. It can be deleted on the running Cygnus or in the `log4j.properties` file. If parameterised with `transient=true` (or omitting this parameter) the appender is deleted on Cygnus, if `transient=false` is deleted in the file. @@ -862,7 +862,7 @@ When an invalid `transient` parameter is given: [Top](#top) -#### DELETE all appenders +#### DELETE all appenders Deletes all existent appenders from a running logger. They can be deleted on the running Cygnus or in the `log4j.properties` file. If parameterised with `transient=true` (or omitting this parameter) the appenders are deleted on Cygnus, if `transient=false` are deleted in the file. @@ -884,8 +884,8 @@ When an invalid `transient` parameter is given: [Top](#top) -### DELETE `/v1/admin/log/loggers` -#### DELETE logger by name +### DELETE `/v1/admin/log/loggers` +#### DELETE logger by name Deletes an existent logger from a running Cygnus given its name. It can be deleted on a running Cygnus or in the `log4j.properties` file. If parameterised with `transient=true` (or omitting this parameter) the logger is deleted on Cygnus, if `transient=false` is deleted in the file. @@ -912,7 +912,7 @@ When an invalid `transient` parameter is given: [Top](#top) -#### DELETE all loggers +#### DELETE all loggers Deletes all existent loggers from a running Cygnus. They can be deleted on a running Cygnus or in the `log4j.properties` file. If parameterised with `transient=true` (or omitting this parameter) the loggers are deleted on Cygnus, if `transient=false` are deleted in the file. @@ -934,8 +934,8 @@ When an invalid `transient` parameter is given: [Top](#top) -##Metrics -###`GET /v1/admin/metrics` +## Metrics +### `GET /v1/admin/metrics` Gets metrics for a whole Cygnus agent. Specifically: * `incomingTransactions`. Number of incoming transactions (a transaction involves a request and a response). In other words, number of NGSI notifications received. @@ -1019,7 +1019,7 @@ Finally, because Cygnus implements a retry mechanism for those persistence opera [Top](#top) -###`DELETE /v1/admin/metrics` +### `DELETE /v1/admin/metrics` Deletes metrics, putting counters to zero. ``` @@ -1034,10 +1034,10 @@ Response: [Top](#top) -##Available aliases +## Available aliases |Alias|Operation| |---|---| |GET /admin/metrics|GET /v1/admin/metrics| |DELETE /admin/metrics|DELETE /v1/admin/metrics| -[Top](#top) \ No newline at end of file +[Top](#top) diff --git a/doc/cygnus-common/installation_and_administration_guide/running_as_process.md b/doc/cygnus-common/installation_and_administration_guide/running_as_process.md index d431cb246..4e26a1351 100644 --- a/doc/cygnus-common/installation_and_administration_guide/running_as_process.md +++ b/doc/cygnus-common/installation_and_administration_guide/running_as_process.md @@ -1,4 +1,4 @@ -#Running cygnus-common as a process +# Running cygnus-common as a process Cygnus implements its own startup script, `cygnus-flume-ng` which replaces the standard `flume-ng` one, which in the end runs a custom `com.telefonica.iot.cygnus.nodes.CygnusApplication` instead of a standard `org.apache.flume.node.Application`. 
In foreground (with logging):
diff --git a/doc/cygnus-common/installation_and_administration_guide/running_as_service.md b/doc/cygnus-common/installation_and_administration_guide/running_as_service.md
index df879d5cd..061f866a5 100644
--- a/doc/cygnus-common/installation_and_administration_guide/running_as_service.md
+++ b/doc/cygnus-common/installation_and_administration_guide/running_as_service.md
@@ -1,4 +1,4 @@
-#Running cygnus-common as a service
+# Running cygnus-common as a service
**Note**: Cygnus can only be run as a service if you installed it through the RPM.

Running cygnus-common is the same as running a plain Flume. Once the `cygnus_instance_.conf` and `agent_.conf` files are properly configured, just use the `service` command to start, restart, stop or get the status (as a sudoer):
diff --git a/doc/cygnus-common/installation_and_administration_guide/sanity_checks.md b/doc/cygnus-common/installation_and_administration_guide/sanity_checks.md
index 6d77710f5..45d6593e2 100644
--- a/doc/cygnus-common/installation_and_administration_guide/sanity_checks.md
+++ b/doc/cygnus-common/installation_and_administration_guide/sanity_checks.md
@@ -1,4 +1,4 @@
-#Sanity checks
+# Sanity checks

Content:
* [How to proceed](#section1)
@@ -6,14 +6,14 @@ Content:
* [Check: API port](#section3)
* [Check: GUI port](#section4)

-##How to proceed
+## How to proceed
Verify all the sanity checks included in this document, one by one.

If you have any problem with one specific check, please go to the proper section of the [diagnosis procedures](./diagnosis_procedures.md) document.

[Top](#top)

-##Check: Logs
+## Check: Logs
Any Cygnus agent logs in `/var/log/cygnus/cygnus.log`, unless the `console` appender is used. In any case, traced logs must look like the following ones:
@@ -93,7 +93,7 @@ And the Management Interface is setup:

[Top](#top)

-##Check: API port
+## Check: API port
The API must be up and running in the port you configured (either using the `-p` option on the command line or the `ADMIN_PORT` parameter in the `cygnus_instance_.conf` file). `8081` is the default.

You can check it by asking for the Cygnus version:

$ curl "http://localhost:8081/v1/version"
@@ -105,7 +105,7 @@ $ curl "http://localhost:8081/v1/version"

[Top](#top)

-##Check: GUI port
+## Check: GUI port
Coming soon.

[Top](#top)
diff --git a/doc/cygnus-common/installation_and_administration_guide/testing.md b/doc/cygnus-common/installation_and_administration_guide/testing.md
index 415d75829..85ee9e1c3 100644
--- a/doc/cygnus-common/installation_and_administration_guide/testing.md
+++ b/doc/cygnus-common/installation_and_administration_guide/testing.md
@@ -1,4 +1,4 @@
-#Testing
+# Testing
Running the tests requires [Apache Maven](https://maven.apache.org/) installed and the Cygnus sources downloaded.
$ git clone https://github.com/telefonicaid/fiware-cygnus.git
@@ -167,4 +167,4 @@ Tests run: 43, Failures: 0, Errors: 0, Skipped: 0
[INFO] Finished at: Tue May 03 17:44:40 CEST 2016
[INFO] Final Memory: 24M/81M
[INFO] ------------------------------------------------------------------------
-```
\ No newline at end of file
+```
diff --git a/doc/cygnus-ngsi/flume_extensions_catalogue/README.md b/doc/cygnus-ngsi/flume_extensions_catalogue/README.md
index cc710fd61..2a567a89b 100644
--- a/doc/cygnus-ngsi/flume_extensions_catalogue/README.md
+++ b/doc/cygnus-ngsi/flume_extensions_catalogue/README.md
@@ -1,4 +1,4 @@
-#Flume extensions catalogue
+# Flume extensions catalogue

* [Introduction](./introduction.md)
* Http source handlers
diff --git a/doc/cygnus-ngsi/flume_extensions_catalogue/introduction.md b/doc/cygnus-ngsi/flume_extensions_catalogue/introduction.md
index 6bbed7423..6cf659116 100644
--- a/doc/cygnus-ngsi/flume_extensions_catalogue/introduction.md
+++ b/doc/cygnus-ngsi/flume_extensions_catalogue/introduction.md
@@ -1,14 +1,14 @@
-#Flume extensions catalogue
+# Flume extensions catalogue
This document details the catalogue of extensions developed for Cygnus on top of [Apache Flume](https://flume.apache.org/).

-#Intended audience
+# Intended audience
The Flume extensions catalogue is a basic piece of documentation for all those FIWARE users using Cygnus. It describes the available extra components added to the Flume technology in order to deal with NGSI-like context data in terms of building historical views.

Software developers may also be interested in this catalogue since it may guide the creation of new components (especially sinks) for Cygnus/Flume.

[Top](#top)

-#Structure of the document
+# Structure of the document
The document starts detailing the naming conventions adopted in Cygnus when creating data structures in the different storages. This means those data structure (databases, files, tables, collections, etc) names will derive from a subset of the NGSI-like notified information (mainly fiware-service and fiware-servicePath headers, entityId and entityType).

Then, it is time to explain [`NGSIRestHandler`](./ngsi_rest_handler.md), the NGSI oriented handler for the http Flume source in charge of translating a NGSI-like notification into a Flume event.
diff --git a/doc/cygnus-ngsi/flume_extensions_catalogue/issues_and_contact.md b/doc/cygnus-ngsi/flume_extensions_catalogue/issues_and_contact.md
index ef5d63c9a..8a22f55ac 100644
--- a/doc/cygnus-ngsi/flume_extensions_catalogue/issues_and_contact.md
+++ b/doc/cygnus-ngsi/flume_extensions_catalogue/issues_and_contact.md
@@ -1,4 +1,4 @@
-#Reporting issues and contact information
+# Reporting issues and contact information
There are several channels suited for reporting issues and asking for doubts in general. Each one depends on the nature of the question:

* Use [stackoverflow.com](http://stackoverflow.com) for specific questions about this software. Typically, these will be related to installation problems, errors and bugs. Development questions when forking the code are welcome as well. Use the `fiware-cygnus` tag.
diff --git a/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_cartodb_sink.md b/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_cartodb_sink.md
index c3b26d57e..44e5afa42 100644
--- a/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_cartodb_sink.md
+++ b/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_cartodb_sink.md
@@ -1,4 +1,4 @@
-#NGSICartoDBSink
+# NGSICartoDBSink
Content:

* [Functionality](#section1)
@@ -31,7 +31,7 @@ Content:
* [Annexes](#section4)
    * [Annex 1: provisioning a table](#section4.1)

-##Functionality
+## Functionality
`com.iot.telefonica.cygnus.sinks.NGSICartoDBSink`, or simply `NGSICartoDBSink`, is a cygnus-ngsi sink designed to persist NGSI-like context data events within [Carto](https://carto.com/). Usually, such context data is notified by an [Orion Context Broker](https://github.com/telefonicaid/fiware-orion) instance, but it could be any other system speaking the NGSI language.

Independently of the data generator, NGSI context data is always transformed into internal `NGSIEvent` objects at cygnus-ngsi sources. In the end, the information within these events must be mapped into specific Carto data structures at the Cygnus sinks.

The next sections will explain this in detail.

[Top](#top)

-###Mapping NGSI events to `NGSIEvent` objects
+### Mapping NGSI events to `NGSIEvent` objects
Notified NGSI events (containing context data) are transformed into `NGSIEvent` objects (for each context element a `NGSIEvent` is created; such an event is a mix of certain headers and a `ContextElement` object), independently of the NGSI data generator or the final backend where it is persisted. This is done at the cygnus-ngsi Http listeners (in Flume jargon, sources) thanks to [`NGSIRestHandler`](/ngsi_rest_handler.md). Once translated, the data (now, as `NGSIEvent` objects) is put into the internal channels for future consumption (see next section).

[Top](#top)

-###Mapping `NGSIEvent`s to Carto data structures
+### Mapping `NGSIEvent`s to Carto data structures
Carto is based on [PostgreSQL](http://www.postgresql.org/) and [PostGIS](http://postgis.net/) extensions. It organizes the data in databases (one per organization), schemas (one per user within an organization) and tables (a schema may have one or more tables). Such organization is exploited by `NGSICartoDBSink` each time a `NGSIEvent` is going to be persisted.

[Top](#top)

-####PostgreSQL databases and schemas naming conventions
+#### PostgreSQL databases and schemas naming conventions
PostgreSQL databases and schemas are already created by Carto upon organization and username request, respectively. Thus, it is up to Carto to define the naming conventions for these elements; specifically:

* Organization must only contain lowercase letters.
@@ -62,7 +62,7 @@ Here it is assumed the notified/default FIWARE service maps the PostgreSQL schem

[Top](#top)

-####PostgreSQL tables naming conventions
+#### PostgreSQL tables naming conventions
The name of these tables depends on the configured data model and analysis mode (see the [Configuration](#section2.1) section for more details):

* Data model by service path (`data_model=dm-by-service-path`). As the data model name denotes, the notified FIWARE service path (or the configured one as default in [`NGSIRestHandler`](./ngsi_rest_handler.md)) is used as the name of the table. This allows all the data about the NGSI entities belonging to the same service path to be stored in this unique table.
@@ -88,7 +88,7 @@ Please observe the concatenation of entity ID and type is already given in the ` [Top](#top) -####Raw-based storing +#### Raw-based storing Regarding the specific data stored within the tables, if `enable_raw` parameter is set to `true` (default storing mode) then the notified data is **stored as it is, without any processing or modification**. This is the simplest way of storing geolocation data. A single insert is composed for each notified entity, containing such insert the following fields: @@ -107,7 +107,7 @@ It must be said Cygnus does not create Carto tables in the raw-based storing. Th [Top](#top) -####Distance-based storing +#### Distance-based storing If `enable_distance` parameter is set to `true` (by default, this kind of storing is not run) then the notified data is processed based on a distance analysis. As said, the linear distance and elapsed time with regards to the previous geolocation of the entity is obtained, and this information is used to update certain aggregations: total amount of distance, total amount of time and many others. The speed is obtained as well as the result of dividing the distance by the time, and such speed calculation is used as well for updating certain aggregations. The final goal is to **pre-compute a set of distance-based measures** as a function of the geolocation of an entity, allowing for querying about "the total amount of time this entity took to arrive to this point", or "which was the average speed of this entity when passing through this point", etc. **without performing any computation at querying time**. @@ -143,7 +143,7 @@ Different than the raw-based storing, Cygnus is able to create by itself the tab [Top](#top) -####Raw snapshot-based storing +#### Raw snapshot-based storing This analysis mode works the same than the raw-based storing one, except for: * There is not a table per entity, but a table per FIWARE service path. In these sense, this analysis mode can be seen as always working with the `data_model` parameter set to `dm-by-service-path`. @@ -151,8 +151,8 @@ This analysis mode works the same than the raw-based storing one, except for: [Top](#top) -###Example -####`NGSIEvent` +### Example +#### `NGSIEvent` Assuming the following `NGSIEvent` is created from a notified NGSI context data (the code below is an object representation, not any real data format): ngsi-event={ @@ -191,7 +191,7 @@ Assuming the following `NGSIEvent` is created from a notified NGSI context data [Top](#top) -####Table names +#### Table names The PostgreSQL table names will be, depending on the configured data model and analysis mode, the following ones: | FIWARE service path | `dm-by-service-path` | `dm-by-entity` | @@ -201,7 +201,7 @@ The PostgreSQL table names will be, depending on the configured data model and a [Top](#top) -####Raw-based storing +#### Raw-based storing Let's assume a table name `x002f4wheelsxffffcar1xffffcar` (data model by entity, non-root service path, only raw analysis mode). The data stored within this table would be: ``` @@ -264,7 +264,7 @@ curl "https://myusername.cartodb.com/api/v2/sql?q=select * from x002f4wheelsxfff [Top](#top) -####Distance-based storing +#### Distance-based storing Let's assume a table name `x002f4wheelsxffffcar1xffffcarxffffdistance` (data model by entity, non-root service path, only distance analysis mode) with a previous insertion (on the contrary, this would be the first insertion and almost all the aggregated values will be set to 0). 
The data stored within this table would be:

```
@@ -400,7 +400,7 @@ curl "https://myusername.cartodb.com/api/v2/sql?q=select * from x002f4wheelsxfff

[Top](#top)

-####Raw snapshot-based storing
+#### Raw snapshot-based storing
Everything is the same as in the raw-based storing, except for:

* The table name is `x002f4wheelsxffffcar1xffffcarxffffrawsnapshot`.
@@ -408,8 +408,8 @@ Everything equals to the raw-based storing, but:

[Top](#top)

-##Administration guide
-###Configuration
+## Administration guide
+### Configuration
`NGSICartoDBSink` is configured through the following parameters:

| Parameter | Mandatory | Default value | Comments |
@@ -482,7 +482,7 @@ $ cat /usr/cygnus/conf/cartodb_keys.conf

[Top](#top)

-###Use cases
+### Use cases
The raw-based storing is intended for those use cases that simply want to save an entity's attribute values at a certain time instant or geolocation. Of course, it allows for more complex analysis if experiencing computation time delays is not a problem: the data must be processed at querying time.

The above is avoided by the distance-based storing, which provides pre-computed aggregations regarding certain time instant or geolocation. Having pre-computed those aggregations highly improves the response time of the queries. This is suitable for queries such as:
@@ -497,20 +497,20 @@ Finally, the raw snapshot storing simply geolocates an entity over time, without

[Top](#top)

-###Important notes
-####`NGSICartoDBSink` and non-geolocated entities
+### Important notes
+#### `NGSICartoDBSink` and non-geolocated entities
It is mandatory that the entities aimed to be handled by this sink have a geolocated attribute, either as a `geo:point`-typed attribute or as an attribute holding a `location` metadata of type `string` and `WGS84` value.

[Top](#top)

-####Multitenancy support
+#### Multitenancy support
Unlike other NGSI sinks, where a single authorized user is able to create user spaces and write data on behalf of all the other users (who can only read the data), this sink requires the writing credentials of each user, and such user spaces must be created in advance. The reason is that Carto imposes the database and schema upon user account creation, which typically are related to the FIWARE service (or FIWARE tenant ID), and the only persistence elements Cygnus must create are the tables within the already provisioned databases and schemas. As can be inferred, accessing these databases and schemas requires specific user credentials.

User credentials must be added to a special file that will be pointed by the Carto sink through the `keys_conf_file` configuration parameter. Of special interest is the account type, which can be `personal` or `enterprise`; such a distinction is important since the queries to the API differ from one to the other.

[Top](#top)

-####Batching
+#### Batching
As explained in the [programmers guide](#section3), `NGSICartoDBSink` extends `NGSISink`, which provides a built-in mechanism for collecting events from the internal Flume channel. This mechanism allows extending classes to deal only with the persistence details of such a batch of events in the final backend.

What is important regarding the batch mechanism is that it largely increases the performance of the sink, because the number of inserts is dramatically reduced. Let's see an example: assume a batch of 100 `NGSIEvent`s. In the best case, all these events regard the same entity, which means all the data within them will be persisted in the same Carto table.
If processing the events one by one, we would need 100 inserts in Carto; nevertheless, in this example only one insert is required. Obviously, not all the events will always regard to the same unique entity, and many entities may be involved within a batch. But that's not a problem, since several sub-batches of events are created within a batch, one sub-batch per final destination Carto table. In the worst case, the whole 100 entities will be about 100 different entities (100 different Carto destinations), but that will not be the usual scenario. Thus, assuming a realistic number of 10-15 sub-batches per batch, we are replacing the 100 inserts of the event by event approach with only 10-15 inserts. @@ -525,7 +525,7 @@ Finally, it must be said currently batching only works with the raw-like storing [Top](#top) -####About the encoding +#### About the encoding Cygnus applies this specific encoding tailored to Carto data structures: * Lowercase alphanumeric characters are not encoded. @@ -539,7 +539,7 @@ Cygnus applies this specific encoding tailored to Carto data structures: [Top](#top) -####About automatically creating the tables +#### About automatically creating the tables It has already been commented, but just a reminder: Cygnus does not automatically create the required tables for the raw-based nor the raw snapshot-based mode. This is because the first notification regarding an entity could not contain the full list of such an entity's attributes, i.e. only the updated attributes could be being notified. On the contrary, the distance-based mode automatically creates the tables since the number and semantic of the table columns is always the same, and it is independent of the entity's attributes. @@ -548,7 +548,7 @@ When required, the Annex 1 shows how to provision a table for Carto, among other [Top](#top) -####Supported Orion's geometries +#### Supported Orion's geometries Current version of `NGSICartoDBSink` supports the following NGSIv2 geometries: * `geo:point`, in this case the geolocated attribute is about a single point. @@ -559,8 +559,8 @@ You can get more information at [NGSIv2](http://telefonicaid.github.io/fiware-or [Top](#top) -##Programmers guide -###`NGSICartoDBSink` class +## Programmers guide +### `NGSICartoDBSink` class As any other NGSI-like sink, `NGSICartoDBSink ` extends the base `NGSISink`. The methods that are extended are: void persistBatch(Batch batch) throws Exception; @@ -577,13 +577,13 @@ A complete configuration as the described above is read from the given `Context` [Top](#top) -###Authentication and authorization +### Authentication and authorization Authentication is done by means of an API key related to the username. Once authenticated, the client is only allowed to create, read, update and delete PostgreSQL tables in the user space (PostgreSQL schema) within the organization (PostgreSQL database). [Top](#top) -##Annexes -###Annex 1: provisioning a table in Carto +## Annexes +### Annex 1: provisioning a table in Carto Following you may find the queries required to provision a table in Carto. 
Start by creating the table: $ curl -G "https://.cartodb.com/api/v2/sql?api_key=" --data-urlencode "q=CREATE TABLE (recvTime text, fiwareServicePath text, entityId text, entityType text, , _md text, ..., , _md text, the_geom geometry(POINT,4326))" diff --git a/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_ckan_sink.md b/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_ckan_sink.md index ba5e5c4ef..0dcde8d31 100644 --- a/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_ckan_sink.md +++ b/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_ckan_sink.md @@ -1,4 +1,4 @@ -#NGSICKANSink +# NGSICKANSink Content: * [Functionality](#section1) @@ -28,7 +28,7 @@ Content: * [Annexes](#section4) * [Provisioning a CKAN resource for the column mode](#section4.1) -##Functionality +## Functionality `com.iot.telefonica.cygnus.sinks.NGSICKANSink`, or simply `NGSICKANSink` is a sink designed to persist NGSI-like context data events within a [CKAN](http://ckan.org/) server. Usually, such a context data is notified by a [Orion Context Broker](https://github.com/telefonicaid/fiware-orion) instance, but could be any other system speaking the NGSI language. Independently of the data generator, NGSI context data is always transformed into internal `NGSIEvent` objects at Cygnus sources. In the end, the information within these events must be mapped into specific CKAN data structures. @@ -37,19 +37,19 @@ Next sections will explain this in detail. [Top](#top) -###Mapping NGSI events to `NGSIEvent` objects +### Mapping NGSI events to `NGSIEvent` objects Notified NGSI events (containing context data) are transformed into `NGSIEvent` objects (for each context element a `NGSIEvent` is created; such an event is a mix of certain headers and a `ContextElement` object), independently of the NGSI data generator or the final backend where it is persisted. This is done at the cygnus-ngsi Http listeners (in Flume jergon, sources) thanks to [`NGSIRestHandler`](/ngsi_rest_handler.md). Once translated, the data (now, as `NGSIEvent` objects) is put into the internal channels for future consumption (see next section). [Top](#top) -###Mapping `NGSIEvent`s to CKAN data structures +### Mapping `NGSIEvent`s to CKAN data structures [CKAN organizes](http://docs.ckan.org/en/latest/user-guide.html) the data in organizations containing packages or datasets; each one of these packages/datasets contains several resources whose data is finally stored in a PostgreSQL database (CKAN Datastore) or plain files (CKAN Filestore). Such organization is exploited by `NGSICKANSink` each time a `NGSIEvent` is going to be persisted. [Top](#top) -####Organizations naming conventions +#### Organizations naming conventions An organization named as the notified `fiware-service` header value (or, in absence of such a header, the defaulted value for the FIWARE service) is created (if not existing yet). Since based in [PostgreSQL only accepts](https://www.postgresql.org/docs/current/static/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS), it must be said only alphanumeric characters and the underscore (`_`) are accepted. The hyphen ('-') is also accepted. This leads to certain [encoding](#section2.3.3) is applied depending on the `enable_encoding` configuration parameter. 
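Once Cygnus has persisted some data, a quick way of double-checking these conventions is asking the CKAN API for the organization derived from the notified FIWARE service (the endpoint, API key and `vehicles` value below are illustrative):

    $ curl -s -S -H "Authorization: myapikey" "http://192.168.80.34:80/api/3/action/organization_show?id=vehicles"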
@@ -58,7 +58,7 @@ Nevertheless, different than PostgreSQL, [organization lengths](http://docs.ckan

[Top](#top)

-####Packages/datasets naming conventions
+#### Packages/datasets naming conventions
A package/dataset named as the concatenation of the notified `fiware-service` and `fiware-servicePath` header values (or, in absence of such headers, the defaulted value for the FIWARE service and service path) is created (if not existing yet) in the above organization. Since CKAN is based on PostgreSQL, which [only accepts](https://www.postgresql.org/docs/current/static/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS) certain characters in identifiers, it must be said that only alphanumeric characters and the underscore (`_`) are accepted. The hyphen (`-`) is also accepted. This leads to a certain [encoding](#section2.3.3) being applied depending on the `enable_encoding` configuration parameter.
@@ -67,7 +67,7 @@ Nevertheless, different than PostgreSQL, [dataset lengths](http://docs.ckan.org/

[Top](#top)

-####Resources naming conventions
+#### Resources naming conventions
CKAN resources follow a single data model (see the [Configuration](#section2.1) section for more details), i.e. per entity. Thus, a resource name always takes the concatenation of the entity ID and type. Such a name is already given in the `notified_entities`/`grouped_entities` header values (depending on using or not the grouping rules, see the [Configuration](#section2.1) section for more details) within the `NGSIEvent`.

It must be noticed that a CKAN Datastore (and a viewer) is also created and associated to the resource above. This datastore, which in the end is a PostgreSQL table, will hold the persisted data.
@@ -78,7 +78,7 @@ Despite there is no real limit on the resource names, Cygnus will keep limiting

[Top](#top)

-####Row-like storing
+#### Row-like storing
Regarding the specific data stored within the datastore associated to the resource, if the `attr_persistence` parameter is set to `row` (default storing mode) then the notified data is stored attribute by attribute, composing an insert for each one of them. Each insert contains the following fields:

* `recvTimeTs`: UTC timestamp expressed in milliseconds.
@@ -93,7 +93,7 @@ Regarding the specific data stored within the datastore associated to the resour

[Top](#top)

-####Column-like storing
+#### Column-like storing
Regarding the specific data stored within the datastore associated to the resource, if the `attr_persistence` parameter is set to `column` then a single line is composed for the whole notified entity, containing the following fields:

* `recvTime`: UTC timestamp in human-readable format ([ISO 8601](http://en.wikipedia.org/wiki/ISO_8601)).
@@ -105,8 +105,8 @@ Regarding the specific data stored within the datastore associated to the resour

[Top](#top)

-###Example
-####`NGSIEvent`
+### Example
+#### `NGSIEvent`
Assuming the following `NGSIEvent` is created from a notified NGSI context data (the code below is an object representation, not any real data format):

    ngsi-event={
@@ -140,7 +140,7 @@ Assuming the following `NGSIEvent` is created from a notified NGSI context data

[Top](#top)

-####Organization, dataset and resource names
+#### Organization, dataset and resource names
Given the above example and using the old encoding, these are the CKAN elements created:

* Organization: `vehicles`.
Using the new encoding:

[Top](#top)

-####Row-like storing
+#### Row-like storing
Assuming `attr_persistence=row` as configuration parameter, then `NGSICKANSink` will persist the data within the body as:

    $ curl -s -S -H "Authorization: myapikey" "http://192.168.80.34:80/api/3/action/datastore_search?resource_id=3254b3b4-6ffe-4f3f-8eef-c5c98bfff7a7"
@@ -242,7 +242,7 @@ Assuming `attr_persistence=row` as configuration parameter, then `NGSICKANSink`

[Top](#top)

-####Column-like storing
+#### Column-like storing
If `attr_persistence=column` then `NGSICKANSink` will persist the data within the body as:

    $ curl -s -S -H "Authorization: myapikey" "http://130.206.83.8:80/api/3/action/datastore_search?resource_id=611417a4-8196-4faf-83bc-663c173f6986"
@@ -314,8 +314,8 @@ NOTE: `curl` is a Unix command allowing for interacting with REST APIs such as t

[Top](#top)

-##Administration guide
-###Configuration
+## Administration guide
+### Configuration
`NGSICKANSink` is configured through the following parameters:

| Parameter | Mandatory | Default value | Comments |
@@ -373,13 +373,13 @@ A configuration example could be:

[Top](#top)

-###Use cases
+### Use cases
Use `NGSICKANSink` if you are looking for a database storage that does not grow so much in the mid-long term.

[Top](#top)

-###Important notes
-####About the persistence mode
+### Important notes
+#### About the persistence mode
Please observe that the same number of attributes is not always notified; this depends on the subscription made to the NGSI-like sender. This is not a problem for the `row` persistence mode, since fixed 8-field rows are upserted for each notified attribute. Nevertheless, the `column` mode may be affected by several rows of different lengths (in terms of fields). Thus, the `column` mode is only recommended if your subscription is designed for always sending the same attributes, even if they were not updated since the last notification.

In addition, when running in `column` mode, since the number of notified attributes (and therefore the number of fields to be written within the Datastore) is unknown by Cygnus, the Datastore cannot be automatically created, and must be provisioned previously to the Cygnus execution. That's not the case of the `row` mode, since the number of fields to be written is always constant, independently of the number of notified attributes.
@@ -388,7 +388,7 @@ Please check the [Annexes](#section4) in order to know how to provision a resour

[Top](#top)

-####About batching
+#### About batching
As explained in the [programmers guide](#section3), `NGSICKANSink` extends `NGSISink`, which provides a built-in mechanism for collecting events from the internal Flume channel. This mechanism allows extending classes to deal only with the persistence details of such a batch of events in the final backend.

What is important regarding the batch mechanism is that it largely increases the performance of the sink, because the number of writes is dramatically reduced. Let's see an example: assume a batch of 100 `NGSIEvent`s. In the best case, all these events regard the same entity, which means all the data within them will be persisted in the same CKAN resource. If processing the events one by one, we would need 100 inserts into CKAN; nevertheless, in this example only one insert is required. Obviously, not all the events will always regard the same unique entity, and many entities may be involved within a batch.
But that's not a problem, since several sub-batches of events are created within a batch, one sub-batch per final destination CKAN resource. In the worst case, the whole 100 entities will be about 100 different entities (100 different CKAN resources), but that will not be the usual scenario. Thus, assuming a realistic number of 10-15 sub-batches per batch, we are replacing the 100 inserts of the event by event approach with only 10-15 inserts. @@ -401,7 +401,7 @@ By default, `NGSICKANSink` has a configured batch size and batch accumulation ti [Top](#top) -####About the encoding +#### About the encoding Until version 1.2.0 (included), Cygnus applied a very simple encoding: * All non alphanumeric characters were replaced by underscore, `_`. @@ -424,7 +424,7 @@ Despite the old encoding will be deprecated in the future, it is possible to swi [Top](#top) -####About geolocation attributes +#### About geolocation attributes CKAN supports several [viewers](http://docs.ckan.org/en/latest/maintaining/data-viewer.html), among them we can find the `recline_map_viewer`. This is a typical 2D map where geolocation data can be rendered. Geolocation data in CKAN can be add in two ways: @@ -441,7 +441,7 @@ Finally, it must be said this way of mapping geolocated context information into [Top](#top) -####About capping resources and expirating records +#### About capping resources and expirating records Capping and expiration are disabled by default. Nevertheless, if desired, this can be enabled: * Capping by the number of records. This allows the resource growing up until certain configured maximum number of records is reached (`persistence_policy.max_records`), and then maintains a such a constant number of records. @@ -449,8 +449,8 @@ Capping and expiration are disabled by default. Nevertheless, if desired, this c [Top](#top) -##Programmers guide -###`NGSICKANSink` class +## Programmers guide +### `NGSICKANSink` class As any other NGSI-like sink, `NGSICKANSink` extends the base `NGSISink`. The methods that are extended are: void persistBatch(NGSIBatch batch) throws Exception; @@ -475,9 +475,9 @@ A complete configuration as the described above is read from the given `Context` [Top](#top) -##Annexes +## Annexes -###Provisioning a CKAN resource for the column mode +### Provisioning a CKAN resource for the column mode This section is built upon the assumption you are familiar with the CKAN API. If not, please have a look on [it](http://docs.ckan.org/en/latest/api/). First of all, you'll need a CKAN organization and package/dataset before creating a resource and an associated datastore in order to persist the data. diff --git a/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_dynamodb_sink.md b/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_dynamodb_sink.md index 9c07d75ba..28e0d921c 100644 --- a/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_dynamodb_sink.md +++ b/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_dynamodb_sink.md @@ -1,4 +1,4 @@ -#NGSIDynamoDBSink +# NGSIDynamoDBSink Content: * [Functionality](#section1) @@ -26,7 +26,7 @@ Content: * [`NGSIDynamoDBSink` class](#section3.1) * [Authentication and authorization](#section3.2) -##Functionality +## Functionality `com.iot.telefonica.cygnus.sinks.NGSIDynamoDBSink`, or simply `NGSIDynamoDBSink` is a sink designed to persist NGSI-like context data events within a [DynamoDB database](https://aws.amazon.com/dynamodb/) in [Amazon Web Services](https://aws.amazon.com/). 
Usually, such context data is notified by an [Orion Context Broker](https://github.com/telefonicaid/fiware-orion) instance, but it could be any other system speaking the NGSI language.

Independently of the data generator, NGSI context data is always transformed into internal `NGSIEvent` objects at Cygnus sources. In the end, the information within these events must be mapped into specific DynamoDB data structures.

The next sections will explain this in detail.

[Top](#top)

-###Mapping NGSI events to `NGSIEvent` objects
+### Mapping NGSI events to `NGSIEvent` objects
Notified NGSI events (containing context data) are transformed into `NGSIEvent` objects (for each context element a `NGSIEvent` is created; such an event is a mix of certain headers and a `ContextElement` object), independently of the NGSI data generator or the final backend where it is persisted. This is done at the cygnus-ngsi Http listeners (in Flume jargon, sources) thanks to [`NGSIRestHandler`](/ngsi_rest_handler.md). Once translated, the data (now, as `NGSIEvent` objects) is put into the internal channels for future consumption (see next section).

[Top](#top)

-###Mapping `NGSIEvent`s to DynamoDB data structures
+### Mapping `NGSIEvent`s to DynamoDB data structures
DynamoDB organizes the data in tables of data items. All the tables are located within the same *default database*, i.e. the Amazon Web Services user space. Such organization is exploited by `NGSIDynamoDBSink` each time a `NGSIEvent` is going to be persisted.

[Top](#top)

-####DynamoDB databases naming conventions
+#### DynamoDB databases naming conventions
As said, there is a DynamoDB database per Amazon user. The [name of these users](http://docs.aws.amazon.com/IAM/latest/UserGuide/reference_iam-limits.html) must be alphanumeric, including the following common characters: `+`, `=`, `,`, `.`, `@`, `_` and `-`. This leads to a certain [encoding](#section2.3.5) being applied.

DynamoDB [databases name length](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Limits.html#limits-naming-rules) may be up to 255 characters (minimum, 3 characters).
@@ -56,7 +56,7 @@ Current version of the sink does not support multitenancy, that means only an Am

[Top](#top)

-####DynamoDB tables naming conventions
+#### DynamoDB tables naming conventions
The name of these tables depends on the configured data model (see the [Configuration](#section2.1) section for more details):

* Data model by service path (`data_model=dm-by-service-path`). As the data model name denotes, the notified FIWARE service path (or the configured one as default in [`NGSIRestHandler`](/ngsi_rest_handler.md)) is used as the name of the table. This allows all the data about the NGSI entities belonging to the same service path to be stored in this unique table. The only constraint regarding this data model is the FIWARE service path cannot be the root one (`/`).
@@ -79,7 +79,7 @@ Please observe the concatenation of entity ID and type is already given in the `

[Top](#top)

-####Row-like storing
+#### Row-like storing
Regarding the specific data stored within the datastore associated to the resource, if the `attr_persistence` parameter is set to `row` (default storing mode) then the notified data is stored attribute by attribute, composing an insert for each one of them. Each insert contains the following fields:

* `recvTimeTs`: UTC timestamp expressed in milliseconds.
@@ -94,7 +94,7 @@ Regarding the specific data stored within the datastore associated to the resour

[Top](#top)

-####Column-like storing
+#### Column-like storing
Regarding the specific data stored within the datastore associated to the resource, if the `attr_persistence` parameter is set to `column` then a single line is composed for the whole notified entity, containing the following fields:

* `recvTime`: UTC timestamp in human-readable format ([ISO 8601](http://en.wikipedia.org/wiki/ISO_8601)).
@@ -106,8 +106,8 @@ Regarding the specific data stored within the datastore associated to the resour

[Top](#top)

-###Example
-####`NGSIEvent`
+### Example
+#### `NGSIEvent`
Assuming the following `NGSIEvent` is created from notified NGSI context data (the code below is an object representation, not any real data format):

    ngsi-event={
@@ -141,7 +141,7 @@ Assuming the following `NGSIEvent` is created from a notified NGSI context data

[Top](#top)

-####Table names
+#### Table names
The DynamoDB table names will be, depending on the configured data model, the following ones:

| FIWARE service path | `dm-by-service-path` | `dm-by-entity` |
@@ -151,22 +151,22 @@ The DynamoDB table names will be, depending on the configured data mo

[Top](#top)

-####Raw-based storing
+#### Row-based storing
Let's assume a table name `x002fvehiclesxffff4wheelsxffffcar1xffffcar` (data model by entity, non-root service path) and `attr_persistence=row` as a configuration parameter. The data stored within this table would be:

![](../images/dynamodb_row_destination.jpg)

[Top](#top)

-####Column-based storing
+#### Column-based storing
If `attr_persistence=column` then `NGSIDynamoDBSink` will persist the data within the body as:

![](../images/dynamodb_column_destination.jpg)

[Top](#top)

-##Administrator guide
-###Configuration
+## Administrator guide
+### Configuration
`NGSIDynamoDBSink` is configured through the following parameters:

| Parameter | Mandatory | Default value | Comments |
@@ -208,13 +208,13 @@ A configuration example could be:

[Top](#top)

-###Use cases
+### Use cases
Use `NGSIDynamoDBSink` if you are looking for a cloud-based database with [relatively good throughput](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ProvisionedThroughputIntro.html) and scalable storage.

[Top](#top)

-###Important notes
-####About the table type and its relation with the grouping rules
+### Important notes
+#### About the table type and its relation with the grouping rules
The table type configuration parameter, as seen, is a method for direct aggregation of data: by default destination (i.e. all the notifications about the same entity will be stored within the same DynamoDB table) or by default service-path (i.e. all the notifications about the same service-path will be stored within the same DynamoDB table).

The [Grouping feature](/ngsi_grouping_interceptor.md) is another aggregation mechanism, but an indirect one. This means the grouping feature does not really aggregate the data into a single table (that is something the sink does, based on the configured table type, see above), but modifies the default destination or service-path, causing the data to be finally aggregated (or not) depending on the table type.
@@ -223,12 +223,12 @@ For instance, if the chosen table type is by destination and the grouping featur

[Top](#top)

-####About the persistence mode
+#### About the persistence mode
Please observe that the same number of attributes is not always notified; this depends on the subscription made to the NGSI-like sender. This is not a problem for DynamoDB, since this kind of database is designed for holding items of different lengths within the same table. Anyway, it must be taken into account when designing your applications that the `row` persistence mode will always insert fixed 8-field data items for each notified attribute, while the `column` mode may be affected by several data items of different lengths (in terms of fields), as already explained.

[Top](#top)

-####About batching
+#### About batching
As explained in the [programmers guide](#section3), `NGSIDynamoDBSink` extends `NGSISink`, which provides a built-in mechanism for collecting events from the internal Flume channel. This mechanism allows extending classes to deal only with the persistence details of such a batch of events in the final backend.

What is important regarding the batch mechanism is that it largely increases the performance of the sink, because the number of inserts is dramatically reduced. Let's see an example: assume a batch of 100 `NGSIEvent`s. In the best case, all these events regard the same entity, which means all the data within them will be persisted in the same DynamoDB table. If processing the events one by one, we would need 100 inserts into DynamoDB; nevertheless, in this example only one insert is required. Obviously, not all the events will always regard the same unique entity, and many entities may be involved within a batch. But that's not a problem, since several sub-batches of events are created within a batch, one sub-batch per final destination DynamoDB table. In the worst case, the 100 events will regard 100 different entities (100 different DynamoDB tables), but that will not be the usual scenario. Thus, assuming a realistic number of 10-15 sub-batches per batch, we are replacing the 100 inserts of the event-by-event approach with only 10-15 inserts.
@@ -241,7 +241,7 @@ By default, `NGSIDynamoDBSink` has a configured batch size and batch accumulatio

[Top](#top)

-####Throughput in DynamoDB
+#### Throughput in DynamoDB
Please observe DynamoDB is a cloud-based storage whose throughput may be seriously affected by how far away the region where the tables are going to be created is, and by the amount of information per write.

Regarding the region, always choose the closest one to the host running Cygnus and `NGSIDynamoDBSink`.

Regarding the amount of information per write, please read carefully [this](http
@@ -250,7 +250,7 @@ Regarding the amount of information per write, please read carefully [this](http

[Top](#top)

-####About the encoding
+#### About the encoding
Cygnus applies this specific encoding tailored to DynamoDB data structures:

* Alphanumeric characters are not encoded.
@@ -265,8 +265,8 @@ Cygnus applies this specific encoding tailored to DynamoDB data structures:

[Top](#top)

-##Programmers guide
-###`NGSIDynamoDBSink` class
+## Programmers guide
+### `NGSIDynamoDBSink` class
As any other NGSI-like sink, `NGSIDynamoDBSink` extends the base `NGSISink`.
The methods that are extended are: void persistBatch(Batch batch) throws Exception; @@ -283,7 +283,7 @@ A complete configuration as the described above is read from the given `Context` [Top](#top) -###Authentication and authorization +### Authentication and authorization Current implementation of `NGSIDynamoDBSink` relies on the [AWS access keys](http://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html) mechanism. [Top](#top) diff --git a/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_grouping_interceptor.md b/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_grouping_interceptor.md index f7a82a749..e04fcefe4 100644 --- a/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_grouping_interceptor.md +++ b/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_grouping_interceptor.md @@ -1,4 +1,4 @@ -#NGSIGroupingInterceptor +# NGSIGroupingInterceptor **IMPORTANT NOTE: from release 1.6.0, this feature is deprecated in favour of Name Mappings. More details can be found [here](./deprecated_and_removed.md#section2.1).** Content: @@ -11,7 +11,7 @@ Content: * [Configuration](#section2.1) * [Management Interface related operations](#section2.2) -##Functionality +## Functionality This is a custom Interceptor specifically designed for Cygnus. Its purpose is to alter an original `NGSIEvent` object (which comes from a NGSI notification handled by [`NGSIRestHandler`](./ngsi_rest_handler.md)) by inferring the destination entity where the data regarding a notified entity is going to be persisted. This destination entity, depending on the used sinks, may be a HDFS file name, a MySQL table name or a CKAN resource name. In addition, a new `fiware-servicePath` containing the destination entity may be configured; for instance, in case of HDFS, this is a folder; in case of CKAN this is a package; in case of MySQL this is simply a prefix for the table name. Such an inference is made by inspecting (but not modifying) certain configured fields within the `ContextElement` object of the `NGSIEvent`; if the concatenation of such fields matches a configured regular expression, then: @@ -25,7 +25,7 @@ This way, those sinks having enabled the grouping rules will use both the `group [Top](#top) -###Grouping rules syntax +### Grouping rules syntax There exists a file containing Json-like rules definition, following this format: { @@ -57,7 +57,7 @@ Regarding the syntax of the rules, all the fields are mandatory and must have a [Top](#top) -###Headers before and after intercepting +### Headers before and after intercepting Before interception, these are the headers added by the [NGSIRestHandler](./ngsi_rest_handler.md) to all the internal Flume events: * `fiware-service`. FIWARE service which the entity related to the notified data belongs to. @@ -75,7 +75,7 @@ Other interceptors may add further headers, such as the `timestamp` header added [Top](#top) -###Example +### Example Let's assume these rules: { @@ -224,8 +224,8 @@ intercepted-ngsi-event-2={ [Top](#top) -##Administration guide -###Configuration +## Administration guide +### Configuration `NGSIGroupingInterceptor` is configured through the following parameters: | Parameter | Mandatory | Default value | Comments | @@ -242,7 +242,7 @@ A configuration example could be: [Top](#top) -###Management Interface related operations +### Management Interface related operations The Management Interface of Cygnus exposes a set of operations under the `/v1/groupingrules` path related to the grouping rules feature, allowing listing/updating/removing the rules. 
For instance: diff --git a/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_hdfs_sink.md b/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_hdfs_sink.md index e13e221f9..512cb3196 100644 --- a/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_hdfs_sink.md +++ b/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_hdfs_sink.md @@ -1,4 +1,4 @@ -#NGSIHDFSSink +# NGSIHDFSSink Content: * [Functionality](#section1) @@ -31,7 +31,7 @@ Content: * [OAuth2 authentication](#section3.2) * [Kerberos authentication](#section3.3) -##Functionality +## Functionality `com.iot.telefonica.cygnus.sinks.NGSIHDFSSink`, or simply `NGSIHDFSSink` is a sink designed to persist NGSI-like context data events within a [HDFS](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html) deployment. Usually, such a context data is notified by a [Orion Context Broker](https://github.com/telefonicaid/fiware-orion) instance, but could be any other system speaking the NGSI language. Independently of the data generator, NGSI context data is always transformed into internal `NGSIEvent` objects at Cygnus sources. In the end, the information within these events must be mapped into specific HDFS data structures at the Cygnus sinks. @@ -40,19 +40,19 @@ Next sections will explain this in detail. [Top](#top) -###Mapping NGSI events to `NGSIEvent` objects +### Mapping NGSI events to `NGSIEvent` objects Notified NGSI events (containing context data) are transformed into `NGSIEvent` objects (for each context element a `NGSIEvent` is created; such an event is a mix of certain headers and a `ContextElement` object), independently of the NGSI data generator or the final backend where it is persisted. This is done at the cygnus-ngsi Http listeners (in Flume jergon, sources) thanks to [`NGSIRestHandler`](/ngsi_rest_handler.md). Once translated, the data (now, as `NGSIEvent` objects) is put into the internal channels for future consumption (see next section). [Top](#top) -###Mapping `NGSIEvent`s to HDFS data structures +### Mapping `NGSIEvent`s to HDFS data structures [HDFS organizes](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html#The_File_System_Namespace) the data in folders containing big data files. Such organization is exploited by `NGSIHDFSSink` each time a `NGSIEvent` is going to be persisted. [Top](#top) -####HDFS paths naming conventions +#### HDFS paths naming conventions Since the unique data model accepted for `NGSIHDFSSink` is per entity (see the [Configuration](#section2.1) section for more details), a HDFS folder: /user//// @@ -65,7 +65,7 @@ Please observe HDFS folders and files follow the [Unix rules](https://en.wikiped [Top](#top) -####Json row-like storing +#### Json row-like storing Regarding the specific data stored within the HDFS file, if `file_format` parameter is set to `json-row` (default storing mode) then the notified data is stored attribute by attribute, composing a Json document for each one of them. Each append contains the following fields: * `recvTimeTs`: UTC timestamp expressed in miliseconds. 
@@ -80,7 +80,7 @@ Regarding the specific data stored within the HDFS file, if `file_format` parame [Top](#top) -####Json column-like storing +#### Json column-like storing Regarding the specific data stored within the HDFS file, if `file_format` parameter is set to `json-column` then a single Json document is composed for the whole notified entity, containing the following fields: * `recvTime`: UTC timestamp in human-readable format ([ISO 8601](http://en.wikipedia.org/wiki/ISO_8601)). @@ -92,7 +92,7 @@ Regarding the specific data stored within the HDFS file, if `file_format` parame [Top](#top) -####CSV row-like storing +#### CSV row-like storing Regarding the specific data stored within the HDFS file, if `file_format` parameter is set to `csv-row` then the notified data is stored attribute by attribute, composing a CSV record for each one of them. Each record contains the following fields: * `recvTimeTs`: UTC timestamp expressed in miliseconds. @@ -107,7 +107,7 @@ Regarding the specific data stored within the HDFS file, if `file_format` parame [Top](#top) -####CSV column-like storing +#### CSV column-like storing Regarding the specific data stored within the HDFS file, if `file_format` parameter is set to `csv-column` then a single CSV record is composed for the whole notified entity, containing the following fields: * `recvTime`: UTC timestamp in human-readable format ([ISO 8601](http://en.wikipedia.org/wiki/ISO_8601)). @@ -119,15 +119,15 @@ Regarding the specific data stored within the HDFS file, if `file_format` parame [Top](#top) -####Hive +#### Hive A special feature regarding HDFS persisted data is the possibility to exploit it through Hive, a SQL-like querying system. `NGSIHDFSSink` automatically [creates a Hive external table](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/Drop/TruncateTable) (similar to a SQL table) for each persisted entity in the default database, being the name for such tables as `____[row|column]`. The fields regarding each data row match the fields of the JSON documents/CSV records appended to the HDFS files. In the case of JSON, they are deserialized by using a [JSON serde](https://github.com/rcongiu/Hive-JSON-Serde). In the case of CSV they are deserialized by the delimiter fields specified in the table creation. 
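As an illustration of the above, the auto-created table can be queried from any host with a Hive client. This is just a sketch: the table name below is hypothetical (it simply follows the naming pattern described above), and the column names mirror the persisted JSON/CSV fields:

```bash
# Hypothetical table name for entity car1/car, row-like storing
$ hive -e "SELECT recvtime, attrname, attrtype, attrvalue FROM myuser_vehicles_4wheels_car1_car_row LIMIT 10;"
```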
[Top](#top) -###Example -####`NGSIEvent` +### Example +#### `NGSIEvent` Assuming the following `NGSIEvent` is created from a notified NGSI context data (the code below is an object representation, not any real data format): ngsi-event={ @@ -161,7 +161,7 @@ Assuming the following `NGSIEvent` is created from a notified NGSI context data [Top](#top) -####Path names +#### Path names Assuming `hdfs_username=myuser` and `service_as_namespace=false` as configuration parameters, then `NGSIHDFSSink` will persist the data within the body in this file (old encoding): $ hadoop fs -cat /user/myuser/vehicles/4wheels/car1_car/car1_car.txt @@ -172,7 +172,7 @@ Using the new encoding: [Top](#top) -####Json row-like storing +#### Json row-like storing A pair of Json documents are appended to the above file, one per attribute: ``` @@ -182,7 +182,7 @@ A pair of Json documents are appended to the above file, one per attribute: [Top](#top) -####Json column-like storing +#### Json column-like storing A single Json document is appended to the above file, containing all the attributes: ``` @@ -191,7 +191,7 @@ A single Json document is appended to the above file, containing all the attribu [Top](#top) -####CSV row-like storing +#### CSV row-like storing A pair of CSV records are appended to the above file, one per attribute: ``` @@ -215,7 +215,7 @@ then the `hdfs:///user/myuser/vehicles/4wheels/car1_car_speed_float/car1_car_spe [Top](#top) -####CSV column-like storing +#### CSV column-like storing A single CSV record is appended to the above file, containing all the attributes: ``` @@ -238,7 +238,7 @@ then the `hdfs:///user/myuser/vehicles/4wheels/car1_car_speed_float/car1_car_spe [Top](#top) -####Hive storing +#### Hive storing With respect to Hive, the content of the tables in the `json-row`, `json-column`, `csv-row` and `csv-column` modes, respectively, is: $ hive @@ -259,8 +259,8 @@ NOTE: `hive` is the Hive CLI for locally querying the data. [Top](#top) -##Administration guide -###Configuration +## Administration guide +### Configuration `NGSIHDFSSink` is configured through the following parameters: | Parameter | Mandatory | Default value | Comments | @@ -329,21 +329,20 @@ A configuration example could be: [Top](#top) -###Use cases +### Use cases Use `NGSIHDFSSink` if you are looking for a JSON or CSV-based document storage growing in the mid-long-term in estimated sizes of terabytes for future trending discovery, along the time persistent patterns of behaviour and so on. For a short-term historic, those required by dashboards and charting user interfaces, other backends are more suited such as MongoDB, STH Comet or MySQL (Cygnus provides sinks for them, as well). [Top](#top) -###Important notes - -####About the persistence mode +### Important notes +#### About the persistence mode Please observe not always the same number of attributes is notified; this depends on the subscription made to the NGSI-like sender. This is not a problem for the `*-row` persistence mode, since fixed 8-fields JSON/CSV documents are appended for each notified attribute. Nevertheless, the `*-column` mode may be affected by several JSON documents/CSV records of different lengths (in term of fields). Thus, the `*-column` mode is only recommended if your subscription is designed for always sending the same attributes, event if they were not updated since the last notification. 
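A way of guaranteeing such a fixed set of attributes is to explicitly list them in the subscription sent to Orion. The following NGSIv2 sketch assumes a hypothetical Orion instance at `localhost:1026` and Cygnus listening at `localhost:5050`; the entity and attribute names are also hypothetical:

```bash
# Hypothetical subscription: Orion will always notify the same two
# attributes for car1/car, as expected by the *-column file formats
curl -X POST 'http://localhost:1026/v2/subscriptions' \
     -H 'Content-Type: application/json' \
     -H 'Fiware-Service: vehicles' \
     -H 'Fiware-ServicePath: /4wheels' \
     -d '{
       "subject": {"entities": [{"id": "car1", "type": "car"}]},
       "notification": {
         "http": {"url": "http://localhost:5050/notify"},
         "attrs": ["speed", "oil_level"]
       }
     }'
```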
[Top](#top) -####About the binary backend +#### About the binary backend Current implementation of the HDFS binary backend does not support any authentication mechanism. A desirable authentication method would be OAuth2, since it is the standard in FIWARE, but this is not currently supported by the remote RPC server the binary backend accesses. @@ -356,7 +355,7 @@ There exists an [issue](https://github.com/telefonicaid/fiware-cosmos/issues/111 [Top](#top) -####About batching +#### About batching As explained in the [programmers guide](#section3), `NGSIHDFSSink` extends `NGSISink`, which provides a built-in mechanism for collecting events from the internal Flume channel. This mechanism allows extending classes have only to deal with the persistence details of such a batch of events in the final backend. What is important regarding the batch mechanism is it largely increases the performance of the sink, because the number of writes is dramatically reduced. Let's see an example, let's assume a batch of 100 `NGSIEvent`s. In the best case, all these events regard to the same entity, which means all the data within them will be persisted in the same HDFS file. If processing the events one by one, we would need 100 writes to HDFS; nevertheless, in this example only one write is required. Obviously, not all the events will always regard to the same unique entity, and many entities may be involved within a batch. But that's not a problem, since several sub-batches of events are created within a batch, one sub-batch per final destination HDFS file. In the worst case, the whole 100 entities will be about 100 different entities (100 different HDFS destinations), but that will not be the usual scenario. Thus, assuming a realistic number of 10-15 sub-batches per batch, we are replacing the 100 writes of the event by event approach with only 10-15 writes. @@ -369,7 +368,7 @@ By default, `NGSIHDFSSink` has a configured batch size and batch accumulation ti [Top](#top) -####About the encoding +#### About the encoding Until version 1.2.0 (included), Cygnus applied a very simple encoding: * All non alphanumeric characters were replaced by underscore, `_`. @@ -390,8 +389,8 @@ Despite the old encoding will be deprecated in the future, it is possible to swi [Top](#top) -##Programmers guide -###`NGSIHDFSSink` class +## Programmers guide +### `NGSIHDFSSink` class As any other NGSI-like sink, `NGSIHDFSSink` extends the base `NGSISink`. The methods that are extended are: void persistBatch(Batch batch) throws Exception; @@ -408,7 +407,7 @@ A complete configuration as the described above is read from the given `Context` [Top](#top) -###OAuth2 authentication +### OAuth2 authentication [OAuth2](http://oauth.net/2/) is the evolution of the OAuth protocol, an open standard for authorization. Using OAuth, client applications can access in a secure way certain server resources on behalf of the resource owner, and the best, without sharing their credentials with the service. This works because of a trusted authorization service in charge of emitting some pieces of security information: the access tokens. Once requested, the access token is attached to the service request in order the server may ask the authorization service for the validity of the user requesting the access (authentication) and the availability of the resource itself for this user (authorization). 
A detailed architecture of OAuth2 can be found [here](http://forge.fiware.org/plugins/mediawiki/wiki/fiware/index.php/PEP_Proxy_-_Wilma_-_Installation_and_Administration_Guide), but in a nutshell, FIWARE implements the above concept through the Identity Manager GE ([Keyrock](http://catalogue.fiware.org/enablers/identity-management-keyrock) implementation) and the Access Control ([AuthZForce](http://catalogue.fiware.org/enablers/authorization-pdp-authzforce) implementation); the join of this two enablers conform the OAuth2-based authorization service in FIWARE: @@ -427,7 +426,7 @@ As you can see, your FIWARE Lab credentials are required in the payload, in the [Top](#top) -###Kerberos authentication +### Kerberos authentication Hadoop Distributed File System (HDFS) can be remotely managed through a REST API called [WebHDFS](http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html). This API may be used without any kind of security (in this case, it is enough knowing a valid HDFS user name in order to access this user HDFS space), or a Kerberos infrastructure may be used for authenticating the users. [Kerberos](http://web.mit.edu/kerberos/) is an authentication protocol created by MIT, current version is 5. It is based in symmetric key cryptography and a trusted third party, the Kerberos servers themselves. The protocol is as easy as authenticating to the Authentication Server (AS), which forwards the user to the Key Distribution Center (KDC) with a ticket-granting ticket (TGT) that can be used to retrieve the definitive client-to-server ticket. This ticket can then be used for authentication purposes against a service server (in both directions). @@ -444,7 +443,7 @@ Nevertheless, Cygnus needs this process to be automated. Let's see how through t [Top](#top) -####`conf/cygnus.conf` +#### `conf/cygnus.conf` This file can be built from the distributed `conf/cygnus.conf.template`. Edit appropriately this part of the `NGSIHDFSSink` configuration: # Kerberos-based authentication enabling @@ -462,7 +461,7 @@ I.e. start enabling (or not) the Kerberos authentication. Then, configure a user [Top](#top) -####`conf/krb5_login.conf` +#### `conf/krb5_login.conf` Contains the following line, which must not be changed (thus, the distributed file is not a template but the definitive one). @@ -472,7 +471,7 @@ Contains the following line, which must not be changed (thus, the distributed fi [Top](#top) -####`conf/krb5.conf` +#### `conf/krb5.conf` This file can be built from the distributed `conf/krb5.conf.template`. Edit it appropriately, basically by replacing `EXAMPLE.COM` by your Kerberos realm (this is the same than your domain, but uppercase, i.e. the realm for `example.com` is `EXAMPLE.COM`) and by configuring your Kerberos Key Distribution Center (KDC) and your Kerberos admin/authentication server (ask your network administrator in order to know them). 
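Before wiring this into Cygnus, the Kerberos setup itself can be verified by hand. A minimal sketch, where the principal, realm and WebHDFS endpoint are hypothetical:

```bash
# Get a ticket-granting ticket for the (hypothetical) principal
$ kinit krb5_cygnus@EXAMPLE.COM
# Check the ticket has actually been granted
$ klist
# Issue a SPNEGO-authenticated WebHDFS request using that ticket
# (requires a curl build with Kerberos/GSS-API support)
$ curl -i --negotiate -u : "http://namenode.example.com:50070/webhdfs/v1/user/myuser?op=LISTSTATUS"
```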
diff --git a/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_kafka_sink.md b/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_kafka_sink.md index 4361c9f13..5866bacd1 100644 --- a/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_kafka_sink.md +++ b/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_kafka_sink.md @@ -1,4 +1,4 @@ -#NGSIKafkaSink +# NGSIKafkaSink Content: * [Functionality](#section1) @@ -19,7 +19,7 @@ Content: * [Programmers guide](#section3) * [`NGSIKafkaSink` class](#section3.1) -##Functionality +## Functionality `com.iot.telefonica.cygnus.sinks.NGSIKafkaSink`, or simply `NGSIKafkaSink` is a sink designed to persist NGSI-like context data events within a [Apache Kafka](http://kafka.apache.org/) deployment. Usually, such a context data is notified by a [Orion Context Broker](https://github.com/telefonicaid/fiware-orion) instance, but could be any other system speaking the NGSI language. Independently of the data generator, NGSI context data is always transformed into internal `NGSIEvent` objects at Cygnus sources. In the end, the information within these events must be mapped into specific Kafka data structures at the Cygnus sinks. @@ -28,19 +28,19 @@ Next sections will explain this in detail. [Top](#top) -###Mapping NGSI events to `NGSIEvent` objects +### Mapping NGSI events to `NGSIEvent` objects Notified NGSI events (containing context data) are transformed into `NGSIEvent` objects (for each context element a `NGSIEvent` is created; such an event is a mix of certain headers and a `ContextElement` object), independently of the NGSI data generator or the final backend where it is persisted. This is done at the cygnus-ngsi Http listeners (in Flume jergon, sources) thanks to [`NGSIRestHandler`](/ngsi_rest_handler.md). Once translated, the data (now, as `NGSIEvent` objects) is put into the internal channels for future consumption (see next section). [Top](#top) -###Mapping `NGSIEvent`s to Kafka data structures +### Mapping `NGSIEvent`s to Kafka data structures [Apache Kafka organizes](http://kafka.apache.org/documentation.html#introduction) the data in topics (a category or feed name to which messages are published). Such organization is exploited by `NGSIKafkaSink` each time a `NGSIEvent` is going to be persisted. [Top](#top) -####Topics naming conventions +#### Topics naming conventions A Kafka topic is created (number of partitions 1) if not yet existing depending on the configured data model: * Data model by service (`data_model=dm-by-service`). As the data model name denotes, the notified FIWARE service (or the configured one as default in [`NGSIRestHandler`](ngsi_rest_handler.md)) is used as the name of the topic. This allows the data about all the NGSI entities belonging to the same service is stored in this unique topic. @@ -61,13 +61,13 @@ Please observe the concatenation of entity ID and type is already given in the ` [Top](#top) -####Storing +#### Storing `NGSIEvent`s structure is stringified as a Json object containing an array of headers and another object containing the Json data as it is notified by the NGSI-like source. 
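Since the sink auto-creates the topics (1 partition, as said above), they can be inspected with the scripts shipped with the Kafka distribution. A sketch, assuming a local Zookeeper on the default 2181 port and a hypothetical topic name:

```bash
# List the topics created so far by NGSIKafkaSink
$ bin/kafka-topics.sh --zookeeper localhost:2181 --list
# Show the partitioning details of one of them (hypothetical name)
$ bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic vehicles
```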
[Top](#top) -###Example -####`NGSIEvent` +### Example +#### `NGSIEvent` Assuming the following `NGSIEvent` is created from a notified NGSI context data (the code below is an object representation, not any real data format): ngsi-event={ @@ -101,7 +101,7 @@ Assuming the following `NGSIEvent` is created from a notified NGSI context data [Top](#top) -####Topic names +#### Topic names The topic names will be, depending on the configured data model, the following ones: | FIWARE service path | `dm-by-service` | `dm-by-service-path` | `dm-by-entity` | `dm-by-attribute` | @@ -111,7 +111,7 @@ The topic names will be, depending on the configured data model, the following o [Top](#top) -####Storing +#### Storing Let's assume a topic name `vehiclesxffffx002f4wheelsxffffcar1xffffcarxffffspeed` (data model by attribute, non-root service path). The data stored within this topic would be: $ bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic vehiclesxffffx002f4wheelsxffffcar1xffffcarxffffspeed --from-beginning @@ -121,8 +121,8 @@ Let's assume a topic name `vehiclesxffffx002f4wheelsxffffcar1xffffcarxffffspeed` [Top](#top) -##Administration guide -###Configuration +## Administration guide +### Configuration `NGSIKafkaSink` is configured through the following parameters: | Parameter | Mandatory | Default value | Comments | @@ -164,13 +164,13 @@ A configuration example could be: [Top](#top) -###Use cases +### Use cases Use `NGSIKafkaSink` if you want to integrate OrionContextBroker with a Kafka-based consumer, as a Storm real-time application. [Top](#top) -###Important notes -####About batching +### Important notes +#### About batching As explained in the [programmers guide](#section3), `NGSIKafkaSink` extends `NGSISink`, which provides a built-in mechanism for collecting events from the internal Flume channel. This mechanism allows extending classes have only to deal with the persistence details of such a batch of events in the final backend. What is important regarding the batch mechanism is it largely increases the performance of the sink, because the number of writes is dramatically reduced. Let's see an example, let's assume a batch of 100 `NGSIEvent`s. In the best case, all these events regard to the same entity, which means all the data within them will be persisted in the same Kafka topic. If processing the events one by one, we would need 100 writes to Kafka; nevertheless, in this example only one write is required. Obviously, not all the events will always regard to the same unique entity, and many entities may be involved within a batch. But that's not a problem, since several sub-batches of events are created within a batch, one sub-batch per final destination Kafka topic. In the worst case, the whole 100 entities will be about 100 different entities (100 different Kafka topics), but that will not be the usual scenario. Thus, assuming a realistic number of 10-15 sub-batches per batch, we are replacing the 100 writes of the event by event approach with only 10-15 writes. @@ -183,7 +183,7 @@ By default, `NGSIKafkaSink` has a configured batch size and batch accumulation t [Top](#top) -####About the encoding +#### About the encoding Cygnus applies this specific encoding tailored to Kafka data structures: * Alphanumeric characters are not encoded. 
@@ -198,8 +198,8 @@ Cygnus applies this specific encoding tailored to Kafka data structures:

[Top](#top)

-##Programmers guide
-###`NGSIKafkaSink` class
+## Programmers guide
+### `NGSIKafkaSink` class
As any other NGSI-like sink, `NGSIKafkaSink` extends the base `NGSISink`. The methods that are extended are:

    void persistBatch(Batch batch) throws Exception;
diff --git a/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_mongo_sink.md b/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_mongo_sink.md
index 9d0fd4d8c..384fa4fa3 100644
--- a/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_mongo_sink.md
+++ b/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_mongo_sink.md
@@ -1,4 +1,4 @@
-#NGSIMongoSink
+# NGSIMongoSink
Content:

* [Functionality](#section1)
@@ -26,7 +26,7 @@ Content:
* [`NGSIMongoBackend` class](#section3.2)
* [Authentication and authorization](#section3.3)

-##Functionality
+## Functionality
`com.iot.telefonica.cygnus.sinks.NGSIMongoSink`, or simply `NGSIMongoSink` is a sink designed to persist NGSI-like context data events within a MongoDB server. Usually, such context data is notified by an [Orion Context Broker](https://github.com/telefonicaid/fiware-orion) instance, but could be any other system speaking the NGSI language.

Independently of the data generator, NGSI context data is always transformed into internal `NGSIEvent` objects at Cygnus sources. In the end, the information within these events must be mapped into specific MongoDB data structures at the Cygnus sinks.

The next sections explain this in detail.

[Top](#top)

-###Mapping NGSI events to `NGSIEvent` objects
+### Mapping NGSI events to `NGSIEvent` objects
Notified NGSI events (containing context data) are transformed into `NGSIEvent` objects (for each context element a `NGSIEvent` is created; such an event is a mix of certain headers and a `ContextElement` object), independently of the NGSI data generator or the final backend where it is persisted. This is done at the cygnus-ngsi Http listeners (in Flume jargon, sources) thanks to [`NGSIRestHandler`](/ngsi_rest_handler.md). Once translated, the data (now, as `NGSIEvent` objects) is put into the internal channels for future consumption (see next section).

[Top](#top)

-###Mapping `NGSIEvent`s to MongoDB data structures
+### Mapping `NGSIEvent`s to MongoDB data structures
MongoDB organizes the data in databases that contain collections of Json documents. Such organization is exploited by `NGSIMongoSink` each time a `NGSIEvent` is going to be persisted.

[Top](#top)

-####MongoDB databases naming conventions
+#### MongoDB databases naming conventions
A database named after the `fiware-service` header value within the event is created (if not existing yet). A configured prefix is added (by default, `sth_`).

It must be said [MongoDB does not accept](https://docs.mongodb.com/manual/reference/limits/#naming-restrictions) `/`, `\`, `.`, `"` and `$` in the database names. This leads to a certain [encoding](#section2.3.3) being applied depending on the `enable_encoding` configuration parameter.

MongoDB [namespaces (database + collection) name length](https://docs.mongodb.co
@@ -56,7 +56,7 @@ MongoDB [namespaces (database + collection) name length](https://docs.mongodb.co

[Top](#top)

-####MongoDB collections naming conventions
+#### MongoDB collections naming conventions
The name of these collections depends on the configured data model and analysis mode (see the [Configuration](#section2.1) section for more details):

* Data model by service path (`data_model=dm-by-service-path`).
As the data model name denotes, the notified FIWARE service path (or the configured one as default in [`NGSIRestHandler`](./ngsi_rest_handler.md)) is used as the name of the collection. This allows the data about all the NGSI entities belonging to the same service path is stored in this unique table. The configured prefix is prepended to the collection name. @@ -85,7 +85,7 @@ Please observe the concatenation of entity ID and type is already given in the ` [Top](#top) -####Row-like storing +#### Row-like storing Regarding the specific data stored within the above collections, if `attr_persistence` parameter is set to `row` (default storing mode) then the notified data is stored attribute by attribute, composing a Json document for each one of them. Each document contains a variable number of fields, depending on the configured `data_model`: * Data model by service path: @@ -113,7 +113,7 @@ Regarding the specific data stored within the above collections, if `attr_persis [Top](#top) -####Column-like storing +#### Column-like storing Regarding the specific data stored within the above collections, if `attr_persistence` parameter is set to `column` then a single Json document is composed for the whole notified entity. Each document contains a variable number of fields, depending on the configured `data_model`: * Data model by service path: @@ -132,8 +132,8 @@ Regarding the specific data stored within the above collections, if `attr_persis [Top](#top) -###Example -####`NGSIEvent` +### Example +#### `NGSIEvent` Assuming the following `NGSIEvent` is created from a notified NGSI context data (the code below is an object representation, not any real data format): ngsi-event={ @@ -167,7 +167,7 @@ Assuming the following `NGSIEvent` is created from a notified NGSI context data [Top](#top) -####Database and collection names +#### Database and collection names A MongoDB database named as the concatenation of the prefix and the notified FIWARE service path, i.e. `sth_vehicles`, will be created. Regarding the collection names, the MongoDB collection names will be, depending on the configured data model, the following ones (old encoding): @@ -186,7 +186,7 @@ Using the new encoding: [Top](#top) -####Row-like storing +#### Row-like storing Assuming `data_model=dm-by-service-path` and `attr_persistence=row` as configuration parameters, then `NGSIMongoSink` will persist the data within the body as: $ mongo -u myuser -p @@ -248,7 +248,7 @@ If `data_model=dm-by-attribute` and `attr_persistence=row` then `NGSIMongoSink` [Top](#top) -####Column-like storing +#### Column-like storing If `data_model=dm-by-service-path` and `attr_persistence=column` then `NGSIMongoSink` will persist the data within the body as: $ mongo -u myuser -p @@ -287,8 +287,8 @@ If `data_model=dm-by-entity` and `attr_persistence=column` then `NGSIMongoSink` [Top](#top) -##Administration guide -###Configuration +## Administration guide +### Configuration `NGSIMongoSink` is configured through the following parameters: | Parameter | Mandatory | Default value | Comments | @@ -346,13 +346,13 @@ A configuration example could be: [Top](#top) -###Use cases +### Use cases Use `NGSIMongoSink` if you are looking for a Json-based document storage not growing so much in the mid-long term. [Top](#top) -###Important notes -####About batching +### Important notes +#### About batching As explained in the [programmers guide](#section3), `NGSIMongoSink` extends `NGSISink`, which provides a built-in mechanism for collecting events from the internal Flume channel. 
This mechanism allows extending classes to deal only with the persistence details of such a batch of events in the final backend.

What is important regarding the batch mechanism is that it largely increases the performance of the sink, because the number of writes is dramatically reduced. Let's see an example: assume a batch of 100 `NGSIEvent`s. In the best case, all these events regard the same entity, which means all the data within them will be persisted in the same MongoDB collection. If processing the events one by one, we would need 100 inserts into MongoDB; nevertheless, in this example only one insert is required. Obviously, not all the events will always regard the same unique entity, and many entities may be involved within a batch. But that's not a problem, since several sub-batches of events are created within a batch, one sub-batch per final destination MongoDB collection. In the worst case, the 100 events will regard 100 different entities (100 different MongoDB collections), but that will not be the usual scenario. Thus, assuming a realistic number of 10-15 sub-batches per batch, we are replacing the 100 inserts of the event-by-event approach with only 10-15 inserts.
@@ -365,12 +365,12 @@ By default, `NGSIMongoSink` has a configured batch size and batch accumulation t

[Top](#top)

-####About `recvTime` and `TimeInstant` metadata
+#### About `recvTime` and `TimeInstant` metadata
By default, `NGSIMongoSink` stores the notification reception timestamp. Nevertheless, if (and only if) working in `row` mode and a metadata named `TimeInstant` is notified, then such a metadata value is used instead of the reception timestamp. This is useful when wanting to persist a measure generation time (which is thus notified as a `TimeInstant` metadata) instead of the reception time.

[Top](#top)

-####About the encoding
+#### About the encoding
`NGSIMongoSink` follows the [MongoDB naming restrictions](https://docs.mongodb.org/manual/reference/limits/#naming-restrictions). In a nutshell:

Until version 1.2.0 (included), Cygnus applied a very simple encoding:
@@ -389,7 +389,7 @@ Despite the old encoding will be deprecated in the future, it is possible to swi

[Top](#top)

-####About supported versions of MongoDB
+#### About supported versions of MongoDB
This sink has been tested with the following versions of Mongo:

* 3.2.6
@@ -397,8 +397,8 @@ This sink has been tested with the following versions of Mongo:

[Top](#top)

-##Programmers guide
-###`NGSISTHSink` class
+## Programmers guide
+### `NGSIMongoSink` class
`NGSIMongoSink` extends `NGSIMongoBaseSink`, which as any other NGSI-like sink, extends the base `NGSISink`. The methods that are extended are:

    void persistBatch(Batch batch) throws Exception;
@@ -415,7 +415,7 @@ A complete configuration as the described above is read from the given `Context`

[Top](#top)

-###`MongoBackend` class
+### `MongoBackend` class
This is a convenience backend class for MongoDB that provides methods to persist the context data both in raw and aggregated formats. Relevant methods regarding the raw format are:

    public void createDatabase(String dbName) throws Exception;
@@ -434,7 +434,7 @@ Nothing special is done with regards to the encoding. Since Cygnus generally wor

[Top](#top)

-###Authentication and authorization
+### Authentication and authorization
Current implementation of `NGSIMongoSink` relies on the username and password credentials created at the MongoDB endpoint.
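For instance, such credentials could be provisioned as follows. This is only a sketch: the database name (the configured `sth_` prefix plus the FIWARE service) and the credentials themselves are hypothetical:

```bash
# Create a user with read/write permissions on the sink's database
$ mongo admin --eval 'db.getSiblingDB("sth_vehicles").createUser({
    user: "myuser",
    pwd: "mypassword",
    roles: [{role: "readWrite", db: "sth_vehicles"}]
  })'
```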
[Top](#top) diff --git a/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_mysql_sink.md b/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_mysql_sink.md index 610518889..ebba4033c 100644 --- a/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_mysql_sink.md +++ b/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_mysql_sink.md @@ -1,4 +1,4 @@ -#NGSIMySQLSink +# NGSIMySQLSink Content: * [Functionality](#section1) @@ -27,7 +27,7 @@ Content: * [`NGSIMySQLSink` class](#section3.1) * [Authentication and authorization](#section3.2) -##Functionality +## Functionality `com.iot.telefonica.cygnus.sinks.NGSIMySQLSink`, or simply `NGSIMySQLSink` is a sink designed to persist NGSI-like context data events within a [MySQL server](https://www.mysql.com/). Usually, such a context data is notified by a [Orion Context Broker](https://github.com/telefonicaid/fiware-orion) instance, but could be any other system speaking the NGSI language. Independently of the data generator, NGSI context data is always transformed into internal `NGSIEvent` objects at Cygnus sources. In the end, the information within these events must be mapped into specific MySQL data structures. @@ -36,19 +36,19 @@ Next sections will explain this in detail. [Top](#top) -###Mapping NGSI events to `NGSIEvent` objects +### Mapping NGSI events to `NGSIEvent` objects Notified NGSI events (containing context data) are transformed into `NGSIEvent` objects (for each context element a `NGSIEvent` is created; such an event is a mix of certain headers and a `ContextElement` object), independently of the NGSI data generator or the final backend where it is persisted. This is done at the cygnus-ngsi Http listeners (in Flume jergon, sources) thanks to [`NGSIRestHandler`](/ngsi_rest_handler.md). Once translated, the data (now, as `NGSIEvent` objects) is put into the internal channels for future consumption (see next section). [Top](#top) -###Mapping `NGSIEvent`s to MySQL data structures +### Mapping `NGSIEvent`s to MySQL data structures MySQL organizes the data in databases that contain tables of data rows. Such organization is exploited by `NGSIMySQLSink` each time a `NGSIEvent` is going to be persisted. [Top](#top) -####MySQL databases naming conventions +#### MySQL databases naming conventions A database named as the notified `fiware-service` header value (or, in absence of such a header, the defaulted value for the FIWARE service) is created (if not existing yet). It must be said MySQL [only accepts](http://dev.mysql.com/doc/refman/5.7/en/identifiers.html) alphanumerics `$` and `_`. This leads to certain [encoding](#section2.3.3) is applied depending on the `enable_encoding` configuration parameter. @@ -57,7 +57,7 @@ MySQL [databases name length](http://dev.mysql.com/doc/refman/5.7/en/identifiers [Top](#top) -####MySQL tables naming conventions +#### MySQL tables naming conventions The name of these tables depends on the configured data model (see the [Configuration](#section2.1) section for more details): * Data model by service path (`data_model=dm-by-service-path`). As the data model name denotes, the notified FIWARE service path (or the configured one as default in [`NGSIRestHandler`](./ngsi_rest_handler.md) is used as the name of the table. This allows the data about all the NGSI entities belonging to the same service path is stored in this unique table. The only constraint regarding this data model is the FIWARE service path cannot be the root one (`/`). 
@@ -85,7 +85,7 @@ Please observe the concatenation of entity ID and type is already given in the `

[Top](#top)

-####Row-like storing
+#### Row-like storing
Regarding the specific data stored within the above table, if the `attr_persistence` parameter is set to `row` (default storing mode) then the notified data is stored attribute by attribute, composing an insert for each one of them. Each insert contains the following fields:

* `recvTimeTs`: UTC timestamp expressed in milliseconds.
@@ -100,7 +100,7 @@ Regarding the specific data stored within the above table, if `attr_persistence`

[Top](#top)

-####Column-like storing
+#### Column-like storing
Regarding the specific data stored within the above table, if the `attr_persistence` parameter is set to `column` then a single line is composed for the whole notified entity, containing the following fields:

* `recvTime`: Timestamp in human-readable format (similar to [ISO 8601](http://en.wikipedia.org/wiki/ISO_8601), but avoiding the `Z` character denoting UTC, since all MySQL timestamps are supposed to be in UTC format).
@@ -112,8 +112,8 @@ Regarding the specific data stored within the above table, if `attr_persistence`

[Top](#top)

-###Example
-####`NGSIEvent`
+### Example
+#### `NGSIEvent`
Assuming the following `NGSIEvent` is created from notified NGSI context data (the code below is an object representation, not any real data format):

    ngsi-event={
@@ -147,7 +147,7 @@ Assuming the following `NGSIEvent` is created from a notified NGSI context data

[Top](#top)

-####Database and table names
+#### Database and table names
The MySQL database name will always be `vehicles`.

The MySQL table names will be, depending on the configured data model, the following ones (old encoding):
@@ -166,7 +166,7 @@ Using the new encoding:

[Top](#top)

-####Row-like storing
+#### Row-like storing
Assuming `attr_persistence=row` as a configuration parameter, then `NGSIMySQLSink` will persist the data within the body as:

    mysql> select * from 4wheels_car1_car;
@@ -180,7 +180,7 @@ Assuming `attr_persistence=row` as configuration parameter, then `NGSIMySQLSink`

[Top](#top)

-####Column-like storing
+#### Column-like storing
If `attr_persistence=column` then `NGSIMySQLSink` will persist the data within the body as:

    mysql> select * from 4wheels_car1_car;
@@ -193,8 +193,8 @@ If `attr_persistence=colum` then `NGSIMySQLSink` will persist the data within th

[Top](#top)

-##Administration guide
-###Configuration
+## Administration guide
+### Configuration
`NGSIMySQLSink` is configured through the following parameters:

| Parameter | Mandatory | Default value | Comments |
@@ -246,13 +246,13 @@ A configuration example could be:

[Top](#top)

-###Use cases
+### Use cases
Use `NGSIMySQLSink` if you are looking for a database storage not growing so much in the mid-long term.

[Top](#top)

-###Important notes
-####About the table type and its relation with the grouping rules
+### Important notes
+#### About the table type and its relation with the grouping rules
The table type configuration parameter, as seen, is a method for direct aggregation of data: by default destination (i.e. all the notifications about the same entity will be stored within the same MySQL table) or by default service-path (i.e. all the notifications about the same service-path will be stored within the same MySQL table).

The [Grouping feature](./ngsi_grouping_interceptor.md) is another aggregation mechanism, but an indirect one.
This means the grouping feature does not really aggregate the data into a single table (that is something the sink does, based on the configured table type, see above), but modifies the default destination or service-path, causing the data to be finally aggregated (or not) depending on the table type.
@@ -261,14 +261,14 @@ For instance, if the chosen table type is by destination and the grouping featur

[Top](#top)

-####About the persistence mode
+#### About the persistence mode
Please observe that the same number of attributes is not always notified; this depends on the subscription made to the NGSI-like sender. This is not a problem for the `row` persistence mode, since fixed 8-field data rows are inserted for each notified attribute. Nevertheless, the `column` mode may be affected by several data rows of different lengths (in terms of fields). Thus, the `column` mode is only recommended if your subscription is designed for always sending the same attributes, even if they were not updated since the last notification.

In addition, when running in `column` mode, since the number of notified attributes (and therefore the number of fields to be written within the datastore) is unknown to Cygnus, the table cannot be automatically created, and must be provisioned prior to the Cygnus execution. That's not the case with the `row` mode, since the number of fields to be written is always constant, independently of the number of notified attributes.

[Top](#top)

-####About batching
+#### About batching
As explained in the [programmers guide](#section3), `NGSIMySQLSink` extends `NGSISink`, which provides a built-in mechanism for collecting events from the internal Flume channel. This mechanism allows extending classes to deal only with the persistence details of such a batch of events in the final backend.

What is important regarding the batch mechanism is that it largely increases the performance of the sink, because the number of writes is dramatically reduced. Let's see an example: assume a batch of 100 `NGSIEvent`s. In the best case, all these events regard the same entity, which means all the data within them will be persisted in the same MySQL table. If processing the events one by one, we would need 100 inserts into MySQL; nevertheless, in this example only one insert is required. Obviously, not all the events will always regard the same unique entity, and many entities may be involved within a batch. But that's not a problem, since several sub-batches of events are created within a batch, one sub-batch per final destination MySQL table. In the worst case, the 100 events will regard 100 different entities (100 different MySQL tables), but that will not be the usual scenario. Thus, assuming a realistic number of 10-15 sub-batches per batch, we are replacing the 100 inserts of the event-by-event approach with only 10-15 inserts.
@@ -281,12 +281,12 @@ By default, `NGSIMySQLSink` has a configured batch size and batch accumulation t

[Top](#top)

-####Time zone information
+#### Time zone information
Time zone information is not added to MySQL timestamps, since MySQL stores that information as an environment variable. MySQL timestamps are stored in UTC time.

[Top](#top)

-####About the encoding
+#### About the encoding
Until version 1.2.0 (included), Cygnus applied a very simple encoding:

* All non alphanumeric characters were replaced by underscore, `_`.
@@ -308,7 +308,7 @@ Despite the old encoding will be deprecated in the future, it is possible to swi [Top](#top) -####About capping resources and expirating records +#### About capping resources and expirating records Capping and expiration are disabled by default. Nevertheless, if desired, this can be enabled: * Capping by the number of records. This allows the resource growing up until certain configured maximum number of records is reached (`persistence_policy.max_records`), and then maintains such a constant number of records. @@ -316,8 +316,8 @@ Capping and expiration are disabled by default. Nevertheless, if desired, this c [Top](#top) -##Programmers guide -###`NGSIMySQLSink` class +## Programmers guide +### `NGSIMySQLSink` class As any other NGSI-like sink, `NGSIMySQLSink` extends the base `NGSISink`. The methods that are extended are: void persistBatch(Batch batch) throws Exception; @@ -342,7 +342,7 @@ A complete configuration as the described above is read from the given `Context` [Top](#top) -###Authentication and authorization +### Authentication and authorization Current implementation of `NGSIMySQLSink` relies on the username and password credentials created at the MySQL endpoint. [Top](#top) diff --git a/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_name_mappings_interceptor.md b/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_name_mappings_interceptor.md index 56a8247a1..4009c37f1 100644 --- a/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_name_mappings_interceptor.md +++ b/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_name_mappings_interceptor.md @@ -1,4 +1,4 @@ -#NGSINameMappingsInterceptor +# NGSINameMappingsInterceptor Content: * [Functionality](#section1) @@ -9,7 +9,7 @@ Content: * [Configuration](#section2.1) * [Management Interface related operations](#section2.2) -##Functionality +## Functionality This is a custom Interceptor specifically designed for Cygnus. Its purpose is to alter an original `NGSIEvent` object (which comes from a NGSI notification handled by [`NGSIRestHandler`](./ngsi_rest_handler.md)) by replacing (one or more at the same time): * The FIWARE service Http header sent with the notification. @@ -27,7 +27,7 @@ As known, a `NGSIEvent` contains a set of headers and an already parsed version [Top](#top) -###Name mappings syntax +### Name mappings syntax There exists a name mappings file containing a Json following this format: ``` @@ -77,7 +77,7 @@ However, certain special behaviours must be noticed: [Top](#top) -###Headers before and after intercepting +### Headers before and after intercepting Before interception, these are the headers added by the [`NGSIRestHandler`](./ngsi_rest_handler.md) to all the internal Flume events of type `Event`: * `fiware-service`. FIWARE service which the entity related to the notified data belongs to. @@ -94,7 +94,7 @@ Other interceptors may add further headers, such as the `timestamp` header added [Top](#top) -###Example +### Example Let's assume these name mappings: ``` @@ -264,8 +264,8 @@ intercepted-ngsi-event-2={ [Top](#top) -##Administration guide -###Configuration +## Administration guide +### Configuration `NGSINameMappingsInterceptor` is configured through the following parameters: | Parameter | Mandatory | Default value | Comments | @@ -286,7 +286,7 @@ Please check the specific sink documentation in the [Flume extensions catalogue] [Top](#top) -###Management Interface related operations +### Management Interface related operations Coming soon. 
[Top](#top) diff --git a/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_postgresql_sink.md b/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_postgresql_sink.md index bb7191ae5..0942fb702 100644 --- a/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_postgresql_sink.md +++ b/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_postgresql_sink.md @@ -1,4 +1,4 @@ -#NGSIPostgreSQLSink +# NGSIPostgreSQLSink Content: * [Functionality](#section1) @@ -27,7 +27,7 @@ Content: * [`NGSIPostgreSQLSink` class](#section3.1) * [Authentication and authorization](#section3.2) -##Functionality +## Functionality `com.iot.telefonica.cygnus.sinks.NGSIPostgreSQLSink`, or simply `NGSIPostgreSQLSink` is a sink designed to persist NGSI-like context data events within a [PostgreSQL server](https://www.postgresql.org/). Usually, such a context data is notified by a [Orion Context Broker](https://github.com/telefonicaid/fiware-orion) instance, but could be any other system speaking the NGSI language. Independently of the data generator, NGSI context data is always transformed into internal `NGSIEvent` objects at Cygnus sources. In the end, the information within these events must be mapped into specific PostgreSQL data structures. @@ -36,19 +36,19 @@ Next sections will explain this in detail. [Top](#top) -###Mapping NGSI events to `NGSIEvent` objects +### Mapping NGSI events to `NGSIEvent` objects Notified NGSI events (containing context data) are transformed into `NGSIEvent` objects (for each context element a `NGSIEvent` is created; such an event is a mix of certain headers and a `ContextElement` object), independently of the NGSI data generator or the final backend where it is persisted. This is done at the cygnus-ngsi Http listeners (in Flume jergon, sources) thanks to [`NGSIRestHandler`](/ngsi_rest_handler.md). Once translated, the data (now, as `NGSIEvent` objects) is put into the internal channels for future consumption (see next section). [Top](#top) -###Mapping `NGSIEvent`s to PostgreSQL data structures +### Mapping `NGSIEvent`s to PostgreSQL data structures PostgreSQL organizes the data in schemas inside a database that contain tables of data rows. Such organization is exploited by `NGSIPostgreSQLSink` each time a `NGSIEvent` is going to be persisted. [Top](#top) -####PostgreSQL databases naming conventions +#### PostgreSQL databases naming conventions Previous to any operation with PostgreSQL you need to create the database to be used. It must be said [PostgreSQL only accepts](https://www.postgresql.org/docs/current/static/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS) alphanumeric characters and the underscore (`_`). This leads to certain [encoding](#section2.3.4) is applied depending on the `enable_encoding` configuration parameter. @@ -57,7 +57,7 @@ PostgreSQL [databases name length](http://www.postgresql.org/docs/current/static [Top](#top) -####PostgreSQL schemas naming conventions +#### PostgreSQL schemas naming conventions A schema named as the notified `fiware-service` header value (or, in absence of such a header, the defaulted value for the FIWARE service) is created (if not existing yet). It must be said [PostgreSQL only accepts](https://www.postgresql.org/docs/current/static/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS) alphanumeric characters and the underscore (`_`). This leads to certain [encoding](#section2.3.4) is applied depending on the `enable_encoding` configuration parameter. 
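As said above, the database itself is the only structure the sink does not create; it must exist before Cygnus runs. A sketch, where the database name and owner role are hypothetical:

```bash
# The sink creates schemas and tables, but the database must pre-exist
# (hypothetical database and role names)
$ psql -U postgres -c "CREATE DATABASE cygnus_db OWNER myuser;"
```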
@@ -66,7 +66,7 @@ PostgreSQL [schemas name length](http://www.postgresql.org/docs/current/static/s [Top](#top) -####PostgreSQL tables naming conventions +#### PostgreSQL tables naming conventions The name of these tables depends on the configured data model (see the [Configuration](#section2.1) section for more details): * Data model by service path (`data_model=dm-by-service-path`). As the data model name denotes, the notified FIWARE service path (or the configured one as default in [`NGSIRestHandler`](./ngsi_rest_handler.md)) is used as the name of the table. This allows the data about all the NGSI entities belonging to the same service path is stored in this unique table. The only constraint regarding this data model is the FIWARE service path cannot be the root one (`/`). @@ -94,7 +94,7 @@ Please observe the concatenation of entity ID and type is already given in the ` [Top](#top) -####Row-like storing +#### Row-like storing Regarding the specific data stored within the above table, if `attr_persistence` parameter is set to `row` (default storing mode) then the notified data is stored attribute by attribute, composing an insert for each one of them. Each insert contains the following fields: * `recvTimeTs`: UTC timestamp expressed in miliseconds. @@ -109,7 +109,7 @@ Regarding the specific data stored within the above table, if `attr_persistence` [Top](#top) -####Column-like storing +#### Column-like storing Regarding the specific data stored within the above table, if `attr_persistence` parameter is set to `column` then a single line is composed for the whole notified entity, containing the following fields: * `recvTime`: UTC timestamp in human-redable format ([ISO 8601](http://en.wikipedia.org/wiki/ISO_8601)). @@ -121,8 +121,8 @@ Regarding the specific data stored within the above table, if `attr_persistence` [Top](#top) -###Example -####`NGSIEvent` +### Example +#### `NGSIEvent` Assuming the following `NGSIEvent` is created from a notified NGSI context data (the code below is an object representation, not any real data format): ngsi-event={ @@ -157,7 +157,7 @@ Assuming the following `NGSIEvent` is created from a notified NGSI context data [Top](#top) -####Database, schema and table names +#### Database, schema and table names The PostgreSQL database name will be of the user's choice. The PostgreSQL schema will always be `vehicles`. @@ -178,7 +178,7 @@ Using the new encoding: [Top](#top) -####Row-like storing +#### Row-like storing Assuming `attr_persistence=row` as configuration parameters, then `NGSIPostgreSQLSink` will persist the data within the body as: $ psql -U myuser @@ -212,13 +212,13 @@ Assuming `attr_persistence=row` as configuration parameters, then `NGSIPostgreSQ [Top](#top) -####Column-like storing +#### Column-like storing Coming soon. [Top](#top) -##Administration guide -###Configuration +## Administration guide +### Configuration `NGSIPostgreSQLSink` is configured through the following parameters: | Parameter | Mandatory | Default value | Comments | @@ -268,13 +268,13 @@ A configuration example could be: [Top](#top) -###Use cases +### Use cases Use `NGSIPostgreSQLSink` if you are looking for a big database with several tenants. PostgreSQL is bad at having several databases, but very good at having different schemas. 
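Regarding the `column` mode mentioned above, and since its storing example is still to be written, the following is only a hedged sketch of how such a table could be provisioned by hand for the `car1`/`car` entity of the example, assuming `attr_persistence=column`, the old encoding and `text`-typed columns (one column per notified attribute plus its metadata):

```
-- Hedged sketch: the table name follows the old-encoding, dm-by-entity
-- example above; 'speed' and 'speed_md' come from the notified attribute
CREATE SCHEMA IF NOT EXISTS vehicles;
CREATE TABLE vehicles."4wheels_car1_car" (
    "recvTime" text,
    speed text,
    speed_md text
);
```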
[Top](#top)

-###Important notes
-####About the table type and its relation with the grouping rules
+### Important notes
+#### About the table type and its relation with the grouping rules
The table type configuration parameter, as seen, is a method for direct aggregation of data: by default destination (i.e. all the notifications about the same entity will be stored within the same PostgreSQL table) or by default service-path (i.e. all the notifications about the same service-path will be stored within the same PostgreSQL table).

The [Grouping feature](./ngsi_grouping_interceptor.md) is another aggregation mechanism, but an indirect one. This means the grouping feature does not really aggregate the data into a single table; that is something the sink will do based on the configured table type (see above). Instead, it modifies the default destination or service-path, causing the data to be finally aggregated (or not) depending on the table type.
@@ -283,14 +283,14 @@ For instance, if the chosen table type is by destination and the grouping featur

[Top](#top)

-####About the persistence mode
+#### About the persistence mode
Please observe that the same number of attributes is not always notified; this depends on the subscription made to the NGSI-like sender. This is not a problem for the `row` persistence mode, since fixed 8-field data rows are inserted for each notified attribute. Nevertheless, the `column` mode may be affected by several data rows of different lengths (in terms of fields). Thus, the `column` mode is only recommended if your subscription is designed for always sending the same attributes, even if they were not updated since the last notification.

In addition, when running in `column` mode, since the number of notified attributes (and therefore the number of fields to be written within the Datastore) is unknown by Cygnus, the table cannot be automatically created, and must be provisioned previously to the Cygnus execution (see the provisioning sketch above). That is not the case of the `row` mode, since the number of fields to be written is always constant, independently of the number of notified attributes.

[Top](#top)

-####About batching
+#### About batching
As explained in the [programmers guide](#section3), `NGSIPostgreSQLSink` extends `NGSISink`, which provides a built-in mechanism for collecting events from the internal Flume channel. This mechanism allows extending classes to deal only with the persistence details of such a batch of events in the final backend.

What is important regarding the batch mechanism is that it largely increases the performance of the sink, because the number of writes is dramatically reduced. Let's see an example. Let's assume a batch of 100 `NGSIEvent`s. In the best case, all these events regard the same entity, which means all the data within them will be persisted in the same PostgreSQL table. If processing the events one by one, we would need 100 inserts into PostgreSQL; nevertheless, in this example only one insert is required. Obviously, not all the events will always regard the same unique entity, and many entities may be involved within a batch. But that is not a problem, since several sub-batches of events are created within a batch, one sub-batch per final destination PostgreSQL table. In the worst case, the 100 events will be about 100 different entities (100 different PostgreSQL tables), but that will not be the usual scenario.
Thus, assuming a realistic number of 10-15 sub-batches per batch, we are replacing the 100 inserts of the event by event approach with only 10-15 inserts. @@ -303,12 +303,12 @@ By default, `NGSIPostgreSQLSink` has a configured batch size and batch accumulat [Top](#top) -####Time zone information +#### Time zone information Time zone information is not added in PostgreSQL timestamps since PostgreSQL stores that information as a environment variable. PostgreSQL timestamps are stored in UTC time. [Top](#top) -####About the encoding +#### About the encoding Until version 1.2.0 (included), Cygnus applied a very simple encoding: * All non alphanumeric characters were replaced by underscore, `_`. @@ -330,8 +330,8 @@ Despite the old encoding will be deprecated in the future, it is possible to swi [Top](#top) -##Programmers guide -###`NGSIPostgreSQLSink` class +## Programmers guide +### `NGSIPostgreSQLSink` class As any other NGSI-like sink, `NGSIPostgreSQLSink` extends the base `NGSISink`. The methods that are extended are: void persistBatch(Batch batch) throws Exception; @@ -348,7 +348,7 @@ A complete configuration as the described above is read from the given `Context` [Top](#top) -###Authentication and authorization +### Authentication and authorization Current implementation of `NGSIPostgreSQLSink` relies on the database, username and password credentials created at the PostgreSQL endpoint. [Top](#top) diff --git a/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_rest_handler.md b/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_rest_handler.md index d03bf32ef..16b8d7d18 100644 --- a/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_rest_handler.md +++ b/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_rest_handler.md @@ -1,4 +1,4 @@ -#NGSIRestHandler +# NGSIRestHandler Content: * [Functionality](#section1) @@ -10,8 +10,8 @@ Content: * [Programmers guide](#section3) * [`NGSIRestHandler` class](#section3.1) -##Functionality -###Mapping NGSI events to `NGSIEvent` objects +## Functionality +### Mapping NGSI events to `NGSIEvent` objects This section explains how a notified NGSI event (a http message containing headers and payload) is used to create a `NGSIEvent` object, suitable for being consumed by any of the Cygnus sinks, thanks to `NGSIRestHandler`. It is necessary to remark again this handler is designed for being used by `HttpSource`, the native component of Apache Flume. An http message containing a NGSI-like notification will be received by `HttpSource` and passed to `NGSIRestHandler` in order to create one or more `NGSIEvent` objects (one per notified context element) to be put in a sink's channel (mainly, these channels are objects in memory, but could be files). @@ -37,7 +37,7 @@ Finally, it must be said the `NGSIEVent` contains another field, of type `Contex [Top](#top) -###Example +### Example Let's assume the following not-intercepted event regarding a received notification (the code below is an object representation, not any real data format): ``` @@ -111,8 +111,8 @@ ngsi-event-2={ [Top](#top) -##Administration guide -###Configuration +## Administration guide +### Configuration `NGSIRestHandler` is configured through the following parameters: | Parameter | Mandatory | Default value | Comments | @@ -132,15 +132,15 @@ A configuration example could be: [Top](#top) -###Accepted character set +### Accepted character set This handler for NGSI only works with UTF-8 encoding. Thus, notifications must send a `Content-Type` header with `application/json; charset=utf-8` as value. 
Any other content type won't be considered and the notification will be discarded. It is expected that the UTF-8 character set is maintained by all the Flume elements in the configuration, so that the final sinks (or their backend abstractions, if they exist) compose their writes/inserts/upserts by properly specifying this kind of encoding.

[Top](#top)

-##Programmers guide
-###`NGSIRestHandler` class
+## Programmers guide
+### `NGSIRestHandler` class
TBD

-[Top](#top)
\ No newline at end of file
+[Top](#top)
diff --git a/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_sth_sink.md b/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_sth_sink.md
index edfe8d432..71718cce7 100644
--- a/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_sth_sink.md
+++ b/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_sth_sink.md
@@ -1,4 +1,4 @@
-#NGSISTHSink
+# NGSISTHSink

Content:

* [Functionality](#section1)
@@ -24,7 +24,7 @@ Content:
* [`MongoBackend` class](#section3.2)
* [Authentication and authorization](#section3.3)

-##Functionality
+## Functionality
`com.iot.telefonica.cygnus.sinks.NGSISTHSink`, or simply `NGSISTHSink` is a sink designed to persist NGSI-like context data events within a MongoDB server in an aggregated way; specifically, these measures are computed:

* For numeric attribute values:
@@ -44,19 +44,19 @@ Next sections will explain this in detail.

[Top](#top)

-###Mapping NGSI events to `NGSIEvent` objects
+### Mapping NGSI events to `NGSIEvent` objects
Notified NGSI events (containing context data) are transformed into `NGSIEvent` objects (for each context element a `NGSIEvent` is created; such an event is a mix of certain headers and a `ContextElement` object), independently of the NGSI data generator or the final backend where it is persisted. This is done at the cygnus-ngsi Http listeners (in Flume jargon, sources) thanks to [`NGSIRestHandler`](/ngsi_rest_handler.md). Once translated, the data (now, as `NGSIEvent` objects) is put into the internal channels for future consumption (see next section).

[Top](#top)

-###Mapping `NGSIEvent`s to MongoDB data structures
+### Mapping `NGSIEvent`s to MongoDB data structures
MongoDB organizes the data in databases that contain collections of Json documents. Such organization is exploited by `NGSISTHSink` each time a `NGSIEvent` is going to be persisted.

[Top](#top)

-####MongoDB databases and collections naming conventions
+#### MongoDB databases and collections naming conventions
A database named as the `fiware-service` header value within the event is created (if not existing yet). A configured prefix is added (by default, `sth_`).

It must be said [MongoDB does not accept](https://docs.mongodb.com/manual/reference/limits/#naming-restrictions) `/`, `\`, `.`, `"` and `$` in the database names. This leads to a certain [encoding](#section2.3.4) being applied depending on the `enable_encoding` configuration parameter.

MongoDB [namespaces (database + collection) name length](https://docs.mongodb.co

[Top](#top)

-####MongoDB collections naming conventions
+#### MongoDB collections naming conventions
The name of these collections depends on the configured data model and analysis mode (see the [Configuration](#section2.1) section for more details):

* Data model by service path (`data_model=dm-by-service-path`). As the data model name denotes, the notified FIWARE service path (or the configured one as default in [`NGSIRestHandler`](./ngsi_rest_handler.md)) is used as the name of the collection.
This allows the data about all the NGSI entities belonging to the same service path is stored in this unique table. The configured prefix is prepended to the collection name, while `.aggr` sufix is appended to it. @@ -94,7 +94,7 @@ Please observe the concatenation of entity ID and type is already given in the ` [Top](#top) -####Storing +#### Storing As said, `NGSISTHSink` has been designed for pre-aggregating certain statistics about entities and their attributes: * For numeric attribute values: @@ -114,8 +114,8 @@ Finally, each document will save the number of samples that were used for [Top](#top) -###Example -####`NGSIEvent` +### Example +#### `NGSIEvent` Assuming the following `NGSIEvent` is created from a notified NGSI context data (the code below is an object representation, not any real data format): ngsi-event={ @@ -150,7 +150,7 @@ Assuming the following `NGSIEvent` is created from a notified NGSI context data [Top](#top) -####Database and collection names +#### Database and collection names A MongoDB database named as the concatenation of the prefix and the notified FIWARE service path, i.e. `sth_vehicles`, will be created. Regarding the collection names, the MongoDB collection names will be, depending on the configured data model, the following ones (old encoding): @@ -169,7 +169,7 @@ Using the new encoding: [Top](#top) -####Storing +#### Storing Assuming `data_model=dm-by-entity` and all the possible resolutions as configuration parameters (see section [Configuration](#section2.1) for more details), then `NGSISTHSink` will persist the data within the body as: $ mongo -u myuser -p @@ -289,8 +289,8 @@ Assuming `data_model=dm-by-entity` and all the possible resolutions as configura [Top](#top) -##Administration guide -###Configuration +## Administration guide +### Configuration `NGSISTHSink` is configured through the following parameters: | Parameter | Mandatory | Default value | Comments | @@ -342,13 +342,13 @@ A configuration example could be: [Top](#top) -###Use cases +### Use cases Use `NGSISTHSink` if you are looking for a Json-based document storage about aggregated data not growing so much in the mid-long term. [Top](#top) -###Important notes -####About batching +### Important notes +#### About batching Despite `NGSISTHSink` allows for batching configuration, it is not true it works with real batches as the rest of sinks. The batching mechanism was designed to accumulate NGSI-like notified data following the configured data model (i.e. by service, service path, entity or attribute) and then perform a single bulk-like insert operation comprising all the accumulated data. Nevertheless, STH Comet storage aggregates data through updates, i.e. there are no inserts but updates of certain pre-populated collections. Then, these updates implement at MongoDB level the expected aggregations of STH Comet (sum, sum2, max and min). @@ -359,12 +359,12 @@ Thus, `NGSISTHSink` does not implement a real batching mechanism as usual. Pleas [Top](#top) -####About `recvTime` and `TimeInstant` metadata +#### About `recvTime` and `TimeInstant` metadata By default, `NGSISTHSink` stores the notification reception timestamp. Nevertheless, if a metadata named `TimeInstant` is notified, then such metadata value is used instead of the reception timestamp. This is useful when wanting to persist a measure generation time (which is thus notified as a `TimeInstant` metadata) instead of the reception time. 
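For instance, a notified attribute carrying such a metadata could look like the sketch below, following the NGSIv1 `metadatas` syntax; the attribute and values are illustrative, taken from the `speed` example above:

```
{
    "name": "speed",
    "type": "float",
    "value": "112.9",
    "metadatas": [
        {
            "name": "TimeInstant",
            "type": "ISO8601",
            "value": "2017-01-31T10:00:00.000Z"
        }
    ]
}
```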
[Top](#top) -####About the encoding +#### About the encoding `NGSIMongoSink` follows the [MongoDB naming restrictions](https://docs.mongodb.org/manual/reference/limits/#naming-restrictions). In a nutshell: Until version 1.2.0 (included), Cygnus applied a very simple encoding: @@ -383,7 +383,7 @@ Despite the old encoding will be deprecated in the future, it is possible to swi [Top](#top) -####About supported versions of MongoDB +#### About supported versions of MongoDB This sink has been tested with the following versions of Mongo: * 3.2.6 @@ -391,8 +391,8 @@ This sink has been tested with the following versions of Mongo: [Top](#top) -##Programmers guide -###`NGSISTHSink` class +## Programmers guide +### `NGSISTHSink` class `NGSISTHSink` extends `NGSIMongoBaseSink`, which as any other NGSI-like sink, extends the base `NGSISink`. The methods that are extended are: void persistBatch(Batch batch) throws Exception; @@ -409,7 +409,7 @@ A complete configuration as the described above is read from the given `Context` [Top](#top) -###`MongoBackend` class +### `MongoBackend` class This is a convenience backend class for MongoDB that provides methods to persist the context data both in raw of aggregated format. Relevant methods regarding raw format are: public void createDatabase(String dbName) throws Exception; @@ -428,7 +428,7 @@ Nothing special is done with regards to the encoding. Since Cygnus generally wor [Top](#top) -###Authentication and authorization +### Authentication and authorization Current implementation of `NGSIMongoSink` relies on the username and password credentials created at the MongoDB endpoint. [Top](#top) diff --git a/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_test_sink.md b/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_test_sink.md index 8d807e12a..fd86dadbd 100644 --- a/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_test_sink.md +++ b/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_test_sink.md @@ -1,4 +1,4 @@ -#NGSITestSink +# NGSITestSink Content: * [Functionality](#section1) @@ -11,7 +11,7 @@ Content: * [Important notes](#section2.3) * [About batching](#section2.3.1) -##Functionality +## Functionality `com.iot.telefonica.cygnus.sinks.NGSITestSink`, or simply `NGSITestSink` is a sink designed to test Cygnus when receiving NGSI-like context data events. Usually, such a context data is notified by a [Orion Context Broker](https://github.com/telefonicaid/fiware-orion) instance, but could be any other system speaking the NGSI language. Independently of the data generator, NGSI context data is always transformed into internal `NGSIEvent` Object at Cygnus sources. In the end, the information within these events is not meant to be persisted at any real storage, but simply logged (depending on your `log4j` configuration, the logs will be printed in console, a file...). @@ -20,20 +20,20 @@ Next sections will explain this in detail. [Top](#top) -###Mapping NGSI events to `NGSIEvent` objects +### Mapping NGSI events to `NGSIEvent` objects Notified NGSI events (containing context data) are transformed into `NGSIEvent` objects (for each context element a `NGSIEvent` is created; such an event is a mix of certain headers and a `ContextElement` object), independently of the NGSI data generator or the final backend where it is persisted. This is done at the cygnus-ngsi Http listeners (in Flume jergon, sources) thanks to [`NGSIRestHandler`](/ngsi_rest_handler.md). 
Once translated, the data (now, as `NGSIEvent` objects) is put into the internal channels for future consumption (see next section).

[Top](#top)

-###Mapping `NGSIEvent`s lo logs
+### Mapping `NGSIEvent`s to logs
The mapping is direct, converting the context data into strings to be written to the console, a file...

[Top](#top)

-###Example
-####`NGSIEvent`
+### Example
+#### `NGSIEvent`
Assuming the following `NGSIEvent` is created from a notified NGSI context data (the code below is an object representation, not any real data format):

ngsi-event={
@@ -77,8 +77,8 @@ time=2015-12-10T14:31:49.486CET | lvl=INFO | trans=1429535775-308-0000000000 | s

[Top](#top)

-##Adinistration guide
-###Configuration
+## Administration guide
+### Configuration
`NGSITestSink` is configured through the following parameters:

| Parameter | Mandatory | Default value | Comments |
@@ -108,13 +108,13 @@ A configuration example could be:

[Top](#top)

-###Use cases
+### Use cases
Use this sink in order to test if a Cygnus deployment is properly receiving notifications from an Orion Context Broker premise.

[Top](#top)

-###Important notes
-####About batching
+### Important notes
+#### About batching
`NGSITestSink` extends `NGSISink`, which provides a batch-based built-in mechanism for collecting events from the internal Flume channel. This mechanism allows extending classes to deal only with the persistence details of such a batch of events in the final backend.

What is important regarding the batch mechanism is that it largely increases the performance of the sink, because the number of writes is dramatically reduced. Particularly, this is not important for this test sink, but the other sinks will largely benefit from this feature. Please, check the specific sink documentation for more details.
diff --git a/doc/cygnus-ngsi/flume_extensions_catalogue/round_robin_channel_selector.md b/doc/cygnus-ngsi/flume_extensions_catalogue/round_robin_channel_selector.md
index 974310b3e..fa78e1600 100644
--- a/doc/cygnus-ngsi/flume_extensions_catalogue/round_robin_channel_selector.md
+++ b/doc/cygnus-ngsi/flume_extensions_catalogue/round_robin_channel_selector.md
@@ -1,2 +1,2 @@
-#RoundRobinChannelSelector
+# RoundRobinChannelSelector
Coming soon.
diff --git a/doc/cygnus-ngsi/installation_and_administration_guide/README.md b/doc/cygnus-ngsi/installation_and_administration_guide/README.md
index 9b39e2d2f..610369b80 100644
--- a/doc/cygnus-ngsi/installation_and_administration_guide/README.md
+++ b/doc/cygnus-ngsi/installation_and_administration_guide/README.md
@@ -1,4 +1,4 @@
-#Installation and Administration Guide
+# Installation and Administration Guide

* [Introduction](./introduction.md)
* Installation:
diff --git a/doc/cygnus-ngsi/installation_and_administration_guide/backends_as_sth.md b/doc/cygnus-ngsi/installation_and_administration_guide/backends_as_sth.md
index 9e6fda8fc..0716be7e0 100644
--- a/doc/cygnus-ngsi/installation_and_administration_guide/backends_as_sth.md
+++ b/doc/cygnus-ngsi/installation_and_administration_guide/backends_as_sth.md
@@ -1,11 +1,11 @@
-#Backends as short-term historics
+# Backends as short-term historics
Backends are used by Cygnus NGSI as "infinite" historical context data repositories. More and more data is appended to files, tables and collections as data flows from a NGSI source. Such a data flow may never end; thus, insertions may never end either, exhausting the available storing resources.
Therefore, it is important to provide mechanisms in charge of controlling how much data is stored in the persistence backends, removing old data in favour of new data, resulting in some kind of short-term historic implementation. From version 1.7.0 this is something that can be done by means of the **capping** and/or **expirating** features.

-##How it works
+## How it works
There are two approaches when deciding which data must be removed from existing historics:

* By **capping** data "records"(*) once a certain size limit has been reached. In other words, to ensure that only the last N records are stored, honouring the capping limit in place.
@@ -30,7 +30,7 @@ Which sinks provide this kind of functionality? For the time being:

[Top](#top)

-##The special case of `NGSIMongoSink` and `NGSISTHSink`
+## The special case of `NGSIMongoSink` and `NGSISTHSink`
`NGSIMongoSink` and `NGSISTHSink` implement this kind of functionality from version 0.13.0, since the data stored in MongoDB and STH Comet was meant to be a short-term historic from the very beginning. Nevertheless, the parameters controlling the functionality are very different from the above ones:

| Parameter | Mandatory | Default value | Comments |
@@ -43,7 +43,7 @@ There are also differences in the implementations: while MongoDB natively provid

[Top](#top)

-##Future work
+## Future work
Most probably, in the future all the sinks sharing this feature will see their parameters homogenized, since conceptually the capping/expirating feature implemented by the CKAN and MySQL sinks is the same as the time and size-based data management policies in the MongoDB and STH sinks.

[Top](#top)
diff --git a/doc/cygnus-ngsi/installation_and_administration_guide/deprecated_and_removed.md b/doc/cygnus-ngsi/installation_and_administration_guide/deprecated_and_removed.md
index b09d76505..393e068ff 100644
--- a/doc/cygnus-ngsi/installation_and_administration_guide/deprecated_and_removed.md
+++ b/doc/cygnus-ngsi/installation_and_administration_guide/deprecated_and_removed.md
@@ -1,4 +1,4 @@
-#Deprecated and removed functionality
+# Deprecated and removed functionality

Content:

* [Functionality deprecation and remove policy](#section1)
@@ -13,7 +13,7 @@ Content:
* [`matching_table` parameter](#section3.5)
* [Hash-based collection names in MongoDB/STH](#section3.6)

-##Functionality deprecation and remove policy
+## Functionality deprecation and remove policy
At the Cygnus NGSI agent (cygnus-ngsi), the functionality lifecycle is:

1. New feature is designed.
@@ -30,23 +30,23 @@ Deprecated features are removed not before 3 development sprints (usually, a de

[Top](#top)

-##Deprecated functionalities
-###Grouping Rules
+## Deprecated functionalities
+### Grouping Rules
Added at version [0.5](https://github.com/telefonicaid/fiware-cygnus/releases/tag/release-0.5) (issue [107](https://github.com/telefonicaid/fiware-cygnus/issues/107)).

Deprecated after releasing version [1.6.0](https://github.com/telefonicaid/fiware-cygnus/releases/tag/1.6.0) (issue [1182](https://github.com/telefonicaid/fiware-cygnus/issues/1182)).

[Top](#top)

-###`flip_coordinates` parameter
+### `flip_coordinates` parameter
Added at version [1.0.0](https://github.com/telefonicaid/fiware-cygnus/releases/tag/1.0.0) (issue [927](https://github.com/telefonicaid/fiware-cygnus/issues/927)).

Deprecated after releasing version [1.6.0](https://github.com/telefonicaid/fiware-cygnus/releases/tag/1.6.0) (issue [1313](https://github.com/telefonicaid/fiware-cygnus/issues/1313)).
[Top](#top)

-##Removed functionalities
-###`events_ttl` parameter
+## Removed functionalities
+### `events_ttl` parameter
Added at version [0.1](https://github.com/telefonicaid/fiware-cygnus/releases/tag/release-0.1).

Never deprecated.
@@ -55,7 +55,7 @@ Removed in favour of `batch_ttl` parameter after releasing version [0.13.0](http

[Top](#top)

-###XML notifications support
+### XML notifications support
Added at version [0.1](https://github.com/telefonicaid/fiware-cygnus/releases/tag/release-0.1).

Deprecated in favour of Json notifications from the very beginning of the development.
@@ -64,7 +64,7 @@ Removed after releasing version [0.13.0](https://github.com/telefonicaid/fiware-

[Top](#top)

-###`cosmos_`-like HDFS parameters
+### `cosmos_`-like HDFS parameters
Added at version [0.1](https://github.com/telefonicaid/fiware-cygnus/releases/tag/release-0.1).

Deprecated in favour of `hdfs_`-like parameters after releasing version [0.8.1](https://github.com/telefonicaid/fiware-cygnus/releases/tag/release-0.8.1) (issue [374](https://github.com/telefonicaid/fiware-cygnus/issues/374)).
@@ -73,7 +73,7 @@ Removed after releasing version [1.0.0](https://github.com/telefonicaid/fiware-c

[Top](#top)

-###Data model by attribute in `NGSICartoDBSink`
+### Data model by attribute in `NGSICartoDBSink`
Added at version [1.0.0](https://github.com/telefonicaid/fiware-cygnus/releases/tag/1.0.0) (issue [927](https://github.com/telefonicaid/fiware-cygnus/issues/927)).

Never deprecated.
@@ -82,7 +82,7 @@ Removed after releasing version [1.1.0](https://github.com/telefonicaid/fiware-c

[Top](#top)

-###`matching_table` parameter
+### `matching_table` parameter
Added at version [0.5](https://github.com/telefonicaid/fiware-cygnus/releases/tag/release-0.5) (issue [107](https://github.com/telefonicaid/fiware-cygnus/issues/107)).

Deprecated in favour of `grouping_rules_conf_file` after releasing version [0.8.1](https://github.com/telefonicaid/fiware-cygnus/releases/tag/release-0.8.1) (issue [387](https://github.com/telefonicaid/fiware-cygnus/issues/387)).
@@ -91,7 +91,7 @@ Removed after releasing version [1.1.0](https://github.com/telefonicaid/fiware-

[Top](#top)

-###Hash-based collection names for MongoDB/STH
+### Hash-based collection names for MongoDB/STH
Added at version [0.8.1](https://github.com/telefonicaid/fiware-cygnus/releases/tag/0.8.1) (issue [420](https://github.com/telefonicaid/fiware-cygnus/issues/420)).

Never deprecated.
diff --git a/doc/cygnus-ngsi/installation_and_administration_guide/diagnosis_procedures.md b/doc/cygnus-ngsi/installation_and_administration_guide/diagnosis_procedures.md
index a5141ca4e..bc24f7202 100644
--- a/doc/cygnus-ngsi/installation_and_administration_guide/diagnosis_procedures.md
+++ b/doc/cygnus-ngsi/installation_and_administration_guide/diagnosis_procedures.md
@@ -1,3 +1,3 @@
-#Diagnosis procedures
+# Diagnosis procedures
Please, check the [cygnus-common](../../cygnus-common/installation_and_administration_guide/diagnosis_procedures.md) section.
diff --git a/doc/cygnus-ngsi/installation_and_administration_guide/grouping_rules.md b/doc/cygnus-ngsi/installation_and_administration_guide/grouping_rules.md
index 98fb6af3a..6d655f4ba 100644
--- a/doc/cygnus-ngsi/installation_and_administration_guide/grouping_rules.md
+++ b/doc/cygnus-ngsi/installation_and_administration_guide/grouping_rules.md
@@ -1,4 +1,4 @@
-#Grouping Rules
+# Grouping Rules
**IMPORTANT NOTE: from release 1.6.0, this feature is deprecated in favour of Name Mappings.
More details can be found [here](./deprecated_and_removed.md#section2.1).** Grouping rules are an advanced global feature of Cygnus. It is global because it is available for all NGSI sinks. They allow changing the notified FIWARE service path and the concatenation of entity ID and entity type (this is called the destination). @@ -32,10 +32,10 @@ Explained fields of a Grouping Rule: * destination: Name of the HDFS file or CKAN resource where the data will be effectively persisted. In the case of MySQL, Mongo and STH Comet this sufixes the table/collection name. * fiware\_service\_path: New `fiware-servicePath` replacing the notified one. The sinks will translate this into the name of the HDFS folder or CKAN package where the above destination entity will be placed. In the case of MySQL, Mongo and STH Comet this prefixes the table/collection name. It must start with `/` or the whole rule will be discarded. -##Important notes +## Important notes Please observe the grouping rules definition is global to all the sinks, at NGSIRestHandler, but then the application is local to the sink, depending on the `enable_grouping` parameter. Thus, if any of your sinks is going to take advantage of the grouping rules, simply leave blank the grouping rules configuration file in the REST handler. That will avoid unnecessary interception with grouping actions. -##Grouping rules vs Name mappings +## Grouping rules vs Name mappings As seen, the name mappings feature is quite similar to the already existent grouping rules. Both of them are Flume interceptors and both of them allow changing certain notified name elements. Thus, which are the differences? Mainly: | Grouping rules | Name mappings | @@ -46,7 +46,7 @@ As seen, the name mappings feature is quite similar to the already existent grou [Top](#top) -##Further reading +## Further reading Please, check the [specific documentation](../flume_extensions_catalogue/ngsi_grouping_interceptor.md) for this custom interceptor in the Flume extensions catalogue for cygnus-ngsi agent. [Top](#top) diff --git a/doc/cygnus-ngsi/installation_and_administration_guide/install_from_sources.md b/doc/cygnus-ngsi/installation_and_administration_guide/install_from_sources.md index 3c45355cf..f68e5997c 100644 --- a/doc/cygnus-ngsi/installation_and_administration_guide/install_from_sources.md +++ b/doc/cygnus-ngsi/installation_and_administration_guide/install_from_sources.md @@ -1,4 +1,4 @@ -#Installing cygnus-ngsi from sources +# Installing cygnus-ngsi from sources Content: * [Prerequisites](#section1) @@ -8,13 +8,13 @@ Content: * [Known issues](#section2.3) * [Installing dependencies](#section3) -##Prerequisites +## Prerequisites [`cygnus-common`](../../cygnus-common/installation_and_administration_guide/install_from_sources.md) must be installed. This includes Maven, `cygnus` user creation, `log4j` path creation, Apache Flume and `cygnus-flume-ng` script installation. 
[Top](#top) -##Installing Cygnus -###Cloning `fiware-cygnus` +## Installing Cygnus +### Cloning `fiware-cygnus` Start by cloning the Github repository: $ git clone https://github.com/telefonicaid/fiware-cygnus.git @@ -25,7 +25,7 @@ Start by cloning the Github repository: [Top](#top) -###Installing `cygnus-ngsi` +### Installing `cygnus-ngsi` `cygnus-ngsi` can be built as a fat Java jar file containing all third-party dependencies (**recommended**): $ cd cygnus-ngsi @@ -42,14 +42,14 @@ Finally, please find a `compile.sh` script containing all the commands shown in [Top](#top) -###Known issues +### Known issues It may happen while compiling `cygnus-ngsi` the Maven JVM has not enough memory. This can be changed as detailed at the [Maven official documentation](https://cwiki.apache.org/confluence/display/MAVEN/OutOfMemoryError): $ export MAVEN_OPTS="-Xmx512m -XX:MaxPermSize=128m" [Top](#top) -##Installing dependencies +## Installing dependencies These are the packages you will need to install under `APACHE_FLUME_HOME/plugins.d/cygnus/libext/` **if you did not included them in the cygnus-common jar**: | Cygnus dependencies | Version | Required by / comments | diff --git a/doc/cygnus-ngsi/installation_and_administration_guide/install_with_docker.md b/doc/cygnus-ngsi/installation_and_administration_guide/install_with_docker.md index ee68c6ece..42a3e4650 100644 --- a/doc/cygnus-ngsi/installation_and_administration_guide/install_with_docker.md +++ b/doc/cygnus-ngsi/installation_and_administration_guide/install_with_docker.md @@ -1,4 +1,4 @@ -#cygnus-ngsi docker +# cygnus-ngsi docker Content: * [Before starting](#section1) @@ -12,13 +12,13 @@ Content: * [Environment variables](#section3.2.2) * [Using volumes](#section3.2.3) -##Before starting +## Before starting Obviously, you will need docker installed and running in you machine. Please, check [this](https://docs.docker.com/linux/started/) official start guide. [Top](#top) -##Getting an image -###Building from sources +## Getting an image +### Building from sources Start by cloning the `fiware-cygnus` repository: $ git clone https://github.com/telefonicaid/fiware-cygnus.git @@ -42,7 +42,7 @@ centos 6 273a1eca2d3a 2 weeks ago [Top](#top) -###Using docker hub image +### Using docker hub image Instead of building an image from the scratch, you may download it from [hub.docker.com](https://hub.docker.com/r/fiware/cygnus-ngsi/): $ docker pull fiware/cygnus-ngsi @@ -58,8 +58,8 @@ centos 6 273a1eca2d3a 2 weeks ago [Top](#top) -##Using the image -###As it is +## Using the image +### As it is The cygnus-ngsi image (either built from the scratch, either downloaded from hub.docker.com) allows running a Cygnus agent in charge of receiving NGSI-like notifications and persisting them into wide variety of storages: MySQL (Running in a `iot-mysql` host), MongoDB and STH (running in a `iot-mongo` host), CKAN (running in `iot-ckan` host), HDFS (running in `iot-hdfs` host) and Carto (a cloud service at `https://.cartodb.com`). Start a container for this image by typing in a terminal: @@ -152,7 +152,7 @@ CONTAINER ID IMAGE COMMAND CREATED [Top](#top) -###Using a specific configuration +### Using a specific configuration As seen above, the default configuration distributed with the image is tied to certain values that may not be suitable for you tests. 
Specifically: * MySQL: @@ -189,14 +189,14 @@ As seen above, the default configuration distributed with the image is tied to c [Top](#top) -####Editing the docker files +#### Editing the docker files The easiest way is by editing both the `Dockerfile` and/or `agent.conf` file under `docker/cygnus-ngsi` and building the cygnus-ngsi image from the scratch. This gives you total control on the docker image. [Top](#top) -####Environment variables +#### Environment variables Those parameters associated to an environment variable can be easily overwritten in the command line using the `-e` option. For instance, if you want to change the log4j logging level, simply run: $ docker run -e CYGNUS_LOG_LEVEL='DEBUG' cygnus-ngsi @@ -207,7 +207,7 @@ Or if you want to configure non empty MySQL user and password: [Top](#top) -####Using volumes +#### Using volumes Another possibility is to start a container with a volume (`-v` option) and map the entire configuration file within the container with a local version of the file: $ docker run -v /absolute/path/to/local/agent.conf:/opt/apache-flume/conf/agent.conf cygnus-ngsi diff --git a/doc/cygnus-ngsi/installation_and_administration_guide/install_with_rpm.md b/doc/cygnus-ngsi/installation_and_administration_guide/install_with_rpm.md index 5439dc503..03952c9ad 100644 --- a/doc/cygnus-ngsi/installation_and_administration_guide/install_with_rpm.md +++ b/doc/cygnus-ngsi/installation_and_administration_guide/install_with_rpm.md @@ -1,4 +1,4 @@ -#Installing cygnus-ngsi with RPM (CentOS/RedHat) +# Installing cygnus-ngsi with RPM (CentOS/RedHat) Simply configure the FIWARE repository if not yet configured: $ cat > /etc/yum.repos.d/fiware.repo <Introduction +# Introduction This document details how to install and administrate a **cygnus-ngsi** agent. cygnus-ngsi is a connector in charge of persisting [Orion](https://github.com/telefonicaid/fiware-orion) context data in certain configured third-party storages, creating a historical view of such data. In other words, Orion only stores the last value regarding an entity's attribute, and if an older value is required then you will have to persist it in other storage, value by value, using cygnus-ngsi. @@ -19,14 +19,14 @@ Current stable release is able to persist Orion context data in: [Top](#top) -##Intended audience +## Intended audience This document is mainly addressed to those FIWARE users already using an [Orion Context Broker](https://github.com/telefonicaid/fiware-orion) instance and willing to create historical views from the context data managed by Orion. In that case, you will need this document in order to learn how to install and administrate cygnus-ngsi. If your aim is to create a new sink for cygnus-ngsi, or expand it in some way, please refer to the [User and Programmer Guide](../user_and_programmer_guide/introduction.md). [Top](#top) -##Structure of the document +## Structure of the document Apart from this introduction, this Installation and Administration Guide mainly contains sections about installing, configuring, running and testing cygnus-ngsi. The FIWARE user will also find useful information regarding multitenancy or performance tips. In addition, sanity check procedures (useful to know wether the installation was successful or not) and diagnosis procedures (a set of tips aiming to help when an issue arises) are provided as well. It is very important to note that, for those topics not covered by this documentation, the related section in cygnus-common applies. 
Specifically:
diff --git a/doc/cygnus-ngsi/installation_and_administration_guide/ipv6_support.md b/doc/cygnus-ngsi/installation_and_administration_guide/ipv6_support.md
index c280d7dcb..07901f6b4 100644
--- a/doc/cygnus-ngsi/installation_and_administration_guide/ipv6_support.md
+++ b/doc/cygnus-ngsi/installation_and_administration_guide/ipv6_support.md
@@ -1,11 +1,11 @@
-#IPv6 support
+# IPv6 support

Content:

* [Service endpoint](#section1)
* [API](#section2)
* [GUI](#section3)

-##Service endpoint
+## Service endpoint
Native Flume Http sources support IPv6; therefore, cygnus-ngsi supports IPv6 at its service endpoint.

It is just a matter of configuring the Http source `bind` parameter (which by default takes the value `127.0.0.1` when not explicitly configured) to `::` (undefined address) or `::1` (IPv6 localhost).
@@ -26,12 +26,12 @@ cygnus-ngsi.sources.http-source.port = 5050

[Top](#top)

-##API
+## API
Currently, the host part of the API binding is hardcoded to the IPv4 undefined address, i.e. `0.0.0.0`. Thus, IPv6 cannot be enabled.

[Top](#top)

-##GUI
+## GUI
Currently, the host part of the GUI binding is hardcoded to the IPv4 undefined address, i.e. `0.0.0.0`. Thus, IPv6 cannot be enabled.

-[Top](#top)
\ No newline at end of file
+[Top](#top)
diff --git a/doc/cygnus-ngsi/installation_and_administration_guide/issues_and_contact.md b/doc/cygnus-ngsi/installation_and_administration_guide/issues_and_contact.md
index 8f8609b0f..4b24041f4 100644
--- a/doc/cygnus-ngsi/installation_and_administration_guide/issues_and_contact.md
+++ b/doc/cygnus-ngsi/installation_and_administration_guide/issues_and_contact.md
@@ -1,4 +1,4 @@
-#Reporting issues and contact information
+# Reporting issues and contact information
There are several channels suited for reporting issues and asking questions in general. Each one depends on the nature of the question:

* Use [stackoverflow.com](http://stackoverflow.com) for specific questions about this software. Typically, these will be related to installation problems, errors and bugs. Development questions when forking the code are welcome as well. Use the `fiware-cygnus` tag.
diff --git a/doc/cygnus-ngsi/installation_and_administration_guide/multitenancy.md b/doc/cygnus-ngsi/installation_and_administration_guide/multitenancy.md
index ae6f8a78e..cb30668df 100644
--- a/doc/cygnus-ngsi/installation_and_administration_guide/multitenancy.md
+++ b/doc/cygnus-ngsi/installation_and_administration_guide/multitenancy.md
@@ -1,4 +1,4 @@
-#Multitenancy in Cygnus
+# Multitenancy in Cygnus

Content:

* [General multitenancy assumption](#section1)
@@ -7,7 +7,7 @@ Content:
* [`NGSIKafkaSink`](#section2.2)
* [`NGSITestSink`](#section2.3)

-##General multitenancy assumption
+## General multitenancy assumption
This section explains the general rule to be followed when implementing multitenancy in Cygnus, specifically when using the following sinks:

* `NGSIHDFSSink`
@@ -29,20 +29,20 @@ The above mentioned solution can only work if the final storage holding the hist

[Top](#top)

-##Exceptions
-###`NGSIDynamoDBSink`
+## Exceptions
+### `NGSIDynamoDBSink`
DynamoDB handles a single database per client, with each client having different access credentials. This means a single superuser cannot be configured in charge of writing data on behalf of several tenants. Even in the case a single DynamoDB user space is used for all the tenants and a table is created on a per-client basis, it is not a valid solution, since having access to the user space means access to all the tables.
[This](https://github.com/telefonicaid/fiware-cygnus/issues/608) opened issue tries to enclose a valid solution than will be implemented sometime in the future. [Top](#top) -###`NGSIKafkaSink` +### `NGSIKafkaSink` For the time being, Kafka does not support any kind of authorization mechanism. This implies, despite several topics can be created, one per tenant, there is no control on who can read the different topics. [Top](#top) -###`NGSITestSink` +### `NGSITestSink` This is a testing purpose sink and thus there is no need for multitenancy support. [Top](#top) diff --git a/doc/cygnus-ngsi/installation_and_administration_guide/name_mappings.md b/doc/cygnus-ngsi/installation_and_administration_guide/name_mappings.md index 0c011d062..b6ecea059 100644 --- a/doc/cygnus-ngsi/installation_and_administration_guide/name_mappings.md +++ b/doc/cygnus-ngsi/installation_and_administration_guide/name_mappings.md @@ -1,4 +1,4 @@ -#Name Mappings +# Name Mappings Name Mappings is an advanced global feature of Cygnus. It is global because it is available for all NGSI sinks. Name Mappings allow changing the notified FIWARE service, FIWARE service path, entity IDs, entity types, attribute names and attribute types, given a mapping. Such a mapping is just a Json within a configuration file detailing how original naming must be replaced by alternative naming. @@ -51,7 +51,7 @@ Please observe no raw bytes about the body are sent. Whenever a sink takes one of these `NGSIEvent`'s, it is only a matter of deciding if such a sink enables the mappings (`enable_name_mappings` parameter) or not. If mappings are enabled, then the already parsed `NotifyContextRequest`, mapped version, is used. If not, then the original version is used. -##Creating your own Name Mappings +## Creating your own Name Mappings Please observe the mappings definition is global to all the sinks, at `NGSIRestHandler`, as a Flume interceptor. Nevertheless, the application is local to the sink, depending on the `enable_name_mappings` parameter. Thus, if none of your sinks is going to take advantage of the mappings, simply avoid configuring the `NGSINameMappingsInterceptor` in `NGSIRestHandler`. That will avoid unnecessary interception and iterations on the mappings and Cygnus will perform faster. ``` @@ -174,7 +174,7 @@ $ cat /path/to/conf/name_mappings.conf [Top](#top) -##Name Mappings vs. grouping rules +## Name Mappings vs. grouping rules As seen, the Name Mappings feature is quite similar to the already existent grouping rules. Both of them are Flume interceptors and both of them allow changing certain notified name elements. Thus, which are the differences? Mainly: | Name Mappings | Grouping rules | @@ -187,7 +187,7 @@ As seen, the Name Mappings feature is quite similar to the already existent grou [Top](#top) -##Further reading +## Further reading Please, check the [specific documentation](../flume_extensions_catalogue/ngsi_name_mappings_interceptor.md) for this custom interceptor in the Flume extensions catalogue for cygnus-ngsi agent. 
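As a quick reference, a minimal mapping replacing a service, a service path and an entity could look like the sketch below; the concrete services, paths and entities are illustrative, and the full syntax is described in the interceptor documentation linked above:

```
{
    "serviceMappings": [{
        "originalService": "vehicles",
        "newService": "fleet",
        "servicePathMappings": [{
            "originalServicePath": "/4wheels",
            "newServicePath": "/cars",
            "entityMappings": [{
                "originalEntityId": "car1",
                "originalEntityType": "car",
                "newEntityId": "Car1",
                "newEntityType": "Car",
                "attributeMappings": []
            }]
        }]
    }]
}
```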
[Top](#top) diff --git a/doc/cygnus-ngsi/installation_and_administration_guide/ngsi_agent_conf.md b/doc/cygnus-ngsi/installation_and_administration_guide/ngsi_agent_conf.md index 1555dec99..4369b33ac 100644 --- a/doc/cygnus-ngsi/installation_and_administration_guide/ngsi_agent_conf.md +++ b/doc/cygnus-ngsi/installation_and_administration_guide/ngsi_agent_conf.md @@ -1,4 +1,4 @@ -#cygnus-ngsi agent configuration +# cygnus-ngsi agent configuration cygnus-ngsi, as any other Cygnus agent, follows the multi-instance configuration of cygnus-common. The file `agent_.conf` can be instantiated from a template given in the Cygnus repository, `conf/agent_ngsi.conf.template`. diff --git a/doc/cygnus-ngsi/installation_and_administration_guide/ngsiv2_support.md b/doc/cygnus-ngsi/installation_and_administration_guide/ngsiv2_support.md index 37dd041df..647d9ebb4 100644 --- a/doc/cygnus-ngsi/installation_and_administration_guide/ngsiv2_support.md +++ b/doc/cygnus-ngsi/installation_and_administration_guide/ngsiv2_support.md @@ -1,4 +1,4 @@ -#NGSIv2 support +# NGSIv2 support Currently, cygnus-ngsi does not support [NGSIv2](http://telefonicaid.github.io/fiware-orion/api/v2/stable/). Only notification format within NGSIv1 is accepted by `NGSIRestHandler`, i.e. something like: ``` diff --git a/doc/cygnus-ngsi/installation_and_administration_guide/performance_tips.md b/doc/cygnus-ngsi/installation_and_administration_guide/performance_tips.md index 60c62b48b..d5f3eca3a 100644 --- a/doc/cygnus-ngsi/installation_and_administration_guide/performance_tips.md +++ b/doc/cygnus-ngsi/installation_and_administration_guide/performance_tips.md @@ -1,4 +1,4 @@ -#Tuning tips for increasing the performance +# Tuning tips for increasing the performance Content: * [Batching](#section1) @@ -15,8 +15,8 @@ Content: * [Grouping Rules](#section5) * [Writing logs](#section6) -##Batching -###Sizing +## Batching +### Sizing Batching is the mechanism Cygnus implements for processing sets of `NGSIEvent`s (a `NGSIEvent` typically comes from a Orion's notification) all together instead of one by one. These sets, or properly said batches, are built by `NGSISink`, the base class all the sinks extend. Thus, having the batches already created in the inherited code the sinks only have to deal with the persistence of the data within them. Typically, the information within a whole batch is aggregated into a large data chunk that is stored at the same time by using a single write/insert/upsert operation. Why? What is important regarding the batch mechanism is it largely increases the performance of the sink because the number of writes is dramatically reduced. Let's see an example. Let's assume 100 notifications, no batching mechanism at all and a HDFS storage. It seems obvious 100 writes are needed, one per `NGSIEvent`/notification. And writing to disk is largely slow. Now let's assume a batch of size 100. In the best case, all these `NGSIEvent`s/notifications regard to the same entity, which means all the data within them will be persisted in the same HDFS file and therefore only one write is required. @@ -37,7 +37,7 @@ Nevertheless, as explained above, it is highly recommended to increase at least [Top](#top) -###Retries +### Retries Batches may not be persisted. This is something may occur from time to time because the sink is temporarily not available, or the communications are failing. In that case, Cygnus has implemented a retry mechanism. Regarding the retries of not persisted batches, a couple of parameters is used. 
On the one hand, a Time-To-Live (TTL) is used, specifing the number of retries Cygnus will do before definitely dropping the event (0 means no retries, -1 means infinite retries). On the other hand, a list of retry intervals can be configured. Such a list defines the first retry interval, then the second retry interval, and so on; if the TTL is greater than the length of the list, then the last retry interval is repeated as many times as necessary. @@ -53,7 +53,7 @@ On the other hand, very short retry intervals will make Cygnus working unncessar [Top](#top) -##Sink parallelization +## Sink parallelization Most of the processing effort done by Cygnus is located at the sinks, and these elements can be a bottleneck if not configured appropriately. Basic Cygnus configuration is about a source writing Flume events into a single channel where a single sink consumes those events: @@ -77,7 +77,7 @@ This can be clearly moved to a multiple sink configuration running in parallel. [Top](#top) -###Multiple sinks, single channel +### Multiple sinks, single channel You can simply add more sinks consuming events from the same single channel. This configuration theoretically increases the processing capabilities in the sink side, but usually shows an important drawback, specially if the events are consumed by the sinks very fast: the sinks have to compete for the single channel. Thus, some times you can find that adding more sinks in this way simply turns the system slower than a single sink configuration. This configuration is only recommended when the sinks require a lot of time to process a single event, ensuring few collisions when accessing the channel. cygnus-ngsi.sources = mysource @@ -107,7 +107,7 @@ You can simply add more sinks consuming events from the same single channel. Thi [Top](#top) -###Multiple sinks, multiple channels +### Multiple sinks, multiple channels The above mentioned drawback can be solved by configuring a channel per each sink, avoiding the competition for the single channel. However, when multiple channels are used for a same storage, then some kind of dispatcher deciding which channels will receive a copy of the events is required. This is the goal of the Flume Channel Selectors, a piece of software selecting the appropriate set of channels the Flume events will be put in. The default one is [`Replicating Channel Selector`](http://flume.apache.org/FlumeUserGuide.html#replicating-channel-selector-default), i.e. each time a Flume event is generated at the sources, it is replicated in all the channels connected to those sources. There is another selector, the [`Multiplexing Channel Selector`](http://flume.apache.org/FlumeUserGuide.html#multiplexing-channel-selector), which puts the events in a channel given certain matching-like criteria. Nevertheless: @@ -161,20 +161,20 @@ Basically, the custom Channel Selector type must be configured, together [Top](#top) -###Why the `LoadBalancingSinkProcessor` is not suitable +### Why the `LoadBalancingSinkProcessor` is not suitable [This](http://flume.apache.org/FlumeUserGuide.html#load-balancing-sink-processor) Flume Sink Processor is not suitable for our parallelization purposes due to the load balancing is done in a sequential way. I.e. either in a round robin-like configuration of the load balancer either in a random way, the sinks are used one by one and not at the same time. 
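As a reference for the retry mechanism described in the batching section above, and assuming the `batch_ttl` and `batch_retry_intervals` parameters exposed by the `NGSISink`-based sinks, a policy of 10 retries, the first one after 5 seconds and the subsequent ones every 30 seconds, could be sketched as follows (the sink name `mysql-sink` is illustrative):

```
# 10 retries at most; first retry after 5 s, the rest every 30 s
cygnus-ngsi.sinks.mysql-sink.batch_ttl = 10
cygnus-ngsi.sinks.mysql-sink.batch_retry_intervals = 5000,30000
```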
[Top](#top)

-##Channel considerations
-###Channel type
+## Channel considerations
+### Channel type
The most important thing when designing a channel for Cygnus (in general, a Flume-based application) is the tradeoff between speed and reliability. This applies especially to the channels. On the one hand, the `MemoryChannel` is a very fast channel since it is implemented directly in memory, but it is not reliable at all if, for instance, Cygnus crashes for any reason and it is recovered by a third party system (let's say Monit): in that case the Flume events put into the memory-based channel before the crash are lost. On the other hand, the `FileChannel` and `JDBCChannel` are very reliable since there is a permanent support for the data in terms of OS files or RDBM tables, respectively. Nevertheless, they are slower than a `MemoryChannel` since the I/O is done against the HDD and not against the memory.

[Top](#top)

-###Channel capacity
+### Channel capacity
There are no empirical tests showing a decrease of the performance if the channel capacity is configured with a large number, let's say 1 million Flume events. The `MemoryChannel` is supposed to be designed as a chained FIFO queue, and the persistent channels only manage a list of pointers to the real data, which should not be hard to iterate.

Such large capacities are only required when the Flume sources are faster than the Flume sinks (and even in that case, sooner or later, the channels will get full) or a lot of processing retries are expected within the sinks (see next section).

@@ -187,7 +187,7 @@ In order to calculate the appropriate capacity, just have in consideration the f

[Top](#top)

-##Name Mappings
+## Name Mappings
The Name Mappings feature is a powerful tool for changing the original notified FIWARE service, FIWARE service path, entity ID and type, and attribute names and types. As a side effect of this change, Name Mappings can be used for routing your data: for instance, by setting a common alternative FIWARE service path for two or more original service paths, all the data regarding those service paths will be stored under the same CKAN package.

As you may suppose, the usage of Name Mappings slows down Cygnus because the alternative settings are obtained after checking a list of mappings written in Json format. Although such a Json is loaded into memory and regular expressions are compiled into patterns, it must be iterated, and the conditions for matching checked, each time a `NGSIEvent`/notification is sent to Cygnus.

@@ -199,7 +199,7 @@ Nevertheless, you may write your Name Mappings in a smart way:

[Top](#top)

-##Grouping Rules
+## Grouping Rules
**IMPORTANT NOTE: from release 1.6.0, this feature is deprecated in favour of Name Mappings. More details can be found [here](./deprecated_and_removed.md#section2.1).**

The Grouping Rules feature is a powerful tool for routing your data, i.e. setting an alternative FIWARE service path and entity, which in the end decides the HDFS file, MySQL/PostgreSQL/DynamoDB/Carto table, CKAN resource, Kafka queue or MongoDB collection for your context data; otherwise, the default destination is used.

@@ -213,7 +213,7 @@ Nevertheless, you may write your Grouping Rules in a smart way:

[Top](#top)

-##Writing logs
+## Writing logs
Writing logs, as any I/O operation where disk writes are involved, is largely slow. Please avoid writing a huge number of logs unless necessary, i.e.
because you are debugging Cygnus, and try running Cygnus at least with `INFO` level (although a lot of logs are still written at that level). Running with `ERROR` level is the best option performance-wise. Logs are totally disabled by using the `OFF` level. The logging level Cygnus runs with is configured in `/usr/cygnus/conf/log4j.properties`. `INFO` is configured by default:

diff --git a/doc/cygnus-ngsi/installation_and_administration_guide/running_as_process.md b/doc/cygnus-ngsi/installation_and_administration_guide/running_as_process.md
index b5b8243e6..a95466f25 100644
--- a/doc/cygnus-ngsi/installation_and_administration_guide/running_as_process.md
+++ b/doc/cygnus-ngsi/installation_and_administration_guide/running_as_process.md
@@ -1,4 +1,4 @@
-#Running cygnus-ngsi as a process
+# Running cygnus-ngsi as a process
Cygnus implements its own startup script, `cygnus-flume-ng`, which replaces the standard `flume-ng` one and in the end runs a custom `com.telefonica.iot.cygnus.nodes.CygnusApplication` instead of a standard `org.apache.flume.node.Application`.

In foreground (with logging):

diff --git a/doc/cygnus-ngsi/installation_and_administration_guide/running_as_service.md b/doc/cygnus-ngsi/installation_and_administration_guide/running_as_service.md
index 0f59044a2..c75a2a6df 100644
--- a/doc/cygnus-ngsi/installation_and_administration_guide/running_as_service.md
+++ b/doc/cygnus-ngsi/installation_and_administration_guide/running_as_service.md
@@ -1,4 +1,4 @@
-#Running cygnus-ngsi as a service
+# Running cygnus-ngsi as a service
**Note**: Cygnus can only be run as a service if you installed it through the RPM. Once the `cygnus_instance_<id>.conf` and `agent_<id>.conf` files are properly configured, just use the `service` command to start, restart, stop or get the status (as a sudoer):

diff --git a/doc/cygnus-ngsi/installation_and_administration_guide/sanity_checks.md b/doc/cygnus-ngsi/installation_and_administration_guide/sanity_checks.md
index a0d38fcbd..f7e5e42d1 100644
--- a/doc/cygnus-ngsi/installation_and_administration_guide/sanity_checks.md
+++ b/doc/cygnus-ngsi/installation_and_administration_guide/sanity_checks.md
@@ -1,2 +1,2 @@
-#Sanity checks
+# Sanity checks
Please, check the [cygnus-common](../../cygnus-common/installation_and_administration_guide/sanity_checks.md) section.

diff --git a/doc/cygnus-ngsi/installation_and_administration_guide/testing.md b/doc/cygnus-ngsi/installation_and_administration_guide/testing.md
index 7205c888b..50b303d04 100644
--- a/doc/cygnus-ngsi/installation_and_administration_guide/testing.md
+++ b/doc/cygnus-ngsi/installation_and_administration_guide/testing.md
@@ -1,10 +1,10 @@
-#Testing
+# Testing
Content:

* [Unit testing](#section1)
* [e2e testing](#section2)

-##Unit testing
+## Unit testing
Running the tests requires [Apache Maven](https://maven.apache.org/) installed and Cygnus sources downloaded.

    $ git clone https://github.com/telefonicaid/fiware-cygnus.git

@@ -300,7 +300,7 @@ Tests run: 85, Failures: 0, Errors: 0, Skipped: 0

[Top](#top)

-##e2e testing
+## e2e testing
Cygnus can be tested from an e2e point of view by using any of the scripts [provided with this repo](../../../cygnus-ngsi/resources/ngsi-examples) emulating an NGSI-like notification. You can find both Json and XML examples of simple and compound notifications, with or without metadata, even model entities and loops of continuous notifiers.
For instance, if running the `notification-json-simple.sh`:

diff --git a/doc/cygnus-ngsi/integration/orion_cygnus_kafka.md b/doc/cygnus-ngsi/integration/orion_cygnus_kafka.md
index d79854333..2c0dedf87 100644
--- a/doc/cygnus-ngsi/integration/orion_cygnus_kafka.md
+++ b/doc/cygnus-ngsi/integration/orion_cygnus_kafka.md
@@ -1,4 +1,4 @@
-#Persisting information from Orion to Kafka, using Cygnus
+# Persisting information from Orion to Kafka, using Cygnus
Content:

* [Introduction](#section1)
@@ -11,13 +11,13 @@ Content:
* [Appending entities](#section6)
* [Updating entities](#section7)

-##Introduction
+## Introduction
Step-by-step guide for storing NGSI-like context data in Kafka topics using Cygnus. This process has some components that have to be explained in detail. All the components are running in a local machine, using localhost and a different port for every component.

[Top](#top)

-##Running Orion
+## Running Orion
[Orion Context Broker](https://github.com/telefonicaid/fiware-orion) must be installed following [this guide](https://github.com/telefonicaid/fiware-orion/blob/develop/doc/manuals/admin/install.md). Orion allows us to manage the whole lifecycle of context information including updates, queries, registrations and subscriptions. In this case we are going to subscribe to context information and update it with new values.

@@ -44,7 +44,7 @@ $ curl -X GET http://localhost:1026/version

[Top](#top)

-##Running Kafka
+## Running Kafka
[Kafka](http://kafka.apache.org/) is a distributed, partitioned, and replicated commit log service. The information is stored by topics, published by producers in brokers and consumed by consumers. Our case needs to run a [Zookeeper](https://zookeeper.apache.org/), necessary for managing the consumer and producer actions and the functionality of the connected brokers. In addition, we need to configure the `brokers` in order to store the information properly.

@@ -56,7 +56,7 @@ This section is divided in two components: Zookeepers and Brokers. Every part ne

[Top](#top)

-###Zookeeper
+### Zookeeper
[Zookeeper](https://zookeeper.apache.org/) is a part of Kafka that must be started before the brokers. In this case we need to adjust some configuration files: consumer, producer and Zookeeper.

@@ -222,7 +222,7 @@ At this moment we have a local Orion ContextBroker running on por 1026 and a loc

[Top](#top)

-###Brokers
+### Brokers
Brokers, also known as 'servers', need to be configured with some values. This concrete case shows how to configure one broker, enough for this task. Brokers are managed by a configuration file named `serverx.properties`, where x is the number of the broker. We are going to configure `server1.properties`.

```
@@ -423,7 +423,7 @@ At this moment we have a local Orion ContextBroker running on port 1026, a local

[Top](#top)

-##Running Cygnus
+## Running Cygnus
Cygnus is the connector in charge of persisting Orion context data in Kafka, creating a historical view of such data. Cygnus runs once we have configured the file `agent.conf`, which contains all the necessary values. We are going to use the agent below:

```
@@ -473,7 +473,7 @@ At this moment we have a local Orion ContextBroker running on por 1026, a local

[Top](#top)

-##Creating a subscription
+## Creating a subscription
Once all the previous prerequisites are met, we can create a subscription. Now we have to define the behaviour of our subscription: an entity, a type for that entity, a Fiware-Service and a Fiware-ServicePath that will be part of the request.
For this example we use:

* Entity: Book1
@@ -543,7 +543,7 @@ orion-library 0.031GB

[Top](#top)

-##Appending entities
+## Appending entities
The first action we need to perform is appending an entity. As we said previously, we are going to append an entity `Book1` of type `Book`. By sending a request to our local Orion with the `APPEND` option, we store the entity for future notifications:

```
@@ -652,7 +652,7 @@ time=2016-08-30T08:03:16.065CEST | lvl=INFO | corr=31ce961a-2767-4acd-bd5e-b623c

[Top](#top)

-##Updating entities
+## Updating entities
Once the entity is appended, we are going to update its information. This request is the same as `APPEND`; the only difference is that, in this case, we have to send the `UPDATE` option.

```
diff --git a/doc/cygnus-ngsi/integration/orion_cygnus_spark.md b/doc/cygnus-ngsi/integration/orion_cygnus_spark.md
index 61ed5eb66..72af791b5 100644
--- a/doc/cygnus-ngsi/integration/orion_cygnus_spark.md
+++ b/doc/cygnus-ngsi/integration/orion_cygnus_spark.md
@@ -1,4 +1,4 @@
-#Connecting Orion Context Broker with Spark Streaming, using Cygnus
+# Connecting Orion Context Broker with Spark Streaming, using Cygnus
Content:

* [Introduction](#section1)
@@ -8,7 +8,7 @@ Content:
* [Testing everything together](#section5)
* [Further information](#section6)

-##Introduction
+## Introduction
The idea behind this document is to explain, step-by-step, how to send on-change context information in real time from Orion Context Broker to Spark Streaming for its analysis. Such a connection will be made thanks to Cygnus, specifically by using the following components:

@@ -26,7 +26,7 @@ All components installation, configuration and running will be explained accordi

[Top](#top)

-##Setting up Orion Context Broker
+## Setting up Orion Context Broker
Orion Context Broker installation guide can be found [here](https://fiware-orion.readthedocs.io/en/master/admin/install/index.html). We recommend using at least release 1.6.0.

The default configuration is OK for this integration example. Simply run it by typing in a shell:

@@ -91,7 +91,7 @@ Please refere to Orion Context Broker [official documentation](http://fiware-ori

[Top](#top)

-##Setting up Cygnus
+## Setting up Cygnus
Regarding installation, do it from the [FIWARE yum repository](https://github.com/telefonicaid/fiware-cygnus/blob/master/doc/cygnus-ngsi/installation_and_administration_guide/install_with_rpm.md). Once installed in its latest version, a Cygnus agent must be configured as follows:

```
@@ -151,7 +151,7 @@ At this moment of the integration most probably you'll see several Java exceptio

[Top](#top)

-##Setting up spark Streaming
+## Setting up Spark Streaming
Apache Spark will be installed from the URL obtained by entering the [Spark Download page](http://spark.apache.org/downloads.html) and selecting:

* Spark release: 1.6.3
@@ -190,7 +190,7 @@ $ cd spark-1.6.3-bin-hadoop2.3

[Top](#top)

-##Testing everything together
+## Testing everything together
In order to test the whole architecture of our real-time context information analysis, we'll use an already developed analysis application Spark provides within its examples, called `JavaFlumeEventCount`:

```
@@ -222,9 +222,9 @@ Received 1 flume events.

[Top](#top)

-##Further information
+## Further information
From here on, the user should create his/her own analytics for Spark Streaming, based on the binary Avro events containing NGSI context information.
It is highly recommended to use the NGSI-specific functions for Spark bundled as a library in the [FIWARE Cosmos](https://github.com/telefonicaid/fiware-cosmos/tree/master/cosmos-spark-library) repository. Such a library contains primitives for deserializing binary Avro events, using tuples adapted to the context entity model, and so on. It will surely help any FIWARE developer by simplifying the design of NGSI-like analytics on top of Spark.

-[Top](#top)
\ No newline at end of file
+[Top](#top)
diff --git a/doc/cygnus-ngsi/quick_start_guide.md b/doc/cygnus-ngsi/quick_start_guide.md
index 295494588..5f6ec2b10 100644
--- a/doc/cygnus-ngsi/quick_start_guide.md
+++ b/doc/cygnus-ngsi/quick_start_guide.md
@@ -1,7 +1,7 @@
-#Cygnus Quick Start Guide
+# Cygnus Quick Start Guide
This quick start overviews the steps a newbie programmer will have to follow in order to get familiar with Cygnus and its basic functionality. For more detailed information, please refer to the [README](https://github.com/telefonicaid/fiware-cygnus/blob/master/README.md); the [Installation and Administration Guide](./installation_and_administration_guide/introduction.md), the [User and Programmer Guide](user_and_programmer_guide/introduction.md) and the [Flume Extensions Catalogue](flume_extensions_catalogue/introduction.md) fully document Cygnus.

-##Installing Cygnus
+## Installing Cygnus
Open a terminal, configure the FIWARE repository if not yet configured, and use your package manager in order to install the latest version of Cygnus (CentOS/RedHat example):

```
@@ -31,7 +31,7 @@ $ export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk.x86_64

In order to make this permanent, edit `/root/.bash_profile` (root user) or `/etc/profile` (other users).

-##Configuring a test agent
+## Configuring a test agent
This kind of agent is the simplest one you can configure with Cygnus. It is based on a standard `HTTPSource`, a `MemoryChannel` and an `NGSITestSink`. Don't worry about the configuration details, especially those about the source; simply think of an HTTP listener waiting for Orion notifications on port TCP/5050 and sending those notifications in the form of Flume events to a testing-purpose sink that will not really persist anything in a third-party storage, but will log the notified context data.

@@ -125,7 +125,7 @@ time=2015-12-10T14:31:49.486CET | lvl=INFO | trans=1449754282-690-0000000000 | s

```

-##Reporting issues and contact information
+## Reporting issues and contact information
There are several channels suited for reporting issues and asking questions in general. Each one depends on the nature of the question:

* Use [stackoverflow.com](http://stackoverflow.com) for specific questions about this software. Typically, these will be related to installation problems, errors and bugs. Development questions when forking the code are welcome as well. Use the `fiware-cygnus` tag.
diff --git a/doc/cygnus-ngsi/user_and_programmer_guide/README.md b/doc/cygnus-ngsi/user_and_programmer_guide/README.md
index 02905d1c3..09610fb14 100644
--- a/doc/cygnus-ngsi/user_and_programmer_guide/README.md
+++ b/doc/cygnus-ngsi/user_and_programmer_guide/README.md
@@ -1,8 +1,8 @@
-#User and Programmer Guide
+# User and Programmer Guide
* [Introduction](./introduction.md)
* User guide:
    * [Connecting Orion Context Broker and Cygnus](./connecting_orion.md)
* Programmer guide:
    * [Adding a new sink](./adding_new_sink.md)
-* [Reporting issues and contact information](./issues_and_contact.md)
\ No newline at end of file
+* [Reporting issues and contact information](./issues_and_contact.md)
diff --git a/doc/cygnus-ngsi/user_and_programmer_guide/adding_new_sink.md b/doc/cygnus-ngsi/user_and_programmer_guide/adding_new_sink.md
index f7c423b29..64d635e06 100644
--- a/doc/cygnus-ngsi/user_and_programmer_guide/adding_new_sink.md
+++ b/doc/cygnus-ngsi/user_and_programmer_guide/adding_new_sink.md
@@ -1,4 +1,4 @@
-#Adding new NGSI sinks development guide
+# Adding new NGSI sinks development guide
Content:

* [Introduction](#section1)
@@ -14,14 +14,14 @@ Content:
* [Backend convenience classes](#section4)
* [Naming and placing the new sink](#section5)

-##Introduction
+## Introduction
`cygnus-ngsi` allows for NGSI context data persistence in certain storages by means of Flume sinks. If the current collection of sinks is limited for your purposes, you can add your own sinks regarding a persistence technology of your choice and become an official `cygnus-ngsi` contributor!

This document tries to guide you on the development of such alternative sinks, by giving you guidelines about how to write the sink code, but also how the different classes must be named, the backends that can be used, etc.

[Top](#top)

-##Base `NGSISink` class
+## Base `NGSISink` class
`NGSISink` is the base class all the sinks within `cygnus-ngsi` extend. It is an abstract class which extends the `CygnusSink` class at `cygnus-common` (which, in turn, extends Flume's native `AbstractSink`). `NGSISink` provides most of the logic required by any NGSI-like sink:

@@ -37,7 +37,7 @@ You find this class at the following path:

[Top](#top)

-###Inherited configuration
+### Inherited configuration
All the sinks extending `NGSISink` inherit the following configuration parameters:

| Parameter | Mandatory | Default value | Comments |
@@ -53,12 +53,12 @@ These parameters are read (and defaulted, when required) in the `configure(Conte

[Top](#top)

-###Inherited starting and stoping
+### Inherited starting and stopping
TBD

[Top](#top)

-###Inherited events consumption
+### Inherited events consumption
The most important part of `NGSISink` is where the events are consumed in a batch-like approach. This is done in the `process()` method inherited from `AbstractSink`, which is overwritten. Such event processing is done by opening a Flume transaction and reading as many events as specified in the `batch_size` parameter (if not enough events are available, the accumulation ends when the `batch_timeout` is reached). For each event read, the transaction is committed. Once the accumulation ends, the transaction is closed.

@@ -75,7 +75,7 @@ Specific persistence logic is implemented by overwriting the only abstract metho

[Top](#top)

-###Inherited counters
+### Inherited counters
Because `NGSISink` extends `CygnusSink`, the following counters are already available for retrieving statistics of any sink extending `NGSISink`:

* Number of processed events, i.e.
the number of events taken from the channel and accumulated in a batch for persistence.
@@ -83,18 +83,18 @@ Because `NGSISink` extends `CygnusSink` the following counters are already avail

[Top](#top)

-##New sink class
-###Specific configuration
+## New sink class
+### Specific configuration
The `configure(Context)` method of `NGSISink` can be extended to read (and default, when required) specific configuration parameters.

[Top](#top)

-###Kind of information to be persisted
+### Kind of information to be persisted
We include a list of fields that are usually persisted in Cygnus sinks:

* The reception time of the notification in milliseconds.
* The reception time of the notification in human-readable format.
* The notified/grouped FIWARE service path.
* The entity ID.
* The entity type.
* The attributes and the attributes’ metadata.

Regarding the attributes and their metadata, you may choose between two options (or both of them, by means of a switching configuration parameter):

* row format, i.e. a write/insertion/upsert per attribute and metadata.
* column format, i.e. a single write/insertion/upsert containing all the attributes and their metadata.

[Top](#top)

-###Fitting to the specific data structures
+### Fitting to the specific data structures
It is worth briefly commenting on how the specific data structures should be created. Typically, the notified service (which defines a client/tenant) should map to the storage element in charge of defining namespaces per user. For instance, in MySQL, PostgreSQL, MongoDB and STH, the service maps to a specific database where permissions can be defined at user level, while in CKAN the service maps to an organization. In other cases, the mapping is not so evident, as in HDFS, where the service maps into a folder under `hdfs://user/`. Or it is totally impossible to fit, as in the case of DynamoDB or Kafka, where the service can only be added as part of the persistence element name (table and topic, respectively).

Regarding the notified service path, it is usually included as a prefix of the destination name (file, table, resource, collection, topic) where the data is really written. This is the case of all the sinks except HDFS and CKAN. HDFS maps the service path as a subfolder under `hdfs://user/service`, and CKAN maps the service path as a package.

@@ -103,12 +103,12 @@ It is worth to briefly comment how the specific data structures should be create

[Top](#top)

-##Backend convenience classes
+## Backend convenience classes
Sometimes all the necessary logic to persist the notified context data cannot be coded in the `persist` abstract method. In this case, you may want to create a backend class or set of classes wrapping the detailed interactions with the final backend. Nevertheless, these classes should not be located at `cygnus-ngsi` but at `cygnus-common`.

[Top](#top)

-##Naming and placing the new classes
+## Naming and placing the new classes
New sink classes must be called `NGSI<Technology>Sink`, where `<Technology>` is the name of the persistence backend. Examples are the already existent sinks `NGSIHDFSSink`, `NGSICKANSink` or `NGSIMySQLSink`.
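Once compiled and placed, a new sink is wired into an agent configuration through its fully qualified class name, as with any other Flume sink. A minimal sketch (`NGSIExampleSink`, `example-sink` and `example-channel` are hypothetical names):

    cygnus-ngsi.sinks = example-sink
    cygnus-ngsi.sinks.example-sink.type = com.telefonica.iot.cygnus.sinks.NGSIExampleSink
    cygnus-ngsi.sinks.example-sink.channel = example-channel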
Regarding the new sink class location, it must be:

diff --git a/doc/cygnus-ngsi/user_and_programmer_guide/connecting_orion.md b/doc/cygnus-ngsi/user_and_programmer_guide/connecting_orion.md
index 2866e5a27..181751b84 100644
--- a/doc/cygnus-ngsi/user_and_programmer_guide/connecting_orion.md
+++ b/doc/cygnus-ngsi/user_and_programmer_guide/connecting_orion.md
@@ -1,4 +1,4 @@
-#Connecting Orion Context Broker and Cygnus
+# Connecting Orion Context Broker and Cygnus
Cygnus takes advantage of the subscription-notification mechanism of [Orion Context Broker](https://github.com/telefonicaid/fiware-orion). Specifically, Cygnus needs to be notified each time certain entity's attributes change, and in order to do that, Cygnus must subscribe to those entity's attribute changes.

You can make a subscription to Orion on behalf of Cygnus by using a `curl` command or any other REST client. In the following example, we assume both Orion and Cygnus run on localhost, and Cygnus is listening on port TCP/5050:

@@ -32,4 +32,4 @@ You can make a subscription to Orion on behalf of Cygnus by using a `curl` comma
EOF
```

-Which means: Each time the the 'car1' entity, of type 'car', which is registered under the service/tenant 'vehicles', subservice '/4wheels', changes its value of 'speed' then send a notification to http://localhost:5050/notify (where Cygnus will be listening) with the 'speed' and 'oil_level' values. This subscription will have a duration of one month, and please, do not send me notifications more than once per second.
\ No newline at end of file
+Which means: Each time the 'car1' entity, of type 'car', which is registered under the service/tenant 'vehicles', subservice '/4wheels', changes its value of 'speed', then send a notification to http://localhost:5050/notify (where Cygnus will be listening) with the 'speed' and 'oil_level' values. This subscription will have a duration of one month, and please, do not send me notifications more than once per second.
diff --git a/doc/cygnus-ngsi/user_and_programmer_guide/introduction.md b/doc/cygnus-ngsi/user_and_programmer_guide/introduction.md
index 4524d7c8f..3d8d77c9e 100644
--- a/doc/cygnus-ngsi/user_and_programmer_guide/introduction.md
+++ b/doc/cygnus-ngsi/user_and_programmer_guide/introduction.md
@@ -1,12 +1,12 @@
-#User and programmer guide
+# User and programmer guide
This document describes how to use Cygnus once it has been [installed](../installation_and_administration_guide/introduction.md). It describes how to program extensions for it as well.

-##Intended audience
+## Intended audience
To be done

[Top](#top)

-##Structure of the document
+## Structure of the document
To be done

-[Top](#top)
\ No newline at end of file
+[Top](#top)
diff --git a/doc/cygnus-ngsi/user_and_programmer_guide/issues_and_contact.md b/doc/cygnus-ngsi/user_and_programmer_guide/issues_and_contact.md
index 34655605a..67d14ad27 100644
--- a/doc/cygnus-ngsi/user_and_programmer_guide/issues_and_contact.md
+++ b/doc/cygnus-ngsi/user_and_programmer_guide/issues_and_contact.md
@@ -1,4 +1,4 @@
-#Reporting issues and contact information
+# Reporting issues and contact information
There are several channels suited for reporting issues and asking questions in general. Each one depends on the nature of the question:

* Use [stackoverflow.com](http://stackoverflow.com) for specific questions about this software. Typically, these will be related to installation problems, errors and bugs. Development questions when forking the code are welcome as well.
Use the `fiware-cygnus` tag.
diff --git a/doc/cygnus-twitter/flume_extensions_catalogue/introduction.md b/doc/cygnus-twitter/flume_extensions_catalogue/introduction.md
index 8b6ef74e3..2f236a81d 100644
--- a/doc/cygnus-twitter/flume_extensions_catalogue/introduction.md
+++ b/doc/cygnus-twitter/flume_extensions_catalogue/introduction.md
@@ -1,14 +1,14 @@
-#Flume extensions catalogue
+# Flume extensions catalogue
This document details the catalogue of extensions developed for Cygnus on top of [Apache Flume](https://flume.apache.org/).

-#Intended audience
+# Intended audience
The Flume extensions catalogue is a basic piece of documentation for all those FIWARE users using Cygnus. It describes the available extra components added to the Flume technology in order to deal with Twitter-like data.

Software developers may also be interested in this catalogue since it may guide the creation of new components (especially sinks) for Cygnus/Flume.

[Top](#top)

-#Structure of the document
+# Structure of the document
This document describes the Twitter Source and Twitter HDFS sink. `TwitterSource` is a source designed to collect data from Twitter. This document contains an explanation about `TwitterSource` configuration and functionality. `TwitterHDFSSink` is currently the only sink supported by the cygnus-twitter agent. This document contains an explanation about `TwitterHDFSSink` functionality (including how the information within a Flume event is mapped into the storage data structures), configuration, use cases and implementation details.

-[Top](#top)
\ No newline at end of file
+[Top](#top)
diff --git a/doc/cygnus-twitter/flume_extensions_catalogue/issues_and_contact.md b/doc/cygnus-twitter/flume_extensions_catalogue/issues_and_contact.md
index c13aae383..d1234ddc3 100644
--- a/doc/cygnus-twitter/flume_extensions_catalogue/issues_and_contact.md
+++ b/doc/cygnus-twitter/flume_extensions_catalogue/issues_and_contact.md
@@ -1,4 +1,4 @@
-#Reporting issues and contact information
+# Reporting issues and contact information
There are several channels suited for reporting issues and asking questions in general. Each one depends on the nature of the question:

* Use [stackoverflow.com](http://stackoverflow.com) for specific questions about this software. Typically, these will be related to installation problems, errors and bugs. Development questions when forking the code are welcome as well. Use the `fiware-cygnus` tag.
diff --git a/doc/cygnus-twitter/flume_extensions_catalogue/twitter_hdfs_sink.md b/doc/cygnus-twitter/flume_extensions_catalogue/twitter_hdfs_sink.md
index 6339efd57..cf6a014fc 100644
--- a/doc/cygnus-twitter/flume_extensions_catalogue/twitter_hdfs_sink.md
+++ b/doc/cygnus-twitter/flume_extensions_catalogue/twitter_hdfs_sink.md
@@ -16,7 +16,7 @@ Content:
* [OAuth2 authentication](#section3.3)
* [Kerberos authentication](#section3.4)

-##Functionality
+## Functionality
`com.telefonica.iot.cygnus.sinks.TwitterHDFSSink`, or simply `TwitterHDFSSink`, is a sink designed to persist tweet data events within an [HDFS](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html) deployment. The data is provided by Twitter. Tweets are always transformed into internal Flume events at the Twitter agent source. In the end, the information within these Flume events must be mapped into specific HDFS data structures at the Twitter agent sinks.

Next sections will explain this in detail.
[Top](#top)

-###Mapping Twitter events to flume events
+### Mapping Twitter events to flume events
Received Twitter events are transformed into Flume events (specifically `TwitterEvent`), independently of the final backend where they are persisted. The body of a Flume TwitterEvent is the representation of a tweet in JSON format.

Once translated, the data (now, as a Flume event) is put into the internal channels for future consumption (see next section).

[Top](#top)

-###Mapping Flume events to HDFS data structures
+### Mapping Flume events to HDFS data structures
[HDFS organizes](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html#The_File_System_Namespace) the data in folders containing big data files. Such organization is exploited by `TwitterHDFSSink` each time a Flume event is going to be persisted. A file named `/user/<username>/<file>` is created (if not existing yet), where `<username>` and `<file>` are configuration parameters.

@@ -41,14 +41,14 @@ In this file, a JSON line is created per each tweet. The tweet contains all the
To avoid confusion and make the HDFS file reliable, all `\n` characters in tweets are removed (since they do not provide semantic information). This way, the only `\n` characters that appear in the file are those that split tweets into lines.

[Top](#top)

-###Hive
+### Hive
Hive is currently not supported in this version of the `TwitterHDFSSink`.

[Top](#top)

-##Administration guide
-###Configuration
+## Administration guide
+### Configuration
`TwitterHDFSSink` is configured through the following parameters:

| Parameter | Mandatory | Default value | Comments |
@@ -134,9 +134,9 @@ A configuration example could be:

-###Important notes
+### Important notes

-####About the binary backend
+#### About the binary backend
Current implementation of the HDFS binary backend does not support any authentication mechanism. A desirable authentication method would be OAuth2, since it is the standard in FIWARE, but this is not currently supported by the remote RPC server the binary backend accesses.

There exists an [issue](https://github.com/telefonicaid/fiware-cosmos/issues/111) about this.

[Top](#top)

-####About batching
+#### About batching
As explained in the [programmers guide](#section3), `TwitterHDFSSink` extends `TwitterSink`, which provides a built-in mechanism for collecting events from the internal Flume channel. This mechanism allows extending classes to deal only with the persistence details of such a batch of events in the final backend.

What is important regarding the batch mechanism is that it largely increases the performance of the sink, because the number of writes is dramatically reduced. Let's see an example: assume a batch of 100 Flume events. In the best case, all these events regard the same entity, which means all the data within them will be persisted in the same HDFS file. If processing the events one by one, we would need 100 writes to HDFS; nevertheless, in this example only one write is required. Obviously, not all the events will always regard the same unique entity, and many entities may be involved within a batch. But that's not a problem, since several sub-batches of events are created within a batch, one sub-batch per final destination HDFS file. In the worst case, the 100 events will regard 100 different entities (100 different HDFS destinations), but that will not be the usual scenario.
Thus, assuming a realistic number of 10-15 sub-batches per batch, we are replacing the 100 writes of the event-by-event approach with only 10-15 writes.

@@ -160,8 +160,8 @@ By default, `TwitterHDFSSink` has a configured batch size and batch accumulation

[Top](#top)

-##Programmers guide
-###`TwitterHDFSSink` class
+## Programmers guide
+### `TwitterHDFSSink` class
`TwitterHDFSSink` extends the base `TwitterSink`. The methods that are extended are:

    void persistBatch(Batch batch) throws Exception;

A complete configuration as described above is read from the given `Context`

@@ -178,7 +178,7 @@

[Top](#top)

-###`HDFSBackendImpl` class
+### `HDFSBackendImpl` class
This is a convenience backend class for HDFS that extends the `HttpBackend` abstract class (provides common logic for any HTTP connection-based backend) and implements the `HDFSBackend` interface (provides the methods that any HDFS backend must implement). Relevant methods are:

    public void createDir(String dirPath) throws Exception;

@@ -201,7 +201,7 @@

Checks whether an HDFS file, given its path, exists or not.

[Top](#top)

-###OAuth2 authentication
+### OAuth2 authentication
[OAuth2](http://oauth.net/2/) is the evolution of the OAuth protocol, an open standard for authorization. Using OAuth, client applications can access in a secure way certain server resources on behalf of the resource owner, and, best of all, without sharing their credentials with the service.

This works because of a trusted authorization service in charge of emitting some pieces of security information: the access tokens. Once requested, the access token is attached to the service request so that the server may ask the authorization service for the validity of the user requesting the access (authentication) and the availability of the resource itself for this user (authorization).

A detailed architecture of OAuth2 can be found [here](http://forge.fiware.org/plugins/mediawiki/wiki/fiware/index.php/PEP_Proxy_-_Wilma_-_Installation_and_Administration_Guide), but in a nutshell, FIWARE implements the above concept through the Identity Manager GE ([Keyrock](http://catalogue.fiware.org/enablers/identity-management-keyrock) implementation) and the Access Control ([AuthZForce](http://catalogue.fiware.org/enablers/authorization-pdp-authzforce) implementation); the join of these two enablers conforms the OAuth2-based authorization service in FIWARE:

@@ -220,7 +220,7 @@ As you can see, your FIWARE Lab credentials are required in the payload, in the

[Top](#top)

-###Kerberos authentication
+### Kerberos authentication
Hadoop Distributed File System (HDFS) can be remotely managed through a REST API called [WebHDFS](http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html). This API may be used without any kind of security (in this case, it is enough to know a valid HDFS user name in order to access this user HDFS space), or a Kerberos infrastructure may be used for authenticating the users.

[Kerberos](http://web.mit.edu/kerberos/) is an authentication protocol created by MIT, whose current version is 5. It is based on symmetric key cryptography and a trusted third party, the Kerberos servers themselves. The protocol is as easy as authenticating to the Authentication Server (AS), which forwards the user to the Key Distribution Center (KDC) with a ticket-granting ticket (TGT) that can be used to retrieve the definitive client-to-server ticket. This ticket can then be used for authentication purposes against a service server (in both directions).
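As a reference, this is how a user would manually obtain a TGT with the standard Kerberos tooling (`frb@EXAMPLE.COM` is just an illustrative principal):

    $ kinit frb@EXAMPLE.COM
    Password for frb@EXAMPLE.COM:

The acquired ticket can then be inspected with `klist`.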
@@ -237,7 +237,7 @@ Nevertheless, Cygnus needs this process to be automated. Let's see how through t

[Top](#top)

-####`conf/cygnus.conf`
+#### `conf/cygnus.conf`
This file can be built from the distributed `conf/cygnus.conf.template`. Edit this part of the `TwitterHDFSSink` configuration appropriately:

    # Kerberos-based authentication enabling
@@ -255,7 +255,7 @@ I.e. start enabling (or not) the Kerberos authentication. Then, configure a user

[Top](#top)

-####`conf/krb5_login.conf`
+#### `conf/krb5_login.conf`
Contains the following line, which must not be changed (thus, the distributed file is not a template but the definitive one).

@@ -265,7 +265,7 @@ Contains the following line, which must not be changed (thus, the distributed fi

[Top](#top)

-####`conf/krb5.conf`
+#### `conf/krb5.conf`
This file can be built from the distributed `conf/krb5.conf.template`. Edit it appropriately, basically by replacing `EXAMPLE.COM` by your Kerberos realm (this is the same as your domain, but uppercase, i.e. the realm for `example.com` is `EXAMPLE.COM`) and by configuring your Kerberos Key Distribution Center (KDC) and your Kerberos admin/authentication server (ask your network administrator in order to know them).

diff --git a/doc/cygnus-twitter/flume_extensions_catalogue/twitter_source.md b/doc/cygnus-twitter/flume_extensions_catalogue/twitter_source.md
index 77cd0c6b0..434cfb9dd 100644
--- a/doc/cygnus-twitter/flume_extensions_catalogue/twitter_source.md
+++ b/doc/cygnus-twitter/flume_extensions_catalogue/twitter_source.md
@@ -8,7 +8,7 @@ Content:
* [Programmers guide](#section3)
    * [`TwitterSource` class](#section3.1)

-##Functionality
+## Functionality
`com.telefonica.iot.cygnus.sources.TwitterSource`, or simply `TwitterSource`, is a source designed to collect data from [Twitter](https://twitter.com). Tweets are always transformed into internal Flume events at `TwitterSource`. In the end, the information within these Flume events must be mapped into specific data structures at the corresponding sinks.

Next sections will explain this in detail.

[Top](#top)

-###Mapping Twitter events to flume events
+### Mapping Twitter events to flume events
Received Twitter events are transformed into Flume events (specifically `TwitterEvent`), independently of the final backend where they are persisted. The body of a Flume TwitterEvent is the representation of a tweet in JSON format.

Once translated, the data (now, as a Flume event) is put into the internal channels for future consumption (see next section).

[Top](#top)

-##Administration guide
-###Configuration
+## Administration guide
+### Configuration
`TwitterSource` is configured through the following parameters that are defined in the configuration file `agent_<id>.conf`. The name of the source:

@@ -84,8 +84,8 @@ cygnus-twitter.sources.twitter-source.accessToken = xxxxxxxx
cygnus-twitter.sources.twitter-source.accessTokenSecret = xxxxxxxx
```

-##Programmers guide
-###`TwitterSource` class
+## Programmers guide
+### `TwitterSource` class
`TwitterSource` has two main methods that are described in the following paragraphs.
`public void configure(Context context)`

diff --git a/doc/cygnus-twitter/installation_and_administration_guide/README.md b/doc/cygnus-twitter/installation_and_administration_guide/README.md
index 851759ae5..048aba8ce 100644
--- a/doc/cygnus-twitter/installation_and_administration_guide/README.md
+++ b/doc/cygnus-twitter/installation_and_administration_guide/README.md
@@ -1,4 +1,4 @@
-#Installation and Administration Guide
+# Installation and Administration Guide
* [Introduction](./introduction.md)
* Installation:
diff --git a/doc/cygnus-twitter/installation_and_administration_guide/configuration.md b/doc/cygnus-twitter/installation_and_administration_guide/configuration.md
index d5259e9c6..166d907d5 100644
--- a/doc/cygnus-twitter/installation_and_administration_guide/configuration.md
+++ b/doc/cygnus-twitter/installation_and_administration_guide/configuration.md
@@ -1,9 +1,9 @@
-#Introduction
+# Introduction
This document describes how to use cygnus-twitter once it has been [installed](../installation_and_administration_guide/introduction.md) and how it works.

cygnus-twitter is a Cygnus agent (i.e., a Flume agent) whose source is tweets and which can have different sinks. Right now, the HDFS sink is already implemented.

-##Configuration file
+## Configuration file
From the point of view of the user, the main differences with respect to the Cygnus-ngsi agent are in the configuration file `agent_<id>.conf`. In this file, the first difference is the source, which is a Twitter source:

`cygnus-twitter.sources = twitter-source`

@@ -53,7 +53,7 @@ Once the parameters related to the source are defined, the file continues defini

-##Configuration file example
+## Configuration file example
```Java
#=============================================
# To be put in APACHE_FLUME_HOME/conf/cygnus.conf
diff --git a/doc/cygnus-twitter/installation_and_administration_guide/install_from_sources.md b/doc/cygnus-twitter/installation_and_administration_guide/install_from_sources.md
index e13a9e417..c8e46848d 100644
--- a/doc/cygnus-twitter/installation_and_administration_guide/install_from_sources.md
+++ b/doc/cygnus-twitter/installation_and_administration_guide/install_from_sources.md
@@ -1,4 +1,4 @@
-#Installing cygnus-twitter from sources
+# Installing cygnus-twitter from sources
Content:

* [Prerequisites](#section1)
@@ -8,13 +8,13 @@ Content:
    * [Known issues](#section2.3)
* [Installing dependencies](#section3)

-##Prerequisites
+## Prerequisites
[`cygnus-common`](../../cygnus-common/installation_and_administration_guide/install_from_sources.md) must be installed. This includes Maven, `cygnus` user creation, Apache Flume and `cygnus-flume-ng` script installation.

[Top](#top)

-##Installing Cygnus
-###Cloning `fiware-cygnus`
+## Installing Cygnus
+### Cloning `fiware-cygnus`
Start by cloning the GitHub repository:

    $ git clone https://github.com/telefonicaid/fiware-cygnus.git

@@ -25,7 +25,7 @@ Start by cloning the Github repository:

[Top](#top)

-###Installing `cygnus-twitter`
+### Installing `cygnus-twitter`
`cygnus-twitter` can be built as a fat Java jar file containing all third-party dependencies (**recommended**):

    $ cd cygnus-twitter

Or as a thin Java jar file:

@@ -40,14 +40,14 @@

[Top](#top)

-###Known issues
+### Known issues
It may happen that, while compiling `cygnus-twitter`, the Maven JVM does not have enough memory.
This can be changed as detailed in the [Maven official documentation](https://cwiki.apache.org/confluence/display/MAVEN/OutOfMemoryError):

    $ export MAVEN_OPTS="-Xmx512m -XX:MaxPermSize=128m"

[Top](#top)

-##Installing dependencies
+## Installing dependencies
These are the packages you will need to install under `APACHE_FLUME_HOME/plugins.d/cygnus/libext/` **if you did not include them in the cygnus-common jar**:

| Cygnus dependencies | Version | Required by / comments |
diff --git a/doc/cygnus-twitter/installation_and_administration_guide/install_with_docker.md b/doc/cygnus-twitter/installation_and_administration_guide/install_with_docker.md
index 439691b6d..6def220ca 100644
@@ -12,13 +12,13 @@ Content:
        * [Environment variables](#section3.2.2)
        * [Using volumes](#section3.2.3)

-##Before starting
+## Before starting
Obviously, you will need docker installed and running in your machine. Please, check [this](https://docs.docker.com/linux/started/) official start guide.

[Top](#top)

-##Getting an image
-###Building from sources
+## Getting an image
+### Building from sources
Start by cloning the `fiware-cygnus` repository:

    $ git clone https://github.com/telefonicaid/fiware-cygnus.git

@@ -42,7 +42,7 @@ centos 6 61bf77ab8841 4 weeks ago

[Top](#top)

-###Using docker hub image
+### Using docker hub image
Instead of building an image from scratch, you may download it from [hub.docker.com](https://hub.docker.com/r/fiware/cygnus-twitter/):

    $ docker pull fiware/cygnus-twitter

@@ -58,8 +58,8 @@ centos 6 61bf77ab8841 4 weeks ago

[Top](#top)

-##Using the image
-###As it is
+## Using the image
+### As it is
The cygnus-twitter image (either built from scratch or downloaded from hub.docker.com) allows running a Cygnus agent in charge of receiving tweets from Twitter and persisting them into an HDFS storage.

Start a container for this image by typing in a terminal:

@@ -115,7 +115,7 @@ CONTAINER ID IMAGE COMMAND CREATED

[Top](#top)

-###Using a specific configuration
+### Using a specific configuration
As seen above, the default configuration distributed with the image is tied to certain values that may not be suitable for your tests. Specifically:

* It only works for storing streaming tweets in a temporary file (/tmp).

@@ -126,21 +126,21 @@ You may need to alter the above values with values of your own.

[Top](#top)

-####Editing the docker files
+#### Editing the docker files
The easiest way is by editing the `Dockerfile` and/or the `agent.conf` file under `docker/cygnus-twitter` and building the cygnus-twitter image from scratch. This gives you total control over the docker image.

[Top](#top)

-####Environment variables
+#### Environment variables
Those parameters associated with an environment variable can be easily overwritten in the command line using the `-e` option.
For instance, if you want to change the log4j logging level, simply run:

    $ docker run -e LOG_LEVEL='DEBUG' cygnus-twitter

[Top](#top)

-####Using volumes
+#### Using volumes
Another possibility is to start a container with a volume (`-v` option) and map the entire configuration file within the container with a local version of the file:

    $ docker run -v /absolute/path/to/local/agent.conf:/opt/apache-flume/conf/agent.conf cygnus-twitter

diff --git a/doc/cygnus-twitter/installation_and_administration_guide/introduction.md b/doc/cygnus-twitter/installation_and_administration_guide/introduction.md
index da6550ab2..fe6fd2b06 100644
--- a/doc/cygnus-twitter/installation_and_administration_guide/introduction.md
+++ b/doc/cygnus-twitter/installation_and_administration_guide/introduction.md
@@ -1,4 +1,4 @@
-#Introduction
+# Introduction
This document details how to install and administer a **cygnus-twitter** agent. cygnus-twitter is a connector in charge of persisting [Twitter](http://www.twitter.com) status data in certain configured third-party storages.

Current stable release is able to persist Twitter context data in:

@@ -13,14 +13,14 @@

[Top](#top)

-##Intended audience
+## Intended audience
This document is mainly addressed to those FIWARE users that want to collect public information from Twitter, based on keywords and/or geolocated information. In that case, you will need this document in order to learn how to install and administer cygnus-twitter.

If your aim is to create a new sink for cygnus-twitter, or expand it in some way, please refer to the [User and Programmer Guide](../user_and_programmer_guide/programmer_guide.md).

[Top](#top)

-##Structure of the document
+## Structure of the document
Apart from this introduction, this Installation and Administration Guide mainly contains sections about installing, configuring, running and testing cygnus-twitter. It is very important to note that, for those topics not covered by this documentation, the related section in cygnus-common applies. Specifically:

diff --git a/doc/cygnus-twitter/installation_and_administration_guide/issues_and_contact.md b/doc/cygnus-twitter/installation_and_administration_guide/issues_and_contact.md
index c13aae383..d1234ddc3 100644
--- a/doc/cygnus-twitter/installation_and_administration_guide/issues_and_contact.md
+++ b/doc/cygnus-twitter/installation_and_administration_guide/issues_and_contact.md
@@ -1,4 +1,4 @@
-#Reporting issues and contact information
+# Reporting issues and contact information
There are several channels suited for reporting issues and asking questions in general. Each one depends on the nature of the question:

* Use [stackoverflow.com](http://stackoverflow.com) for specific questions about this software. Typically, these will be related to installation problems, errors and bugs. Development questions when forking the code are welcome as well. Use the `fiware-cygnus` tag.
diff --git a/doc/cygnus-twitter/installation_and_administration_guide/logs_and_alarms.md b/doc/cygnus-twitter/installation_and_administration_guide/logs_and_alarms.md
index 48281d9ed..e1794b6e8 100644
--- a/doc/cygnus-twitter/installation_and_administration_guide/logs_and_alarms.md
+++ b/doc/cygnus-twitter/installation_and_administration_guide/logs_and_alarms.md
@@ -1,11 +1,11 @@
-#Logs and alarms
+# Logs and alarms
Content:

* [Introduction](#section1)
* [Log message types](#section2)
* [Alarm conditions](#section3)

-##Introduction
+## Introduction
This document describes the alarms a platform integrating Cygnus-twitter should raise when an incident happens. Thus, it is addressed to professional operators and administrators of such platforms. Cygnus messages are explained before the alarm conditions deriving from those messages are described.

@@ -21,7 +21,7 @@ For each alarm, the following information is given:

[Top](#top)

-##Log message types
+## Log message types
Cygnus logs are categorized under seven message types, each one identified by a tag in the custom message part of the trace. These are the tags:

* Fatal error (`FATAL` level). These kinds of errors may cause Cygnus to stop, and thus must be reported to the development team through [stackoverflow.com](http://stackoverflow.com/) (please, tag it with fiware).

@@ -47,7 +47,7 @@ Debug messages are labeled as Debug, with a logging level of `DEBUG`. Inf

[Top](#top)

-##Alarm conditions
+## Alarm conditions
Alarm ID | Severity | Detection strategy | Stop condition | Description | Action
---|---|---|---|---|---
1 | CRITICAL | A `FATAL` trace is found. | For each configured Cygnus-twitter component (i.e. `TwitterSource` and `TwitterHDFSSink`), the following trace is found: Startup completed. | A problem has happened at Cygnus startup. The `msg` field details the particular problem. | Fix the issue that is precluding Cygnus startup, e.g. if the problem was due to an invalid Twitter API key or invalid coordinates for the geoquery, then change such values.
diff --git a/doc/cygnus-twitter/installation_and_administration_guide/running.md b/doc/cygnus-twitter/installation_and_administration_guide/running.md
index 7f1f03d05..fd0943f7a 100644
--- a/doc/cygnus-twitter/installation_and_administration_guide/running.md
+++ b/doc/cygnus-twitter/installation_and_administration_guide/running.md
@@ -1,4 +1,4 @@
-#Running a cygnus-twitter agent
+# Running a cygnus-twitter agent
Once the `agent_<id>.conf` file is properly configured, just use the following command to start:

@@ -13,4 +13,4 @@ The parameters used in these commands are:

* `-f` (or `--conf-file`). This is the agent configuration (`agent_<id>.conf`) file. Please observe that when running in this mode no `cygnus_instance_<id>.conf` file is required.
* `-n` (or `--name`). The name of the Cygnus agent to be run.
* `-Dflume.root.logger`. Changes the logging level and the logging appender for log4j.
-* `-Duser.timezone=UTC`. Changes the timezone in order all the timestamps (logs, data reception times, etc) are UTC.
\ No newline at end of file
+* `-Duser.timezone=UTC`. Changes the timezone so that all the timestamps (logs, data reception times, etc.) are UTC.
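Putting the above parameters together, a typical invocation would look like this (an illustrative sketch; the paths, the configuration file name and the agent name depend on your installation):

    $ APACHE_FLUME_HOME/bin/cygnus-flume-ng agent -f APACHE_FLUME_HOME/conf/agent_1.conf -n cygnus-twitter -Dflume.root.logger=INFO,console -Duser.timezone=UTC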
diff --git a/doc/cygnus-twitter/installation_and_administration_guide/testing.md b/doc/cygnus-twitter/installation_and_administration_guide/testing.md
index aab657ea6..efd25aea3 100644
--- a/doc/cygnus-twitter/installation_and_administration_guide/testing.md
+++ b/doc/cygnus-twitter/installation_and_administration_guide/testing.md
@@ -1,9 +1,9 @@
-#Testing
+# Testing
Content:

* [Unit testing](#section1)

-##Unit testing
+## Unit testing
Running the tests requires [Apache Maven](https://maven.apache.org/) installed and Cygnus sources downloaded.

    $ git clone https://github.com/telefonicaid/fiware-cygnus.git

diff --git a/doc/cygnus-twitter/quick_start_guide.md b/doc/cygnus-twitter/quick_start_guide.md
index 1b57d8bb5..b11c120ce 100644
--- a/doc/cygnus-twitter/quick_start_guide.md
+++ b/doc/cygnus-twitter/quick_start_guide.md
@@ -17,7 +17,7 @@ Or as a thin Java jar file:
$ cp target/cygnus-.jar APACHE_FLUME_HOME/plugins.d/cygnus/lib
```

-##Configuring a test agent
+## Configuring a test agent
This kind of agent is the simplest one you can configure with Cygnus-twitter. It is based on a standard `TwitterSource`, a `MemoryChannel` and an `HDFSSink`. Don't worry about the configuration details, especially those about the source; simply think of a Twitter listener waiting for tweet statuses and sending them in the form of Flume events to a testing-purpose sink that will persist tweets in an HDFS third-party storage.

@@ -129,7 +129,7 @@ cygnus-twitter.channels.hdfs-channel.capacity = 1000
cygnus-twitter.channels.hdfs-channel.transactionCapacity = 100
```

-##Reporting issues and contact information
+## Reporting issues and contact information
There are several channels suited for reporting issues and asking questions in general. Each one depends on the nature of the question:

* Use [stackoverflow.com](http://stackoverflow.com) for specific questions about this software. Typically, these will be related to installation problems, errors and bugs. Development questions when forking the code are welcome as well. Use the `fiware-cygnus` tag.
diff --git a/doc/index.md b/doc/index.md
index cb1e4e6a9..2c384e572 100644
--- a/doc/index.md
+++ b/doc/index.md
@@ -1,4 +1,4 @@
-#Cygnus
+# Cygnus
[![License badge](https://img.shields.io/badge/license-AGPL-blue.svg)](https://opensource.org/licenses/AGPL-3.0)
[![Documentation badge](https://readthedocs.org/projects/fiware-cygnus/badge/?version=latest)](http://fiware-cygnus.readthedocs.org/en/latest/?badge=latest)
[![Support badge]( https://img.shields.io/badge/support-sof-yellowgreen.svg)](http://stackoverflow.com/questions/tagged/fiware-cygnus)
@@ -8,7 +8,7 @@
[![Docker badge](https://img.shields.io/docker/pulls/fiware/cygnus-ngsi.svg)](https://hub.docker.com/r/fiware/cygnus-ngsi/)
[![Docker badge](https://img.shields.io/docker/pulls/fiware/cygnus-twitter.svg)](https://hub.docker.com/r/fiware/cygnus-twitter/)

-##Welcome
+## Welcome
This project is part of [FIWARE](http://fiware.org) and belongs to the [Cosmos](http://catalogue.fiware.org/enablers/bigdata-analysis-cosmos) Ecosystem.

Cygnus is a connector in charge of persisting certain sources of data in certain configured third-party storages, creating a historical view of such data.

@@ -34,12 +34,12 @@ Current stable release is able to persist the following sources of data in the f

**IMPORTANT NOTE**: for the time being, cygnus-ngsi and cygnus-twitter agents cannot be installed in the same base path, because of an incompatibility with the required version of the `httpclient` library.
Of course, if you are going to use just one of the agents, there is no problem at all.

-##Cyngus place in FIWARE architecture
+## Cygnus place in FIWARE architecture
Cygnus (more specifically, the cygnus-ngsi agent) plays the role of a connector between Orion Context Broker (which is an NGSI source of data) and many FIWARE storages such as CKAN, Cosmos Big Data (Hadoop) and STH Comet. Of course, as previously said, you may add MySQL, Kafka, Carto, etc. as other non-FIWARE storages to the FIWARE architecture.

![FIWARE architecture](../doc/images/fiware_architecture.png)

-##Further documentation
+## Further documentation
The per-agent **Quick Start Guide** found at readthedocs.org provides a good documentation summary ([cygnus-ngsi](http://fiware-cygnus.readthedocs.io/en/latest/cygnus-ngsi/quick_start_guide/index.html), [cygnus-twitter](http://fiware-cygnus.readthedocs.io/en/latest/cygnus-twitter/quick_start_guide/index.html)). Nevertheless, both the **Installation and Administration Guide** and the **User and Programmer Guide** for each agent, also found at [readthedocs.org](http://fiware-cygnus.readthedocs.io/en/latest/), cover more advanced topics.

@@ -53,8 +53,8 @@ Other interesting links are:

* [cygnus-ngsi](https://edu.fiware.org/mod/resource/view.php?id=1037) **introductory course** in FIWARE Academy.
* The [Contributing Guidelines](../doc/contributing/contributing_guidelines.md) if your aim is to extend Cygnus.

-##Licensing
+## Licensing
Cygnus is licensed under Affero General Public License (AGPL) version 3. You can find a [copy of this license in the repository](https://github.com/telefonicaid/fiware-cygnus/blob/master/LICENSE).

-##Reporting issues and contact information
+## Reporting issues and contact information
For any doubt you may have, please refer to the [Cygnus Core Team](https://github.com/telefonicaid/fiware-cygnus/blob/master/reporting_issues_and_contact.md).
diff --git a/docker/cygnus-common/README.md b/docker/cygnus-common/README.md
index c5673bd9c..50b4641ef 100644
--- a/docker/cygnus-common/README.md
+++ b/docker/cygnus-common/README.md
@@ -1,3 +1,3 @@
-#Installing cygnus-common with docker
+# Installing cygnus-common with docker
Please, refer to the [documentation](../../../../doc/cygnus-common/installation_and_administration_guide/install_with_docker.md) if you want to use a docker image for cygnus-common.
diff --git a/docker/cygnus-ngsi/README.md b/docker/cygnus-ngsi/README.md
index f0fb283dd..b0c954dac 100644
--- a/docker/cygnus-ngsi/README.md
+++ b/docker/cygnus-ngsi/README.md
@@ -1,3 +1,3 @@
-#Installing cygnus-ngsi with docker
+# Installing cygnus-ngsi with docker
Please, refer to the [documentation](../../../../doc/cygnus-ngsi/installation_and_administration_guide/install_with_docker.md) if you want to use a docker image for cygnus-ngsi.
diff --git a/docker/cygnus-twitter/README.md b/docker/cygnus-twitter/README.md
index f814620eb..7b3d5124e 100644
--- a/docker/cygnus-twitter/README.md
+++ b/docker/cygnus-twitter/README.md
@@ -1,3 +1,3 @@
-#Installing Cygnus with docker
+# Installing Cygnus with docker
Please, refer to the [documentation](../../../../doc/cygnus-twitter/installation_and_administration_guide/install_with_docker.md) if you want to use a docker image for cygnus-twitter.
diff --git a/reporting_issues_and_contact.md b/reporting_issues_and_contact.md
index ab9a66de9..47fe99b91 100644
--- a/reporting_issues_and_contact.md
+++ b/reporting_issues_and_contact.md
@@ -1,4 +1,4 @@
-#Reporting issues
+# Reporting issues
There are several channels suited for reporting issues and asking questions in general. Each one depends on the nature of the question:

* Use [stackoverflow.com](http://stackoverflow.com) for specific questions about this software. Typically, these will be related to installation problems, errors and bugs. Development questions when forking the code are welcome as well. Use the `fiware-cygnus` tag.
@@ -7,7 +7,7 @@ There are several channels suited for reporting issues and asking for doubts in

**NOTE**: Please try to avoid personally emailing the contributors unless they ask for it. In fact, if you send a private email you will probably receive an automatic response urging you to use [stackoverflow.com](http://stackoverflow.com/) or [ask.fiware.org](https://ask.fiware.org/questions/). This is because using the mentioned methods will create a public database of knowledge that can be useful for future users; private email is just private and cannot be shared.

-#Cygnus Core Team
+# Cygnus Core Team
* [francisco.romerobueno@telefonica.com](mailto:francisco.romerobueno@telefonica.com) **[Main contributor]**
* [pablo.coellovillalba@telefonica.com](mailto:pablo.coellovillalba@telefonica.com) **[Contributor]**
* [fermin.galanmarquez@telefonica.com](mailto:fermin.galanmarquez@telefonica.com) **[Contributor]**
| Component | Feature | From version |
|---|---|---|
| TwitterHDFSSink | First implementation | 1.1.0 |