Filip Lemic edited this page May 11, 2015 · 2 revisions

Description:

R2DM is implemented as a web service developed in Python 2.7 using the Flask module, which provides a simple way of creating RESTful web services. We leveraged this to develop multiple functions for storing and managing raw data. Raw data is stored in MongoDB, a widely used open-source NoSQL ("Not only SQL") document database written in C++. MongoDB stores JavaScript Object Notation (JSON) style documents in the Binary JSON (BSON) format. Our data messages are defined as Protocol Buffer structures, a way of encoding structured data in an efficient and extensible binary format. The service, together with the MongoDB database, runs on an Amazon Elastic Compute Cloud (Amazon EC2) instance in AWS. Amazon EC2 is a web service that gives our services scalability and platform independence. The experimenter communicates with the service through properly defined HTTP requests.
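The request flow described above can be sketched in a few lines. This is only an illustration, not the R2DM code: it uses a plain WSGI callable from the Python 3 standard library in place of Flask, an in-memory dictionary in place of MongoDB, and a hypothetical `/messages/<message_id>` URL scheme that is assumed here for the example.

```python
import io
from wsgiref.util import setup_testing_defaults

# In-memory stand-in for a MongoDB collection: message id -> raw bytes.
STORE = {}


def app(environ, start_response):
    """Minimal WSGI app: PUT stores a raw message, GET returns it."""
    # Hypothetical URL scheme: /messages/<message_id>
    parts = environ.get("PATH_INFO", "").strip("/").split("/")
    if len(parts) != 2 or parts[0] != "messages":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"unknown resource"]
    message_id = parts[1]
    method = environ["REQUEST_METHOD"]

    if method == "PUT":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        STORE[message_id] = environ["wsgi.input"].read(length)
        start_response("201 Created", [("Content-Type", "text/plain")])
        return [b"stored"]

    if method == "GET":
        if message_id not in STORE:
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"no such message"]
        start_response("200 OK", [("Content-Type", "application/octet-stream")])
        return [STORE[message_id]]

    start_response("405 Method Not Allowed", [("Content-Type", "text/plain")])
    return [b"method not allowed"]


def call(method, path, body=b""):
    """Invoke the app in-process, without opening a network socket."""
    environ = {}
    setup_testing_defaults(environ)
    environ.update({
        "REQUEST_METHOD": method,
        "PATH_INFO": path,
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    })
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    payload = b"".join(app(environ, start_response))
    return captured["status"], payload
```

Calling `call("PUT", "/messages/m1", raw_bytes)` followed by `call("GET", "/messages/m1")` returns the same bytes that were stored. The service never inspects the payload, which is what allows any message type to be stored.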

The design goal of extensibility was achieved by using Protocol Buffers (version 2) for defining message types and MongoDB for storing those messages. In Protocol Buffers, each data structure that needs to be encoded is encapsulated in the form of a message. The internal structure of the messages is specified in protocol files that have the .proto extension, using a simple but powerful domain-specific language that allows easy description of the message fields, their types, optionality, etc. The Protocol Buffer compiler, protoc, compiles the .proto specification files into data access classes in a number of languages, such as C++, Java and Python. These classes provide simple accessors for each field (like query() and set_query()) as well as methods to serialize/parse the whole structure to/from raw bytes. NoSQL databases employ less constrained consistency models than traditional relational databases. By using a NoSQL (schemaless) database, in our case MongoDB, the service can store any type of defined message without changes to the code or to the database itself.
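To make "serialize/parse to/from raw bytes" concrete, the following Python 3 sketch hand-encodes a single unsigned integer field in the Protocol Buffers wire format (a key built from the field number and wire type, followed by a base-128 varint). It is an illustration only: real code would use the classes generated by protoc, and the field number and value below are arbitrary examples, not taken from the R2DM message definitions.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)         # last byte: continuation bit clear
            return bytes(out)


def decode_varint(data, pos=0):
    """Decode one varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_uint_field(field_number, value):
    """Encode one integer field: key (field number, wire type 0), then value."""
    key = (field_number << 3) | 0  # wire type 0 means "varint"
    return encode_varint(key) + encode_varint(value)


def decode_uint_field(data):
    """Inverse of encode_uint_field; return (field number, value)."""
    key, pos = decode_varint(data)
    value, _ = decode_varint(data, pos)
    return key >> 3, value


encoded = encode_uint_field(1, 150)   # b"\x08\x96\x01", three bytes
decoded = decode_uint_field(encoded)  # (1, 150)
```

Field 1 with value 150 takes three bytes on the wire, which gives a feel for why the binary format is compact. A parser can skip fields whose numbers it does not recognise, which is what makes the format extensible.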

RESTful web services enable remote access to the data using simple HTTP requests. Protocol Buffers serialize messages into binary streams, which supports fast communication between the experimenters and the platform. To sum up, the RESTful web service provides remote access to the raw data management service, and high availability is supported by running the service in the Amazon cloud. The fast data flow from experimenter to database is achieved by using Protocol Buffer serialization and the schemaless MongoDB database for storing the binary data. Furthermore, because communication with the cloud is done using HTTP requests and the data access classes can be generated for a number of programming languages, raw data can be managed from different users' platforms and from different programming languages (most modern programming languages provide libraries for issuing HTTP requests).
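From the experimenter's side, storing a message amounts to sending the serialized bytes in an HTTP request. The Python 3 sketch below builds such a request with the standard library. The host, port, URL path and the choice of PUT are assumptions made for the example and do not describe the actual R2DM interface.

```python
import urllib.request

# Hypothetical service address; replace with the address of a real deployment.
BASE_URL = "http://example.org:5000"


def build_store_request(message_id, payload):
    """Build (but do not send) a request that uploads one serialized message."""
    # Hypothetical resource path: /messages/<message_id>
    url = "{}/messages/{}".format(BASE_URL, message_id)
    return urllib.request.Request(
        url,
        data=payload,  # raw bytes, e.g. a serialized Protocol Buffer message
        method="PUT",
        headers={"Content-Type": "application/octet-stream"},
    )


request = build_store_request("m1", b"\x08\x96\x01")
# Passing the request to urllib.request.urlopen() would send it;
# that step is left out so the sketch runs without network access.
```

The same request can be issued from any language with an HTTP library, which is the platform independence the text refers to.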
