Pyleus is a Python 2.6+ framework for developing and launching Apache Storm topologies.
Please visit our wiki.
Pyleus is a framework for building Apache Storm topologies in idiomatic Python.
With Pyleus you can:
- define a topology with a simple YAML file
- have dependency management with a requirements.txt file
- run faster thanks to Pyleus’ MessagePack based serializer
- pass options to your components directly from the YAML file
- use the Kafka spout built into Storm with only a YAML change
From PyPI:
$ pip install pyleus
Note:
You do NOT need to install pyleus on your Storm cluster. That’s cool, isn't it?
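To try the bundled example, clone the repository, then build the example topology and run it locally:
$ git clone https://github.com/Yelp/pyleus.git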
$ pyleus build pyleus/examples/exclamation_topology/pyleus_topology.yaml
$ pyleus local exclamation_topology.jar
Or, submit to a Storm cluster with:
$ pyleus submit -n NIMBUS_HOST exclamation_topology.jar
The examples directory contains several annotated Pyleus topologies that try to cover as many Pyleus features as possible.
Build a topology:
$ pyleus build /path/to/pyleus_topology.yaml
Run a topology locally:
$ pyleus local /path/to/topology.jar
Submit a topology to a Storm cluster:
$ pyleus submit [-n NIMBUS_HOST] /path/to/topology.jar
List all topologies running on a Storm cluster:
$ pyleus list [-n NIMBUS_HOST]
Kill a topology running on a Storm cluster:
$ pyleus kill [-n NIMBUS_HOST] TOPOLOGY_NAME
Try pyleus -h for a list of all the available commands, or pyleus CMD -h for any command-specific help.
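For scripted workflows, the commands above can be assembled programmatically. The following is a minimal sketch and not part of Pyleus: the helper name pyleus_argv is hypothetical, and it only builds argument lists from the subcommands and the -n flag shown above, without running anything.

```python
def pyleus_argv(command, target=None, nimbus_host=None):
    """Build an argument list for a pyleus subcommand (hypothetical helper).

    Only the subcommands and the -n flag listed above are used.
    """
    argv = ["pyleus", command]
    # In the usage lines above, -n appears only for submit, list and kill.
    if nimbus_host is not None:
        if command not in ("submit", "list", "kill"):
            raise ValueError("-n is not shown above for %r" % command)
        argv += ["-n", nimbus_host]
    # The target is a YAML path, a jar path, or a topology name.
    if target is not None:
        argv.append(target)
    return argv


build_cmd = pyleus_argv("build", "/path/to/pyleus_topology.yaml")
submit_cmd = pyleus_argv("submit", "/path/to/topology.jar",
                         nimbus_host="NIMBUS_HOST")
```

On a machine with Pyleus installed, each list could be handed to subprocess.check_call to run the build and the submit in sequence.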
Please refer to the wiki for a more detailed tutorial.
This is an example of the directory tree of a simple topology:
my_first_topology/
|-- my_first_topology/
| |-- __init__.py
| |-- dummy_bolt.py
| |-- dummy_spout.py
|-- pyleus_topology.yaml
|-- requirements.txt
A simple pyleus_topology.yaml should look like the following:
name: my_first_topology

topology:

    - spout:
        name: my-first-spout
        module: my_first_topology.dummy_spout

    - bolt:
        name: my-first-bolt
        module: my_first_topology.dummy_bolt
        groupings:
            - shuffle_grouping: my-first-spout
This defines a topology where a single bolt subscribes to the output stream of a single spout. It is as simple as that.
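To make the wiring concrete, here is a small sketch of that topology written as the nested data a YAML parser would produce, together with a hypothetical helper, subscriptions, that lists which component listens to which. It illustrates the structure of the file only and is not how Pyleus itself reads it.

```python
# The structure a YAML parser would produce for the file above,
# written out by hand so the sketch has no dependencies.
topology_spec = {
    "name": "my_first_topology",
    "topology": [
        {"spout": {"name": "my-first-spout",
                   "module": "my_first_topology.dummy_spout"}},
        {"bolt": {"name": "my-first-bolt",
                  "module": "my_first_topology.dummy_bolt",
                  "groupings": [{"shuffle_grouping": "my-first-spout"}]}},
    ],
}


def subscriptions(spec):
    """Return (source, grouping, subscriber) triples (hypothetical helper).

    Handles only the simple form shown above, where each grouping maps
    a grouping type directly to the name of the upstream component.
    """
    edges = []
    for entry in spec["topology"]:
        for component in entry.values():
            for grouping in component.get("groupings", []):
                for grouping_type, source in grouping.items():
                    edges.append((source, grouping_type, component["name"]))
    return edges


edges = subscriptions(topology_spec)
```

For this file the result is a single edge, from my-first-spout to my-first-bolt through a shuffle grouping.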
This is the code implementing dummy_spout.py:
from pyleus.storm import Spout


class DummySpout(Spout):

    OUTPUT_FIELDS = ['sentence', 'name']

    def next_tuple(self):
        self.emit(("This is a sentence.", "spout",))


if __name__ == '__main__':
    DummySpout().run()
Let's now look at dummy_bolt.py:
from pyleus.storm import SimpleBolt


class DummyBolt(SimpleBolt):

    OUTPUT_FIELDS = ['sentence']

    def process_tuple(self, tup):
        sentence, name = tup.values
        new_sentence = "{0} says, \"{1}\"".format(name, sentence)
        self.emit((new_sentence,), anchors=[tup])


if __name__ == '__main__':
    DummyBolt().run()
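The data flow through these two components can be traced without Storm. The sketch below does not use Pyleus at all: spout_values and bolt_values are hypothetical stand-ins that mirror what next_tuple and process_tuple emit in the code above.

```python
# Mirrors DummySpout.OUTPUT_FIELDS: one name per emitted value.
SPOUT_FIELDS = ['sentence', 'name']


def spout_values():
    """Values DummySpout emits on each next_tuple call."""
    return ("This is a sentence.", "spout")


def bolt_values(values):
    """Values DummyBolt emits for one incoming tuple."""
    sentence, name = values
    return ("{0} says, \"{1}\"".format(name, sentence),)


incoming = spout_values()
# Pair each value with its declared field name.
named = dict(zip(SPOUT_FIELDS, incoming))
outgoing = bolt_values(incoming)
print(outgoing[0])  # prints: spout says, "This is a sentence."
```

The bolt unpacks the values in the order declared by the spout's OUTPUT_FIELDS, so the two definitions have to stay in step.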
Run the topology on your local machine for debugging:
$ pyleus build my_first_topology/pyleus_topology.yaml
$ pyleus local --debug my_first_topology.jar
When you are done, press Ctrl-C.
You can set default values for many configuration options by placing a .pyleus.conf file in your home directory:
[storm]
nimbus_host: 10.11.12.13
jvm_opts: -Djava.io.tmpdir=/home/myuser/tmp
[build]
pypi_index_url: http://pypi.ninjacorp.com/simple/
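The file uses an INI-style layout, so its contents can be inspected with the configparser module from the Python 3 standard library. This is a sketch for checking your own settings, assuming the layout shown above; it does not describe how Pyleus loads the file internally.

```python
import configparser

# Same content as the example file above, inlined so the sketch is
# self-contained.
SAMPLE = """\
[storm]
nimbus_host: 10.11.12.13
jvm_opts: -Djava.io.tmpdir=/home/myuser/tmp

[build]
pypi_index_url: http://pypi.ninjacorp.com/simple/
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)
# To inspect a real file instead, call parser.read() with its path.

# Each key is split from its value at the first ':' on the line, so the
# '=' inside jvm_opts and the '://' in the URL stay part of the values.
nimbus_host = parser.get("storm", "nimbus_host")
jvm_opts = parser.get("storm", "jvm_opts")
index_url = parser.get("build", "pypi_index_url")
```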
Pyleus is licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0